Search results

Results for *

55 results were found.

Showing results 51 to 55 of 55.

Ungoliant: An optimized pipeline for the generation of a very large-scale multilingual web corpus

Authors: Abadji, Julien ; Ortiz Suárez, Pedro Javier ; Romary, Laurent ; Sagot, Benoît

Published: 2021

Publisher: Mannheim : Leibniz-Institut für Deutsche Sprache

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10468 https://ids-pub.bsz-bw.de/files/10468/Abadji_Suarez_Romary_Ungoliant_2021.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-104688 https://doi.org/10.14618/ids-pub-10468

Since the introduction of large language models in Natural Language Processing, large raw corpora have played a crucial role in Computational Linguistics. However, most of these large raw corpora are either available only for English or not available to the general public due to copyright issues. Nevertheless, there are some examples of freely available multilingual corpora for training Deep Learning NLP models, such as the OSCAR and Paracrawl corpora. However, they have quality issues, especially for low-resource languages. Moreover, recreating or updating these corpora is very complex. In this work, we try to reproduce and improve the goclassy pipeline used to create the OSCAR corpus. We propose a new pipeline that is faster, modular, parameterizable, and well documented. We use it to create a corpus similar to OSCAR but larger and based on recent data. Also, unlike OSCAR, the metadata information is at the document level. We release our pipeline under an open source license and publish the corpus under a research-only license.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Corpus; Natural language; Automatic language analysis; Computational linguistics; Copyright; Open source
License:	creativecommons.org/licenses/by/4.0/deed.de ; info:eu-repo/semantics/openAccess

Off-the-shelf semantic author name disambiguation for bibliographic data bases

Authors: Müller, Mark-Christoph ; Bannister, Adam ; Reitz, Florian

Published: 2022

Publisher: Cham : Springer ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS) [secondary publication]

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11128 https://ids-pub.bsz-bw.de/files/11128/Mueller_Bannister_Reitz_Off_the_shelf_2019.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111280 https://doi.org/10.1007/978-3-030-30760-8_42

The demo presents a minimalist, off-the-shelf author name disambiguation (AND) tool which provides a fundamental AND operation, the comparison of two publications with ambiguous authors, as an easily accessible HTTP interface. The tool implements this operation using standard AND functionality, but puts particular emphasis on advanced methods from natural language processing (NLP) for comparing publication title semantics.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Bibliographic database; Database; Publication; Automatic language analysis; Semantics; Open source
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Hamlet goes XML: CrossAnnotationLinking and Personal learning experiences

Authors: Birkenhake, Benjamin ; Panke, Stefanie ; Witt, Andreas

Published: 2016

Publisher: Heerlen : Open University of the Netherlands

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4536 https://ids-pub.bsz-bw.de/files/4536/Birkenhake_Panke_Witt_Hamlet_goes_XML_2005.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-45365

XML-based technologies offer powerful resources for open source applications in the field of e-learning. The paper describes a model of hypertext as interlinked structures that can be intertwined by cross-annotation linking. This infrastructure integrates multiple perspectives and allows creating a personal learning environment. We exemplify the approach in a case study: the Hamlet project. In the course of this project, several German translations of William Shakespeare’s Hamlet have been collected and annotated. Two different annotation layers are used to achieve a cross-linking reference between the various German translations. We will describe the theoretical background of cross-annotation linking and the actual technological implementation of the system. Additionally, we will use the personas method to gain insights into the potential benefit of the system as a personal learning environment.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	E-learning; Hypertext; Shakespeare, William; Hamlet; Open source
License:	creativecommons.org/licenses/by-nc-nd/2.5/nl/ ; info:eu-repo/semantics/openAccess

From Open Source to Open Information. Collaborative Methods in Creating XML-based Markup Languages

Authors: Rehm, Georg ; Lobin, Henning

Published: 2018

Publisher: Washington : ICCC Press

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/7746 https://ids-pub.bsz-bw.de/files/7746/Rehm_Lobin_Open_Source_Open_Information_Mark_Up_2000.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-77460

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Germanic languages; German (430)
Keywords:	Open source; Markup language; XML (Extensible Markup Language); HTML (Hypertext Markup Language)
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Eine Vorstudie zur Eignung von Llama 3-8B für eine Sentimentanalyse

Author: Tu, Ngoc Duyen Tanja

Published: 2025

Publisher: Geneva : Zenodo ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/13049 https://ids-pub.bsz-bw.de/files/13049/Tu_Eine_Vorstudie_zur_Eignung_von_Llama_3_8B_2025.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-130493 https://doi.org/10.5281/zenodo.14943090

This contribution presents a preliminary study that examines whether the open-source generative AI model Llama-3-8B Q4_0 instruction-tuned is suitable for performing sentiment analysis. The study uses a small dataset of inquiries about gender-inclusive spelling. The quality of the automatic annotations is measured by computing the inter-annotator agreement between Llama 3 and three human annotators. A qualitative analysis of the justifications Llama 3 gives for sentiment values that deviate from the manual annotations shows that the generative AI can be used to draft or refine annotation guidelines. However, the preliminary study cannot show that Llama 3 is suitable for sentiment analysis.

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Generative AI; Open source; Gender studies; Annotation; Computational linguistics
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess
