Search results

Domain adaptation with linked encyclopedic data: A case study for historical German

Author: Hagen, Thora

Published: 2025

Publisher: Aachen : CEUR Workshop Proceedings ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/13078 https://ids-pub.bsz-bw.de/files/13078/Hagen_Domain_adaptation_2024.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-130788

This paper outlines a proposal for the use of knowledge graphs for historical German domain adaptation. From the EncycNet project, the encyclopedia-based knowledge graph from the early 20th century was borrowed to examine whether text-based domain adaptation using the source encyclopedia’s text or graph-based adaptation produces a better domain-specific model. To evaluate the approach, a novel historical test dataset based on a second encyclopedia of the early 20th century was created. This dataset is categorized by knowledge type (factual, linguistic, lexical) with special attention paid to distinguishing simple and expert knowledge. The main finding is that, surprisingly, simple knowledge has the most potential for improvement, whereas expert knowledge lags behind. In this study, broad signals like simple definitions and word origin yielded the best results, while more specialized knowledge such as synonyms was not as effectively represented. A follow-up study was carried out in favor of simple contemporary lexical knowledge to control for historicity and text genre; its results confirm that language models can still be enhanced by incorporating simple lexical knowledge using the proposed workflow.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Germanic languages; German (430)
Keywords:	Semantics
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Modeling and Measuring Short Text Similarities. On the Multi-Dimensional Differences between German Poetry of Realism and Modernism

Author: Ehrmanntraut, Anton ; Hagen, Thora ; Jannidis, Fotis ; Konle, Leonard ; Kröncke, Merten ; Winko, Simone

Published: 2025

Publisher: Darmstadt : Universitäts- und Landesbibliothek Darmstadt ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/13079 https://ids-pub.bsz-bw.de/files/13079/Ehrmanntraut_Hagen_Jannidis_Modeling_and_measuring_2025.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-130792 https://doi.org/10.48694/jcls.116

This study contributes to the ongoing discussion on how to operationalize text similarity for the purposes of computational literary studies by defining, justifying theoretically and employing a multi-dimensional text model. Additionally, we evaluate a set of strategies to implement this model for very short texts like poetry using a range of methods from weighted sparse vectors up to very recent neural sentence embeddings based on annotations of emotions, genre and similarity. And finally, we show the relevance of using such a complex text model by applying the best method to a research question about the development of early modernism in German poetry. While we can confirm some important hypotheses from literary studies, we are also able to differentiate or relativize others. In particular, our findings do not support the widely held thesis that the change from realism to modernism was a revolutionary 'rupture'.

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Journal article
Format:	Online
DDC classification:	Germanic languages; German (430)
Keywords:	Similarity; Poetry; Modernism; Realism
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Type- and Token-based Word Embeddings in the Digital Humanities

Author: Ehrmanntraut, Anton ; Hagen, Thora ; Konle, Leonard ; Jannidis, Fotis

Published: 2025

Publisher: Aachen : CEUR Workshop Proceedings ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/13080 https://ids-pub.bsz-bw.de/files/13080/Ehrmanntraut_Hagen_Konle_Type_and_token_based_2021.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-130808

In the general perception of the NLP community, the new dynamic, context-sensitive, token-based embeddings from language models like BERT have replaced the older static, type-based embeddings like word2vec or fastText, due to their better performance. We can show that this is not the case for one area of applications for word embeddings: the abstract representation of the meaning of words in a corpus. This application is especially important for the Computational Humanities, for example in order to show the development of words or ideas. The main contributions of our paper are: 1) We offer a systematic comparison between dynamic and static embeddings with respect to word similarity. 2) We test the best method to convert token embeddings to type embeddings. 3) We contribute new evaluation datasets for word similarity in German. The main goal of our contribution is to make an evidence-based argument that research on static embeddings, which basically stopped after 2019, should be continued not only because they need less computing power and smaller corpora, but also because for this specific set of applications their performance is on par with that of dynamic embeddings.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Germanic languages; German (430)
Keywords:	Neuro-linguistic programming; Corpus
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Verwendung von Wissensgraphen zur inhaltlichen Ergänzung kleinerer Textkorpora [Using knowledge graphs to enrich the content of smaller text corpora]

Author: Hagen, Thora

Published: 2025

Publisher: Geneva : Zenodo ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/13081 https://ids-pub.bsz-bw.de/files/13081/Hagen_Verwendung_von_Wissensgraphen_zur_inhaltlichen_Ergaenzung_2022.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-130816 https://doi.org/10.5281/zenodo.6328009

Corpus creation is one of the most essential steps in carrying out a research project in the digital humanities. For more specialized domains in particular (for example, the analysis of subgenres or dialects), however, there is often not enough material available to reuse methods from NLP, since these require gigabytes of text. This paper shows how knowledge graphs, which can be built from dictionaries, for example, help to enhance smaller text corpora. In the experiment conducted here, a FastText model trained on 20 megabytes of text is enriched with information from GermaNet. The resulting model achieves the same performance as a plain FastText model trained on roughly three times as much data. A contribution to the 8th conference of the association "Digital Humanities im deutschsprachigen Raum", DHd 2022: Kulturen des digitalen Gedächtnisses.

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Knowledge graph; Corpus; Neuro-linguistic programming; GermaNet; Digital humanities
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Introducing traveling word pairs in historical semantic change: a case study of privacy words in 18th and 19th century English

Author: Hagen, Thora ; Ketzan, Erik

Published: 2025

Publisher: Aachen : Sun SITE Central Europe ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/13089 https://ids-pub.bsz-bw.de/files/13089/Hagen_Ketzan_Introducing_traveling_word_pairs_2023.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-130895

In recent years, Lexical semantic change detection (LSCD) has become a central task of NLP. Because most studies in LSCD only consider the semantic change of words in isolation, in this paper, we propose a new direction for the analysis of semantic shifts: traveling word pairs. First, we introduce shift correlation to find pairs of words that semantically shift together in a similar fashion. Second, we propose word relation shift to analyze how the relationship between two words has changed over time. As a test case, we investigate the word privacy (and related words identified by a pre-existing dictionary), as an example of a word that has shifted semantics historically and remains vibrantly explored as a concept in contemporary humanistic discourse. We report that the term privacy in comparison shows relatively little change initially – with correlation analysis revealing more about how key terms surrounding privacy have shifted in tandem, and explore nuanced changes through word pair analysis, suggesting a shift toward concreteness in particular.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Semantic change; Case study; English; Semantics; Computational linguistics; Natural language; Language change; Language; History
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Quantitative Analysis of Gendered Assumptions in a Nineteenth-Century Women’s Encyclopedia

Author: Ketzan, Erik ; Hagen, Thora ; Jannidis, Fotis ; Witt, Andreas

Published: 2025

Publisher: Tokyo : DH2022 Local Organizing Committee ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/13093 https://ids-pub.bsz-bw.de/files/13093/Ketzan_Hagen_Jannidis_Quantitative_analysis_of_gendered_2025.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-130938

This paper quantifies textual patterns relating to gendered assumptions in a fairly unique text, an entire “women’s encyclopedia” from 1830s Germany, which at 10 volumes and 1,461,000 word tokens was of comparable size to contemporary general encyclopedias, but written and marketed for a female audience. We perform experiments on classifying the gender of biographical entries and querying a specific textual feature, calendar dates, with context from comparison 19th-20th century encyclopedias from the EncycNet corpus.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Germanic languages; German (430)
Keywords:	Quantitative analysis; Text linguistics; Gender studies; Corpus
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Tracing the shift to “objectivity” in German encyclopedias of the long nineteenth century

Author: Hagen, Thora ; Konle, Leonard ; Ketzan, Erik ; Jannidis, Fotis ; Witt, Andreas

Published: 2025

Publisher: Graz : Zentrum für Informationsmodellierung - Austrian Centre for Digital Humanities, University of Graz ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/13094 https://ids-pub.bsz-bw.de/files/13094/Hagen_Konle_Tracing_the_shift_2023.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-130941 https://doi.org/10.5281/zenodo.8107633

This paper presents experiments on tracing the shift toward "objectivity" in encyclopedias of the long nineteenth century, as discussed by scholars, via query of surface features (personal pronouns, exclamation points, and interjections) and emotion analysis. We report a decline in these personal and emotive, and thus less "objective", textual characteristics.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Encyclopedia; German; Objectivity; Digital humanities; Corpus
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Mental Maps in EncycNet: Exploring Global Representation in a Historical, German Knowledge Graph

Author: Hagen, Thora ; Jannidis, Fotis ; Witt, Andreas

Published: 2025

Publisher: Geneva : Zenodo ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/13095 https://ids-pub.bsz-bw.de/files/13095/Hagen_Jannidis_Witt_Mental_Maps_in_EncycNet_2024.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-130950

First popularized by Google in 2012 (Singhal 2012), knowledge graphs (KG) have now become a staple data representation method. KGs can be described as directed graphs, where the nodes (subject and object entities) represent any type of human knowledge. The edge specifies the relationship between subject and object. KGs are increasingly moving into the (digital) humanities. At their core, the humanities facilitate the preservation, dissemination, and analysis of cultural heritage knowledge. In all three aspects, this work is gradually changing towards the digital, such as digital thesauri. The network structure of KGs in particular opens up new possibilities for data aggregation and data analysis in the humanities. For one, KGs are an opportunity to interlink different fields of study through ontologies. They can also be used as additional sources for text preparation, as a new technology to query, aggregate, and analyze data in new ways, or to discover previously unseen relationships (Hawkins 2022, Zhang et al. 2021). In other words, as per Hyvönen (2020), KGs are not only good for data exploration and solving pre-set problems, they can also be employed for “finding research problems in the first place, for addressing them, and even for solving them automatically under the constraints set by the human researcher.” This abstract introduces the first openly published version of the EncycNet KG, a semantic knowledge graph built from historical German encyclopedias, as well as its potential for the digital humanities. In particular, using EncycNet and Wikidata, we analyze how the representation of countries in encyclopedias has changed from the 19th century until today.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Germanic languages; German (430)
Keywords:	Knowledge graph; Digital humanities
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess
