Modeling and Measuring Short Text Similarities. On the Multi-Dimensional Differences between German Poetry of Realism and Modernism
This study contributes to the ongoing discussion on how to operationalize text similarity for the purposes of computational literary studies by defining, theoretically justifying, and employing a multi-dimensional text model. Additionally, we evaluate a set of strategies for implementing this model for very short texts such as poems, using a range of methods from weighted sparse vectors to recent neural sentence embeddings, evaluated against annotations of emotion, genre, and similarity. Finally, we demonstrate the relevance of such a complex text model by applying the best method to a research question about the development of early modernism in German poetry. While we can confirm some important hypotheses from literary studies, we are also able to differentiate or relativize others. In particular, our findings do not support the widely held thesis that the change from realism to modernism was a revolutionary 'rupture'.
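The "weighted sparse vectors" mentioned above can be illustrated with a minimal TF-IDF sketch; the tokenization, weighting scheme, and example poems below are assumptions for illustration, not the study's actual implementation:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF weighted sparse vectors (dicts) for a list of tokenized docs."""
    n = len(docs)
    df = Counter()                      # document frequency per word type
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # term frequency * inverse document frequency; words in every doc get weight 0
        vec = {w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()}
        vectors.append(vec)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy corpus of three "poems" (invented word lists):
poems = [["dark", "night", "soul"],
         ["dark", "night", "fear"],
         ["bright", "morning", "joy"]]
vecs = tfidf_vectors(poems)
# Poems 1 and 2 share vocabulary, poems 1 and 3 share none.
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```

Neural sentence embeddings replace these sparse dictionaries with dense vectors from a pretrained model, but the similarity computation itself stays the same cosine.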
Type- and Token-based Word Embeddings in the Digital Humanities
In the general perception of the NLP community, the new dynamic, context-sensitive, token-based embeddings from language models like BERT have replaced the older static, type-based embeddings like word2vec or fastText, due to their better performance. We can show that this is not the case for one area of application for word embeddings: the abstract representation of the meaning of words in a corpus. This application is especially important for the Computational Humanities, for example in order to trace the development of words or ideas. The main contributions of our paper are: 1) We offer a systematic comparison between dynamic and static embeddings with respect to word similarity. 2) We test the best method to convert token embeddings to type embeddings. 3) We contribute new evaluation datasets for word similarity in German. The main goal of our contribution is to make an evidence-based argument that research on static embeddings, which largely stopped after 2019, should be continued: not only because they need less computing power and smaller corpora, but also because, for this specific set of applications, their performance is on par with that of dynamic embeddings.
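One common strategy for the token-to-type conversion mentioned in contribution 2) is averaging all contextual token vectors of a word type; the paper compares several methods, so the sketch below (with random stand-in vectors instead of real BERT outputs) shows only this one baseline under assumed inputs:

```python
import numpy as np

def type_embeddings(token_vectors, tokens):
    """Collapse contextual token embeddings into one static vector per word type
    by averaging every occurrence of that type (one common conversion baseline)."""
    by_type = {}
    for vec, tok in zip(token_vectors, tokens):
        by_type.setdefault(tok, []).append(vec)
    return {tok: np.mean(vecs, axis=0) for tok, vecs in by_type.items()}

# Stand-in for contextual embeddings of a tiny corpus: 3 tokens, 4 dimensions.
rng = np.random.default_rng(0)
tokens = ["haus", "baum", "haus"]        # "haus" occurs twice in different contexts
vectors = rng.normal(size=(3, 4))
types = type_embeddings(vectors, tokens)
print(sorted(types))                     # ['baum', 'haus']
```

In practice the token vectors would come from a specific layer (or layer combination) of a model like BERT; which layer to use is exactly the kind of choice such a comparison has to evaluate.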
Tracing the shift to “objectivity” in German encyclopedias of the long nineteenth century
This paper presents experiments on tracing the shift toward "objectivity" in encyclopedias of the long nineteenth century, as discussed by scholars, by querying surface features (personal pronouns, exclamation marks, and interjections) and by emotion analysis. We report a decline in these personal and emotive, and thus less "objective", textual characteristics.
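Querying such surface features amounts to counting lexical and punctuation markers per text and normalizing by length; the word lists and normalization below are illustrative assumptions, not the study's actual lexica:

```python
import re

# Small illustrative German word lists -- assumptions, not the study's resources.
PRONOUNS = {"ich", "du", "wir", "ihr", "mir", "dir", "mich", "dich", "uns", "euch"}
INTERJECTIONS = {"ach", "oh", "o", "ei", "wehe", "hurra"}

def surface_features(text):
    """Count personal pronouns, exclamation marks, and interjections,
    normalized per 1,000 word tokens."""
    tokens = re.findall(r"\w+|[!?.]", text.lower())
    n_words = sum(1 for t in tokens if t.isalpha()) or 1
    counts = {
        "pronouns": sum(t in PRONOUNS for t in tokens),
        "exclamations": tokens.count("!"),
        "interjections": sum(t in INTERJECTIONS for t in tokens),
    }
    return {k: 1000 * v / n_words for k, v in counts.items()}

print(surface_features("Ach, wie schoen ist das! Ich liebe dich."))
```

Tracking these normalized rates per publication year (or edition) is then enough to plot the reported decline over the long nineteenth century.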