Results for *

3 results found.

Showing results 1 to 3 of 3.

  1. Human languages trade off complexity against efficiency
    Published: 2025
    Publisher: San Francisco, CA : PLOS ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    From a cross-linguistic perspective, language models are interesting because they can be used as idealised language learners that learn to produce and process language by being trained on a corpus of linguistic input. In this paper, we train different language models, from simple statistical models to advanced neural networks, on a database of 41 multilingual text collections comprising a wide variety of text types, which together include nearly 3 billion words across more than 6,500 documents in over 2,000 languages. We use the trained models to estimate entropy rates, a complexity measure derived from information theory. To compare entropy rates across both models and languages, we develop a quantitative approach that combines machine learning with semiparametric spatial filtering methods to account for both language- and document-specific characteristics, as well as phylogenetic and geographical language relationships. We first establish that entropy rate distributions are highly consistent across different language models, suggesting that the choice of model may have minimal impact on cross-linguistic investigations. On the basis of a much broader range of language models than in previous studies, we confirm results showing systematic differences in entropy rates, i.e., text complexity, across languages. These results challenge the long-held notion that all languages are equally complex. We then show that higher entropy rates tend to co-occur with shorter text lengths, and argue that this inverse relationship between complexity and length implies a compensatory mechanism whereby increased complexity is offset by increased efficiency. Finally, we introduce a multi-model multilevel inference approach to show that this complexity-efficiency trade-off is partly influenced by the social environment in which languages are used: languages spoken by larger communities tend to have higher entropy rates while using fewer symbols to encode messages.

    Source: BASE Fachausschnitt Germanistik (German Studies subject collection)
    Language: English
    Media type: Journal article
    Format: Online
    DDC classification: Language (400)
    Subject headings: Large language model; Information theory; Machine learning; Contrastive linguistics; Computational linguistics; Corpus; Statistics; Model
    License:

    creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  2. A preliminary study on the suitability of Llama 3-8B for sentiment analysis [original title: Eine Vorstudie zur Eignung von Llama 3-8B für eine Sentimentanalyse]
    Published: 2025
    Publisher: Geneva : Zenodo ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    This paper presents a preliminary study that examines whether the instruction-tuned open-source generative AI model Llama-3-8B Q4_0 is suitable for performing sentiment analysis. The study uses a small dataset of inquiries about gender-fair writing. The quality of the automatic annotations is measured by calculating the inter-annotator agreement between Llama 3 and three human annotators. A qualitative analysis of Llama 3's justifications for the sentiment scores that deviate from those of the manual annotations shows that the generative AI can be used to establish or refine annotation guidelines. However, the preliminary study cannot show that Llama 3 is suitable for sentiment analysis.

    Source: BASE Fachausschnitt Germanistik (German Studies subject collection)
    Language: German
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Subject headings: Generative AI; Open source; Gender studies; Annotation; Computational linguistics
    License:

    creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  3. Introducing traveling word pairs in historical semantic change: a case study of privacy words in 18th and 19th century English
    Published: 2025
    Publisher: Aachen : Sun SITE Central Europe ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    In recent years, lexical semantic change detection (LSCD) has become a central task in NLP. Because most LSCD studies consider the semantic change of words only in isolation, in this paper we propose a new direction for the analysis of semantic shifts: traveling word pairs. First, we introduce shift correlation to find pairs of words that shift semantically together in a similar fashion. Second, we propose word relation shift to analyze how the relationship between two words has changed over time. As a test case, we investigate the word privacy (and related words identified by a pre-existing dictionary) as an example of a word whose semantics have shifted historically and which remains a vibrantly explored concept in contemporary humanistic discourse. We report that, in comparison, the term privacy initially shows relatively little change, while correlation analysis reveals more about how key terms surrounding privacy have shifted in tandem. We also explore nuanced changes through word pair analysis, which suggests a shift toward concreteness in particular.

    Source: BASE Fachausschnitt Germanistik (German Studies subject collection)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Subject headings: Semantic change; Case study; English; Semantics; Computational linguistics; Natural language; Language change; Language; History
    License:

    creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess