Results for *

13 results found.

Showing results 1 to 13 of 13.

  1. “Konservenglück in Tiefkühl-Town” – Das Songkorpus als empirische Ressource interdisziplinärer Erforschung deutschsprachiger Poptexte
    Published: 2019
    Publisher: München [et al.] : German Society for Computational Linguistics & Language Technology and Friedrich-Alexander-Universität Erlangen-Nürnberg

    The paper describes a multiply annotated corpus of German-language song lyrics as a data basis for interdisciplinary research scenarios. The resource supports empirically grounded analyses of linguistic phenomena, systemic-structural interrelations, and tendencies in the lyrics of modern pop music. It presents the design and annotations of the corpus, which is stratified into thematic and author-specific archives, along with descriptive statistics using the Udo Lindenberg archive as an example.

     

    Source: BASE Fachausschnitt Germanistik
    Language: German
    Media type: Conference publication
    Format: Online
    DDC classification: Germanic languages; German (430)
    Keywords: German; Pop music; Lindenberg, Udo; Rock music; Song lyrics; Corpus; Automatic language analysis
    License:

    creativecommons.org/licenses/by-nc-sa/4.0/deed.de ; info:eu-repo/semantics/openAccess

  2. GenitivDB - a corpus-generated database for German genitive classification
    Published: 2014
    Publisher: European Language Resources Association (ELRA)

    We present a novel NLP resource for the explanation of linguistic phenomena, built and evaluated exploring very large annotated language corpora. For the compilation, we use the German Reference Corpus (DeReKo) with more than 5 billion word forms, which is the largest linguistic resource worldwide for the study of contemporary written German. The result is a comprehensive database of German genitive formations, enriched with a broad range of intra- and extralinguistic metadata. It can be used for the notoriously controversial classification and prediction of genitive endings (short endings, long endings, zero-marker). We also evaluate the main factors influencing the use of specific endings. To get a general idea about a factor’s influences and its side effects, we calculate chi-square tests and visualize the residuals with an association plot. The results are evaluated against a gold standard by implementing tree-based machine learning algorithms. For the statistical analysis, we applied the supervised LMT Logistic Model Trees algorithm, using the WEKA software. We intend to use this gold standard to evaluate GenitivDB, as well as to explore methodologies for a predictive genitive model.

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Germanic languages; German (430)
    Keywords: German; Genitive; Corpus
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  3. Using a domain ontology for the semantic-statistical classification of specialist hypertexts
    Published: 2015

    In this feasibility study we aim at contributing to the practical use of domain ontologies for hypertext classification by introducing an algorithm generating potential keywords. The algorithm uses structural markup information and lemmatized word lists as well as a domain ontology on linguistics. We present the calculation and ranking of keyword candidates based on ontology relationships, word position, frequency information, and statistical significance as evidenced by log-likelihood tests. Finally, the results of our machine-driven classification are validated empirically against manually assigned keywords.

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Germanic languages; German (430)
    Keywords: Linguistic data processing; Knowledge presentation; Semantic network; Grammar; German
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  4. A Functional Database Framework for Querying Very Large Multi-Layer Corpora
    Published: 2015
    Publisher: Hamburg : Universität Hamburg

    Linguistic query systems are special purpose IR applications. We present a novel state-of-the-art approach for the efficient exploitation of very large linguistic corpora, combining the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German DEREKO reference corpus with multi-layer linguistic annotations and several types of text-specific metadata, but the proposed strategy is language-independent and adaptable to large-scale multilingual corpora.

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Corpus; Information retrieval
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  5. A hybrid approach to statistical and semantical analysis of web documents
    Published: 2015
    Publisher: Calgary, AB : Acta Press

    This paper describes a new approach to improve the analysis and categorization of web documents using statistical methods for template-based clustering as well as semantical analysis based on terminological ontologies. A domain-specific environment serves as proof of concept. In order to demonstrate the widespread practical benefit of our approach, we outline a combined mathematical and semantical framework for information retrieval on internet resources.

     

    Source: BASE Fachausschnitt Germanistik
    Language: German
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Information retrieval; Online resource; Semantic analysis
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  6. Eine Ontologie für die Grammatik. Modellierung und Einsatzgebiete domänspezifischer Wissensstrukturen
    Published: 2015
    Publisher: Konstanz : Bibliothek der Universität Konstanz

    Source: BASE Fachausschnitt Germanistik
    Language: German
    Media type: Conference publication
    Format: Online
    DDC classification: Linguistics (410)
    Keywords: Grammar; German; Computational linguistics; Ontology <Knowledge processing>
    License:

    creativecommons.org/licenses/by-nc-nd/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

  7. Evaluating DBMS-based Access Strategies to Very Large Multi-layer Corpora
    Published: 2016
    Publisher: Paris : European Language Resources Association (ELRA)

    Linguistic query systems are special purpose IR applications. As text sizes, annotation layers, and metadata schemes of language corpora grow rapidly, performing complex searches becomes a highly computationally expensive task. We evaluate several storage models and indexing variants in two multi-processor/multi-core environments, focusing on prototypical linguistic querying scenarios. Our aim is to reveal modeling and querying tendencies – rather than absolute benchmark results – when using a relational database management system (RDBMS) and MapReduce for natural language corpus retrieval. Based on these findings, we are going to improve our approach for the efficient exploitation of very large corpora, combining advantages of state-of-the-art database systems with decomposition/parallelization strategies. Our reference implementation uses the German DeReKo reference corpus with currently more than 4 billion word forms, various multi-layer linguistic annotations, and several types of text-specific metadata. The proposed strategy is language-independent and adaptable to large-scale multilingual corpora.

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Linguistics (410)
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  8. Re-designing Online Terminology Resources for German Grammar. Project Report

    The compilation of terminological vocabularies plays a central role in the organization and retrieval of scientific texts. Both simple keyword lists as well as sophisticated modellings of relationships between terminological concepts can make a most valuable contribution to the analysis, classification, and finding of appropriate digital documents, either on the Web or within local repositories. This seems especially true for long-established scientific fields with various theoretical and historical branches, such as linguistics, where the use of terminology within documents from different origins is sometimes far from being consistent. In this short paper, we report on the early stages of a project that aims at the re-design of an existing domain-specific KOS for grammatical content grammis. In particular, we deal with the terminological part of grammis and present the state-of-the-art of this online resource as well as the key re-design principles. Further, we propose questions regarding ramifications of the Linked Open Data and Semantic Web approaches for our re-design decisions.

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Linguistics (410)
    Keywords: Terminology; Information management; Linguistics; Grammar
    License:

    creativecommons.org/licenses/by-nc/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

  9. A Corpus Linguistic Perspective on Contemporary German Pop Lyrics with the Multi-Layer Annotated "Songkorpus"
    Published: 2020
    Publisher: Paris : European Language Resources Association

    Song lyrics can be considered as a text genre that has features of both written and spoken discourse, and potentially provides extensive linguistic and cultural information to scientists from various disciplines. However, pop songs play a rather subordinate role in empirical language research so far - most likely due to the absence of scientifically valid and sustainable resources. The present paper introduces a multiply annotated corpus of German lyrics as a publicly available basis for multidisciplinary research. The resource contains three types of data for the investigation and evaluation of quite distinct phenomena: TEI-compliant song lyrics as primary data, linguistically and literary motivated annotations, and extralinguistic metadata. It promotes empirically/statistically grounded analyses of genre-specific features, systemic-structural correlations and tendencies in the texts of contemporary pop music. The corpus has been stratified into thematic and author-specific archives; the paper presents some basic descriptive statistics, as well as the public online frontend with its built-in evaluation forms and live visualisations.

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Corpus; Lyrics <Poetry>; Pop music; Language variety; Research data; German
    License:

    creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

  10. Data-driven identification of idioms in song lyrics
    Published: 2021
    Publisher: Stroudsburg : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    The automatic recognition of idioms poses a challenging problem for NLP applications. Whereas native speakers can intuitively handle multiword expressions whose compositional meanings are hard to trace back to individual word semantics, there is still ample scope for improvement regarding computational approaches. We assume that idiomatic constructions can be characterized by gradual intensities of semantic non-compositionality, formal fixedness, and unusual usage context, and introduce a number of measures for these characteristics, comprising count-based and predictive collocation measures together with measures of context (un)similarity. We evaluate our approach on a manually labelled gold standard, derived from a corpus of German pop lyrics. To this end, we apply a Random Forest classifier to analyze the individual contribution of features for automatically detecting idioms, and study the trade-off between recall and precision. Finally, we evaluate the classifier on an independent dataset of idioms extracted from a list of Wikipedia idioms, achieving state-of-the-art accuracy.

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Phraseology; Lyrics <Poetry>; Automatic speech recognition; Automatic language analysis; Composition <Word formation>; Semantics; German
    License:

    creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  11. Shallow context analysis for German idiom detection
    Published: 2023
    Publisher: Genf : Zenodo ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    In order to differentiate between figurative and literal usage of verb-noun combinations for the shared task on the disambiguation of German Verbal Idioms issued for KONVENS 2021, we apply and extend an approach originally developed for detecting idioms in a dataset consisting of random ngram samples. The classification is done by implementing a rather shallow, statistics-based pipeline without intensive preprocessing and examinations on the morphosyntactic and semantic level. We describe the overall approach, the differences between the original dataset and the dataset of the KONVENS task, provide experimental classification results, and analyse the individual contributions of our feature sets.

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Context analysis; German; Phraseology; Dataset; Automatic language analysis; Computational linguistics
    License:

    creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  12. Fachsprachliche Terminologie: kontrastiv und theorieübergreifend
    Published: 2023
    Publisher: Bern : Peter Lang ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    Since the mid-1990s, researchers at the Institut für deutsche Sprache (IDS) in Mannheim have investigated how the highly complex subject area of "grammar" can be conveyed in a scientifically sound and accessible way by exploiting hypertextual navigation structures. A consistent, cross-theoretical linking of all text content is therefore of central importance. To support automatable cross-referencing between content that is formulated with different terminological vocabulary but describes the same linguistic phenomenon, an onomasiologically designed terminology database forms the backbone of the online system. The paper describes the conception and structure of this linguistic terminology.

     

    Source: BASE Fachausschnitt Germanistik
    Language: German
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Language for special purposes; Terminology; Leibniz-Institut für Deutsche Sprache (IDS); Grammar; Terminology database; Grammis
    License:

    creativecommons.org/licenses/by-nc-nd/4.0/deed.de ; info:eu-repo/semantics/openAccess

  13. Projektvorstellung – Sprachanfragen. Empirisch gestützte Erforschung von Zweifelsfällen
    Published: 2023
    Publisher: Genf : Zenodo ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    "The 'Sprachanfragen' project (https://www.ids-mannheim.de/gra/projekte2/sprachanfragen/), launched in January 2022, is the first to pursue the goal of collecting and preparing language inquiry data and building from them a monitor corpus open to the research community. In addition, a search interface is being developed that makes the language inquiries amenable to systematic scholarly analysis. The poster gives an overview of the project, presents first results, and offers an outlook on considerations for designing a chatbot that answers language inquiries automatically." A contribution to the 9th conference of the association "Digital Humanities im deutschsprachigen Raum" - DHd 2023 Open Humanities Open Culture.

    Source: BASE Fachausschnitt Germanistik
    Language: German
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Data collection; Data preparation; Corpus; Chatbot; Digital humanities; Computational linguistics; Anonymization; Terminology; Annotation
    License:

    creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess