Results for *

7 results found.

Showing results 1 to 7 of 7.

  1. Datenübernahmerichtlinien des Leibniz-Instituts für Deutsche Sprache
  2. Datenübernahmerichtlinien des Leibniz-Instituts für Deutsche Sprache
    Published: 2019
    Publisher: Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    Source: BASE Fachausschnitt Germanistik
    Language: German
    Media type: Unspecified
    Format: Online
    DDC classification: Language (400)
    Keywords: Data protection; Research data; Corpus
    License: creativecommons.org/licenses/by-sa/4.0/deed.de ; info:eu-repo/semantics/openAccess

  3. Linguistic and computational modeling in language science
    Author: Teich, Elke
    Published: 2019

    Leibniz-Institut für Deutsche Sprache (IDS), Library
    No interlibrary loan
    Source: Leibniz-Institut für Deutsche Sprache, Library
    Contributor: Fankhauser, Peter (author)
    Language: English
    Media type: Article in an edited volume
    Format: Print
    Parent title: Contained in: The shape of data in the digital humanities; London : Routledge, Taylor & Francis Group, 2019; (2019), pages [236]-249; xviii, 341 pages

  4. OCR post-correction of the Royal Society Corpus based on the noisy channel model
    Published: 2019
    Publisher: Bremen : Deutsche Gesellschaft für Sprachwissenschaft

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: OCR font; Correction; Automatic language processing
    License: creativecommons.org/licenses/by-nc-sa/4.0/deed.de ; info:eu-repo/semantics/openAccess

  5. OCR Nachkorrektur des Royal Society Corpus
    Published: 2019
    Publisher: Frankfurt am Main : Zenodo

    We present an approach for the automatic detection and correction of OCR-induced misspellings in historical texts. The main objective is the post-correction of the digitized Royal Society Corpus, a set of historical documents from 1665 to 1869. Because of the aged material, the OCR procedure made mistakes, leaving the files corrupted by thousands of misspellings. This motivates a post-processing step. The current correction technique is a pattern-based approach that, because of its lack of generalization, suffers from poor recall. To generalize from the patterns, we propose to use the noisy channel model. From the pattern-based substitutions we train a corpus-specific error model, complemented with a language model. With an F1-score of 0.61, the presented technique significantly outperforms the pattern-based approach, which has an F1-score of 0.28. Owing to its more accurate error model, it also outperforms other implementations of the noisy channel model.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Article in an edited volume
    Format: Online
    DDC classification: Language (400)
    Keywords: OCR font; Correction; Automatic language processing; Digital Humanities
    License: creativecommons.org/licenses/by-nd/4.0/ ; info:eu-repo/semantics/openAccess

  6. What's New in EuReCo? Interoperability, Comparable Corpora, Licensing
    Published: 2019
    Publisher: Mannheim : Leibniz-Institut für Deutsche Sprache

    This paper reports on the latest developments of the European Reference Corpus EuReCo and the German Reference Corpus in relation to three of the most important CMLC topics: interoperability, collaboration on corpus infrastructure building, and legal issues. Concerning interoperability, we present new ways to access DeReKo via KorAP at the API and plugin levels. In addition, we report on advancements in the EuReCo and ICC initiatives with the provision of comparable corpora, and on recent problems with license acquisitions and our solution approaches, which use an indemnification clause and model licenses that include scientific exploitation.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Corpus
    License: creativecommons.org/licenses/by/4.0/deed.de ; info:eu-repo/semantics/openAccess

  7. Analyzing domain specific word embeddings for a large corpus of contemporary German. International Corpus Linguistics Conference, Cardiff, Wales, UK, July 22-26, 2019
    Published: 2019
    Publisher: Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    Distributional models of word use constitute an indispensable tool in corpus-based lexicological research for discovering paradigmatic relations and syntagmatic patterns (Belica et al. 2010). Recently, word embeddings (Mikolov et al. 2013) have revived the field by making it possible to construct and analyze distributional models on very large corpora. This is accomplished by reducing the very high dimensionality of word co-occurrence contexts, which equals the size of the vocabulary, to a few dimensions, such as 100-200. However, word use and meaning can vary widely along dimensions such as domain, register, and time, and word embeddings tend to represent only the most prevalent meaning. In this paper we thus construct domain-specific word embeddings to allow for systematic analysis of variations in word use. Moreover, we demonstrate how to reconstruct domain-specific co-occurrence contexts from the dense word embeddings.

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Corpus; Phrase <syntagm>; Automatic language analysis; German
    License: creativecommons.org/licenses/by/4.0/deed.de ; info:eu-repo/semantics/openAccess