Ergebnisse für *

Es wurden 5 Ergebnisse gefunden.

Zeige Ergebnisse 1 bis 5 von 5.

Sortieren

  1. Abfrage und Analyse von Korpusbelegen
    Erschienen: 2022
    Verlag:  Paderborn : Wilhelm Fink ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    In diesem Kapitel stellen wir zunächst grundlegende Konzepte von Abfragesystemen und Abfragesprachen für die Suche in Korpora vor. Diese Konzepte sollen Ihnen helfen, die einzelnen Abfragesprachen besser zu verstehen und vergleichen zu können. Die... mehr

     

    In diesem Kapitel stellen wir zunächst grundlegende Konzepte von Abfragesystemen und Abfragesprachen für die Suche in Korpora vor. Diese Konzepte sollen Ihnen helfen, die einzelnen Abfragesprachen besser zu verstehen und vergleichen zu können. Die gängigen Abfragesprachen unterscheiden sich in vielen Details. Diese Details und die Möglichkeiten und Grenzen der einzelnen Abfragesprachen stellen wir im zweiten Teil mit vielen Beispielaufgaben und dazu passenden Lösungen in jeweils drei Abfragesprachen vor.

     

    Export in Literaturverwaltung   RIS-Format
      BibTeX-Format
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Deutsch
    Medientyp: Aufsatz aus einem Sammelband
    Format: Online
    DDC Klassifikation: Sprache (400)
    Schlagworte: Sprachdaten; Korpus; Abfragesprache; Datenbank; Forschungsdaten
    Lizenz:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  2. Matrix and double-array representations for efficient finite state tokenization
    Autor*in: Diewald, Nils
    Erschienen: 2022
    Verlag:  Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    This paper presents an algorithm and an implementation for efficient tokenization of texts of space-delimited languages based on a deterministic finite state automaton. Two representations of the underlying data structure are presented and a model... mehr

     

    This paper presents an algorithm and an implementation for efficient tokenization of texts of space-delimited languages based on a deterministic finite state automaton. Two representations of the underlying data structure are presented and a model implementation for German is compared with state-of-the-art approaches. The presented solution is faster than other tools while maintaining comparable quality.

     

    Export in Literaturverwaltung
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Konferenzveröffentlichung
    Format: Online
    DDC Klassifikation: Sprache (400)
    Schlagworte: Algorithmus; Endlicher Zustandsraum; Datenstruktur; Deutsch; Korpus
    Lizenz:

    creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

  3. Tokenizing on scale. Preprocessing large text corpora on the lexical and sentence level
    Erschienen: 2022
    Verlag:  Mannheim : IDS-Verlag

    When comparing different tools in the field of natural language processing (NLP), the quality of their results usually has first priority. This is also true for tokenization. In the context of large and diverse corpora for linguistic research... mehr

     

    When comparing different tools in the field of natural language processing (NLP), the quality of their results usually has first priority. This is also true for tokenization. In the context of large and diverse corpora for linguistic research purposes, however, other criteria also play a role – not least sufficient speed to process the data in an acceptable amount of time. In this paper we evaluate several state-ofthe-art tokenization tools for German – including our own – with regard to theses criteria. We conclude that while not all tools are applicable in this setting, no compromises regarding quality need to be made.

     

    Export in Literaturverwaltung   RIS-Format
      BibTeX-Format
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Aufsatz aus einem Sammelband
    Format: Online
    DDC Klassifikation: Englisch, Altenglisch (420)
    Schlagworte: Korpus; Software; Automatische Sprachanalyse; Daten; Deutsch
    Lizenz:

    creativecommons.org/licenses/by-sa/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

  4. Tokenizing on scale. Preprocessing large text corpora on the lexical and sentence level
    Erschienen: 2022
    Verlag:  Mannheim : IDS-Verlag ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    When comparing different tools in the field of natural language processing (NLP), the quality of their results usually has first priority. This is also true for tokenization. In the context of large and diverse corpora for linguistic research... mehr

     

    When comparing different tools in the field of natural language processing (NLP), the quality of their results usually has first priority. This is also true for tokenization. In the context of large and diverse corpora for linguistic research purposes, however, other criteria also play a role – not least sufficient speed to process the data in an acceptable amount of time. In this paper we evaluate several state of the art tokenization tools for German – including our own – with regard to theses criteria. We conclude that while not all tools are applicable in this setting, no compromises regarding quality need to be made.

     

    Export in Literaturverwaltung   RIS-Format
      BibTeX-Format
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Aufsatz aus einem Sammelband
    Format: Online
    DDC Klassifikation: Englisch, Altenglisch (420)
    Schlagworte: Korpus
    Lizenz:

    creativecommons.org/licenses/by-sa/4.0/deed.de ; info:eu-repo/semantics/openAccess

  5. Building paths to corpus data. A multi-level least effort and maximum return approach
    Erschienen: 2022
    Verlag:  Berlin/Boston : de Gruyter ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    Enabling appropriate access to linguistic research data, both for many researchers and for innovative research applications, is a challenging task. In this chapter, we describe how we address this challenge in the context of the German Reference... mehr

     

    Enabling appropriate access to linguistic research data, both for many researchers and for innovative research applications, is a challenging task. In this chapter, we describe how we address this challenge in the context of the German Reference Corpus DeReKo and the corpus analysis platform KorAP. The core of our approach, which is based on and tightly integrated into the CLARIN infrastructure, is to offer access at different levels. The graduated access levels make it possible to find a low-loss compromise between the possibilities opened up and the costs incurred by users and providers for each individual use case, so that, viewed over many applications, the ratio between effort and results achieved can be effectively optimized. We also report on experiences with the current state of this approach.

     

    Export in Literaturverwaltung   RIS-Format
      BibTeX-Format
    Quelle: BASE Fachausschnitt Germanistik
    Sprache: Englisch
    Medientyp: Aufsatz aus einem Sammelband
    Format: Online
    DDC Klassifikation: Sprache (400)
    Schlagworte: Sprachdaten; Deutsches Referenzkorpus (DeReKo); Korpus; Technische Infrastruktur; Nachhaltigkeit
    Lizenz:

    creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess