Results for *

12 results found.

Showing results 1 to 12 of 12.

  1. Feature-based encoding and querying language resources with character semantics
    Published: 2024
    Publisher: Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    In this paper we discuss the explicit representation of character features pertaining to written language resources, which we argue are critical for the long-term archiving of language data. Much of the focus in creating and preserving language resources is at the level of the corpus itself; however, it is generally accepted that long-term interpretation of these language resources requires more than a best-practice data format. In particular, where language resources are created in linguistic fieldwork, and especially for minority languages, the need to preserve not only the resource itself but also additional metadata that allows the resource to be interpreted accurately in the future is becoming a topic of research in its own right. In this paper we extend earlier work on semantically based character decomposition to include representation of character properties in a variety of models, and a mechanism for exploiting these properties through queries.

    Source: BASE, German studies subject section
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Language data; Archiving; Metadata; Phonetics; Ontology <knowledge processing>; XML
    License:

    creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

  2. A BLARK extension for temporal annotation mining
    Published: 2024
    Publisher: Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    The Basic Language Resource Kit (BLARK) proposed by Krauwer is designed for the creation of initial textual resources. There are a number of toolkits for the development of spoken language resources and systems, but tools are lacking for second-level resources, that is, resources which result from processing primary-level speech resources such as speech recordings. Typically, processing of this kind in phonetics is done manually, with the aid of spreadsheets and multi-purpose statistics software. We propose a Basic Language and Speech Kit (BLAST) as an extension to BLARK and suggest a strategy for integrating the kit into the Natural Language Toolkit (NLTK). The prototype kit is evaluated in an application to examining temporal properties of spoken Brazilian Portuguese.

    Source: BASE, German studies subject section
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Annotation; Data mining; Spoken language; Phonetics; Rhythm
    License:

    creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

  3. Building a historical corpus for Classical Portuguese: some technological aspects
    Published: 2024
    Publisher: Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    This paper describes the restructuring process of a large corpus of historical documents and the system architecture that is used for accessing it. The initial challenge of this process was to get the most out of existing material, normalizing the legacy markup and harvesting the inherent information using widely available standards. This resulted in a conceptual and technical restructuring of the existing corpus. The development of the standardized markup and techniques allowed the inclusion of important new materials, such as original 16th and 17th century prints and manuscripts, and enlarged the potential user groups. On the technological side, we started from the premise that open standards are the best way of making sure that the resources will be accessible even after years in an archive. This is a welcome result in view of the additional consequence of the remodeled corpus concept: it serves as a repository for important historical documents, some of which had been preserved for 500 years in paper format. This very rich material can from now on be used freely for linguistic research.

    Source: BASE, German studies subject section
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Corpus; Portuguese; Archiving; Annotation; Metadata; Language data; Computational linguistics
    License:

    creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

  4. CoGesT: a formal transcription system for conversational gesture
    Published: 2024
    Publisher: Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    In order to create reusable and sustainable multimodal resources, a transcription model for hand and arm gestures in conversation is needed. We argue that the transcription systems developed so far for sign language transcription and psychological analysis are not suitable for the linguistic analysis of conversational gesture. Such a model must adhere to a strict form-function distinction and be both computationally explicit and compatible with descriptive notations such as feature structures in other areas of computational and descriptive linguistics. We describe the development and evaluation of a suitable formal model using a feature-based transcription system, concentrating as a first step on arm gestures within the context of the development of an annotated video resource and gesture lexicon.

    Source: BASE, German studies subject section
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Transcription; Body language; Conversation; Gesture; Computational linguistics; Annotation; Conversation analysis
    License:

    creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

  5. Consistent storage of metadata in inference lexica: the MetaLex approach
    Published: 2024
    Publisher: Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    With MetaLex we introduce a framework for metadata management where information can be inferred from different areas of metadata coding, such as metadata for catalogue descriptions, linguistic levels, or tiers. This is done for consistency and efficiency in metadata recording and applies the same inference techniques that are used for lexical inference. For this purpose we motivate the need for metadata descriptions on all document levels, describe the different structures of metadata, use existing metadata recommendations on different levels of annotation, and show a use case of metadata inference.

    Source: BASE, German studies subject section
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Metadata; Inference; Lexicon; Annotation; Computational linguistics
    License:

    creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

  6. Annotation driven concordancing: the PAX toolkit
    Published: 2024
    Publisher: Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    We describe PAX, "Portable Audio Concordance System", a proof-of-concept prototype of a multipurpose, multilingual audio concordance toolkit. The primary goal is to support efficient grammar and lexicon construction in the documentation of unwritten languages; languages currently included are Ega, Anyi, and Koulango (Ivory Coast), with additional samples in German and English. The approach combines methods from corpus linguistics, annotation theory and practice, phonetics and lexicography.

    Source: BASE, German studies subject section
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Annotation; Concordance; Corpus; Phonetics; Lexicography; XML; Spoken language; Multimodal system
    License:

    creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

  7. A multi-view hyperlexicon resource for speech and language system development
    Published: 2024
    Publisher: Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    New generations of integrated multimodal speech and language systems with dictation, readback or talking face facilities require multiple sources of lexical information for development and evaluation. Recent advances in hyperlexicon development offer new perspectives for building such resources that are at the same time practically useful, computationally feasible, and theoretically well-founded. We describe the specification, three-level lexical document design principles, and implementation of a MARTIF document structure and several presentation structures for a terminological lexicon, including both on-demand access and full hypertext lexicon compilation. The underlying resource is a relational lexical database with SQL querying and access via a CGI internet interface. This resource is mapped onto the hypergraph structure which defines the macrostructure of the hyperlexicon.

    Source: BASE, German studies subject section
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: SGML; XML; Multimodality; Database; Computational linguistics
    License:

    creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

  8. Sprachressourcen in der Standardisierung [Language resources in standardization]
    Published: 2024
    Publisher: Hildesheim : Gesellschaft für Sprachtechnologie und Computerlinguistik ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    We report on international standardization work in the field of language resources. The standards are developed by international working groups within the International Organization for Standardization (ISO) and are accompanied and discussed at the national level by corresponding groups, coordinated in Germany by the Deutsches Institut für Normung (DIN). For years, automatic language processing has had a growing need for electronic resources: lexica, corpora, grammars, annotation conventions, collections of language data, etc. So that such resources can be reused beyond a single application context and exchanged between working groups, work is under way to standardize their representation formats and the vocabularies that can be used to describe resource contents (data categories). Whereas past standardization efforts were restricted to particular parts of the spectrum of linguistic descriptions of resources (e.g. the EU projects SAM in the area of spoken language, and EAGLES and ISLE in the areas of morphosyntax, syntax, lexical semantics in texts and lexica, and language technology), the aims of the ISO (TC37SC4) and DIN (NAT AA6) working groups, founded in 2002 and 2003 respectively, are broader: they concern meta-guidelines for the representation and annotation of texts as well as data categories for lexica, morphological and morphosyntactic analysis, etc. We describe the current state of the standardization discussion.

    Source: BASE, German studies subject section
    Language: German
    Media type: Journal article
    Format: Online
    DDC classification: Language (400)
    Keywords: Standardization; Language processing; Annotation; Data; Online resource
    License:

    creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  9. Metadata for time aligned corpora
    Published: 2024
    Publisher: Luxembourg : European Language Resources Association ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    For a detailed description of time-aligned corpora, for example spoken language corpora and multimodal corpora, specific metadata categories are necessary, extending the scope of traditional metadata categories. We argue that it is necessary to allow metadata on all levels of annotation, i.e. on a general level for catalogues, on the session level for each recording, on the annotation level for multi-tier score annotation, and even on the level of individual annotation segments. We use existing standards where they allow this distinction and introduce metadata categories for the layer level.

    Source: BASE, German studies subject section
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Metadata; Corpus; Spoken language; Annotation; Multimodal system
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  10. Managing linguistic resources by enriching their metadata with linked data
    Published: 2024
    Publisher: Koblenz : Universität Koblenz ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    The NaLiDa project aims at contributing to an infrastructure for the metadata-based description of, and access to, linguistic resources and tools. When aggregating heterogeneous metadata sets from various providers to provide a single and uniform point of access to the aggregation, data curation becomes a central issue. In this paper, we describe how we use authority files from the German National Library, available as Linked Data, to tackle this issue for metadata fields about persons, organisations, and subject classifications.

    Source: BASE, German studies subject section
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Metadata; Resources; Data set; Infrastructure; Language data
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  11. The computational semantics of characters
    Published: 2024
    Publisher: University : Tilburg University ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    In this paper we present a new approach to the computational semantics of characters, which fills this gap: the orthographic projection of linguistic information, analogous to phonetic interpretation. We consider a number of use cases prior to discussion of three different perspectives. Adopting a holistic view of semantics, we discover that there are properties at this lower level which require similar specification to that at more well-studied levels, and which can coherently extend computational linguistic models to the domain of orthography.

    Source: BASE, German studies subject section
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Language data; Semantics; Model; Character; Computational linguistics
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  12. Unlocking the corpus: enriching metadata with state-of-the-art NLP methodology and linked data
    Published: 2024
    Publisher: Utrecht : CLARIN ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    In research data management, descriptive metadata are indispensable to describing data and are a key element in preparing data according to the FAIR principles (Wilkinson et al., 2016). Extracting semantic metadata from textual research data is currently not part of most metadata workflows, even more so if a research data set can be subdivided into smaller parts, such as a newspaper corpus containing multiple newspaper articles. Our approach is to add semantic metadata at the text level to facilitate the search over data. We show how to enrich metadata with three NLP methods: named entity recognition, keyword extraction, and topic modeling. The goal is to make it possible to search for texts that are about certain topics or described by certain keywords, or to identify people, places, and organisations mentioned in texts without actually having to read them, and at the same time to facilitate the creation of task-tailored subcorpora. To enhance the usability of the data in this way, we explore options based on the German Reference Corpus DeReKo, the largest linguistically motivated collection of German language material (Kupietz & Keibel, 2009; Kupietz et al., 2010, 2018), which contains multiple newspapers, books, transcriptions, etc., and enrich its metadata on the level of subportions, i.e. newspaper articles. We received access to a number of data files in DeReKo’s native XML format, I5. To develop the methodology, we focus on a single XML file containing all issues of one newspaper for a whole year. The following sections only give an overview of our approach; we intend, however, to provide a detailed description of the experiments and the selection of data in a subsequent longer contribution.

    Source: BASE, German studies subject section
    Language: English
    Media type: Article in an edited volume
    Format: Online
    DDC classification: Language (400)
    Keywords: Corpus; Metadata; Natural language; Computational linguistics; Data management; Named entity recognition; German; XML
    License:

    creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess