Results for *

10 results found.

Showing results 1 to 10 of 10.

  1. Practice Report. A blended learning approach to teaching NLP for a DH public
    Published: 2023
    Publisher: Aachen : CEUR-WS ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    This paper reports on current practice in a staged approach to introducing NLP principles and techniques to students of information science (IIM) and of international communication and translation (ICT) as part of their curricula. As most of these students are not familiar with computer science or, in the case of IIM students, with linguistics, we see them as comparable to students of the humanities. We follow a blended learning strategy with lectures, online materials, tutorials, and screencasts. In the first two terms, we focus on linguistics and its formalisation; NLP tools and applications are then introduced from the third term on. The lectures are combined with tutorials and, since the summer term of 2017, with a set of screencasts.

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Language processing; Translation; Teaching method; Linguistics
    License:

    creativecommons.org/licenses/by/4.0/deed.de ; info:eu-repo/semantics/openAccess

  2. Approximating the disambiguation of some German nominalizations by use of weak structural, lexical and corpus information ; Hacía la desambiguación de nominalizaciones en alemán a partir de información estructural, léxica y de corpus
    Published: 2023
    Publisher: Jaén : University of Jaén ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    Between classical symbolic word sense disambiguation (WSD), which uses explicit deep semantic representations of sentences and texts, and statistical WSD, which uses word co-occurrence information, there is a recent tendency towards mediating methods. Similar to so-called lightweight semantics (Marek, 2009), we suggest making only sparse use of semantic information. We describe an approximation model based upon flat underspecified discourse representation structures (FUDRSs, cf. Eberle, 2004) that weighs knowledge about context structure, lexical semantic restrictions and interpretation preferences. We give a catalogue of guidelines for human annotation of texts with corresponding indicators. Using this, the reliability of an analysis tool that implements the model can be tested with respect to annotation precision and disambiguation prediction, as can the extent to which both are improved by bootstrapping the knowledge of the system using corpus information. For the balanced test corpus considered, the recognition rate of the preferred reading is 80-90% (depending on the smoothing of parse errors).

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Journal article
    Format: Online
    DDC classification: Language (400)
    Keywords: Nominalization; German; Annotation; Ambiguity; Interpretive semantics; Context
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  3. Interactive, dynamic electronic dictionaries for text production
    Published: 2023
    Publisher: Ljubljana : Trojina, Institute for Applied Slovene Studies ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    An interactive, dynamic electronic dictionary aimed at text production should guide the user in innovative ways, especially with respect to difficult, complicated or confusing issues. This paper proposes a design for bilingual dictionaries intended to guide users in text production; we focus on complex phenomena of the interaction between lexis and grammar. We argue that a dictionary aimed at guiding the user in lexical selection should implement a type of “decision algorithm”. In addition, it should flag incorrect solutions and warn against possible wrong generalisations by (foreign) language learners. Our proposals are illustrated with examples from several languages, as the design principles are generally applicable. The copulative construction, which is regarded as the most complicated grammatical structure in Northern Sotho, is analyzed in more detail and presented as a case in point.

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Electronic dictionary; Text production; Bilingualism; Grammar; Technology
    License:

    creativecommons.org/licenses/by-sa/4.0/ ; info:eu-repo/semantics/openAccess

  4. Devices for information presentation in electronic dictionaries ; Inligtingsaanbiedingsinstrumente in elektroniese woordeboeke
    Published: 2023
    Publisher: Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    Electronic dictionaries should support dictionary users by giving them guidance in text production and text reception, alongside a user-definable offer of lexicographic data for cognitive purposes. In this article, we sketch the principles of an interactive and dynamic electronic dictionary aimed at text production and text reception that guides users in innovative ways, especially with respect to difficult, complicated or confusing issues. The lexicographer has to analyse the nature of the possible problems very carefully in order to suggest an optimal solution for a specific problem. We are of the opinion that there are numerous complex situations where users need more detailed support than is currently available in e-dictionaries to enable them to make valid and correct choices. For highly complex situations, we suggest guidance through a decision tree-like device. We assume that the solutions proposed here are not specific to one language but can, after careful analysis, be applied to e-dictionaries in different languages across the world.

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Journal article
    Format: Online
    DDC classification: Language (400)
    Keywords: Electronic dictionary; Dictionary; Lexicography; Decision tree; User guidance; Kinship term
    License:

    creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  5. Corpus-based identification and disambiguation of reading indicators for German nominalizations
    Published: 2023
    Publisher: Liverpool : University of Liverpool ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    Corpus data is often structurally and lexically ambiguous; corpus extraction methodologies must therefore be made aware of ambiguities. Given an extraction task, all relevant ambiguities must be identified. To resolve these ambiguities, the contextual data responsible for one reading or another must be considered. In our present work, German -ung-nominalizations and their sortal readings are under examination. A number of these nominalizations may be read as an event or a result, depending on the semantic group they belong to. Here, we concentrate on nominalizations of verbs of saying (henceforth: "verba dicendi") and identify their context partners and their influence on the sortal reading of the nominalizations in question. We present a tool which calculates the sortal reading of such nominalizations and thus may improve not only corpus extraction, but also, for example, machine translation. Lastly, we describe successful attempts to identify the correct sortal reading, draw conclusions, and outline future work.

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Nominalization; German; Ambiguity; Corpus; Indicator; Implementation
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  6. Design and application of a Gold Standard for morphological analysis: SMOR as an example of morphological evaluation
    Published: 2023
    Publisher: Luxembourg : European Language Resources Association ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    This paper describes general requirements for evaluating and documenting NLP tools, with a focus on morphological analysers and the design of a Gold Standard. It is argued that any evaluation must be measurable and that its documentation must be made accessible to any user of the tool. The documentation must enable the user to compare different tools offering the same service; hence, the descriptions must contain measurable values. A Gold Standard is a vital part of any measurable evaluation process; therefore, the corpus-based design of a Gold Standard, its creation, and the problems that occur are reported here. Our project concentrates on SMOR, a morphological analyser for German that is to be offered as a web service. We not only use this analyser for designing the Gold Standard, but also evaluate the tool itself at the same time. Note that the project is ongoing, so we cannot present final results.

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Morphology; German; Corpus; Language analysis; Web services
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  7. Part-of-Speech tagging of Northern Sotho: Disambiguating polysemous function words
    Published: 2023
    Publisher: Stroudsburg : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    Ambiguous function words are a major obstacle to part-of-speech (POS) tagging of Northern Sotho (Bantu, S 32). Many are highly polysemous and very frequent in texts, and their local context is not always distinctive. With certain taggers, this issue leads to comparatively poor results (between 88 and 92 % accuracy), especially when sizeable tagsets (over 100 tags) are used. We use the RF-tagger (Schmid and Laws, 2008), which is designed specifically for annotation with fine-grained tagsets (e.g. including agreement information), and we restructure the 141 tags of the tagset proposed by Taljard et al. (2008) to fit the RF-tagger. This leads to over 94 % accuracy. In addition, error analysis shows which types of phenomena cause trouble in the POS-tagging of Northern Sotho.

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Northern Sotho; Polysemy; Function word; Methodology; Bantu languages
    License:

    creativecommons.org/licenses/by/4.0/deed.de ; info:eu-repo/semantics/openAccess

  8. Designing a noun guesser for part of speech tagging in Northern Sotho
    Published: 2023
    Publisher: London : Taylor & Francis ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS) [secondary publication]

    In this article, we describe one element of a suite of computational tools for assigning word-class tags (as a preparation for part-of-speech (POS) tagging) to word forms in unrestricted Northern Sotho texts. POS-tagging is a step towards a linguistic analysis of the texts, which in turn allows for advanced data extraction. The tool component described here identifies (and classifies) noun forms. Several types of linguistic knowledge are used to recognize nouns that are not contained in the noun lexicon of the system. These include the relationship between singular and plural noun prefixes, knowledge about noun derivation, and data about the co-occurrence of the candidate with concords, pronouns and adjectives in a local context. Our implementation is a symbolic, voting-based process: together, all tests determine whether a candidate is a noun; accuracy on unseen test data is around 92%.

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Journal article
    Format: Online
    DDC classification: Language (400)
    Keywords: Pedi language; Computational linguistics; Part of speech; Noun; Data analysis
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  9. From to ISOTiger – community driven developments for syntax annotation in SynAF
    Published: 2023
    Publisher: Tübingen : Universität Tübingen ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    In 2010, ISO published a standard for syntactic annotation, ISO 24615:2010 (SynAF). Back then, the document specified a comprehensive reference model for the representation of syntactic annotations, but no accompanying XML serialisation. ISO’s subcommittee on language resource management (ISO TC 37/SC 4) is working on making the SynAF serialisation ISOTiger an additional part of the standard. This contribution addresses the current state of development of ISOTiger, along with a number of open issues on which we are seeking community feedback in order to ensure that ISOTiger becomes a useful extension to the SynAF reference model.

     

    Source: BASE Fachausschnitt Germanistik
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Syntax; Annotation; Standardization; Text technology
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  10. Nachhaltige Dokumentation virtueller Forschungsumgebungen
    Published: 2023
    Publisher: Glückstadt : Werner Hülsbusch ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    In recent years, an increasing number of virtual research environments have been offered in the field of Natural Language Processing (NLP). These should be documented in a sustainable way that also guarantees comparability for potential users. This paper therefore describes constraints for the sustainability of NLP environments: the documentation must describe not only the software from the developer’s view, but also its evaluation against a test suite, which is itself to be documented comprehensively. The paper also describes the possibility of automating the documentation process by using DocBook XML.

     

    Source: BASE Fachausschnitt Germanistik
    Language: German
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Research; Documentation; Language processing; Web services; Natural language; Sustainability
    License:

    creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess