Search results

Practice Report. A blended learning approach to teaching NLP for a DH public

Published: 2023

Publisher: Aachen : CEUR-WS ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11617 https://ids-pub.bsz-bw.de/files/11617/Faass_Heid_Practice_Report_2017.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-116175

This paper reports on current practice in a staged approach to introducing NLP principles and techniques to students of information science (IIM) and of international communication and translation (ICT) as part of their curricula. As most of these students are not familiar with computer science or, in the case of IIM students, with linguistics, we see them as comparable with students of the humanities. We follow a blended learning strategy with lectures, online materials, tutorials, and screencasts. In the first two terms, we focus on linguistics and its formalisation; NLP tools and applications are then introduced from the third term on. The lectures are combined with tutorials and, since the summer term 2017, with a set of screencasts.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Language processing; Translation; Teaching method; Linguistics
License:	creativecommons.org/licenses/by/4.0/deed.de ; info:eu-repo/semantics/openAccess

Approximating the disambiguation of some German nominalizations by use of weak structural, lexical and corpus information ; Hacia la desambiguación de nominalizaciones en alemán a partir de información estructural, léxica y de corpus

Authors: Eberle, Kurt ; Heid, Ulrich ; Faaß, Gertrud

Published: 2023

Publisher: Jaén : University of Jaén ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11620 https://ids-pub.bsz-bw.de/files/11620/Eberle_Heid_Faass_Approximating_the_disambiguation_2011.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-116200

Between classical symbolic word sense disambiguation (WSD) using explicit deep semantic representations of sentences and texts and statistical WSD using word co-occurrence information, there is a recent tendency towards mediating methods. Similar to so-called lightweight semantics (Marek, 2009), we suggest making only sparse use of semantic information. We describe an approximation model based upon flat underspecified discourse representation structures (FUDRSs, cf. Eberle, 2004) that weighs knowledge about context structure, lexical semantic restrictions and interpretation preferences. We give a catalogue of guidelines for human annotation of texts by corresponding indicators. Using this, the reliability of an analysis tool that implements the model can be tested with respect to annotation precision and disambiguation prediction, as can the extent to which both are improved by bootstrapping the knowledge of the system using corpus information. For the balanced test corpus considered, the recognition rate of the preferred reading is 80-90% (depending on the smoothing of parse errors).

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Journal article
Format:	Online
DDC classification:	Language (400)
Keywords:	Nominalization; German; Annotation; Ambiguity; Interpretive semantics; Context
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Interactive, dynamic electronic dictionaries for text production

Authors: Prinsloo, D.J. ; Heid, Ulrich ; Bothma, Theo ; Faaß, Gertrud

Published: 2023

Publisher: Ljubljana : Trojina, Institute for Applied Slovene Studies ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11629 https://ids-pub.bsz-bw.de/files/11629/Faass_Interactive_dynamic_electronic_dictionaries_2011.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-116297

An interactive, dynamic electronic dictionary aimed at text production should guide the user in innovative ways, especially in respect of difficult, complicated or confusing issues. This paper proposes a design for bilingual dictionaries intended to guide users in text production; we focus on complex phenomena of the interaction between lexis and grammar. It will be argued that a dictionary aimed at guiding the user in lexical selection should implement a type of “decision algorithm”. In addition, it should flag incorrect solutions and should warn against possible wrong generalisations of (foreign) language learners. Our proposals will be illustrated with examples from several languages, as the design principles are generally applicable. The copulative construction which is regarded as the most complicated grammatical structure in Northern Sotho will be analyzed in more detail and presented as a case in point.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Electronic dictionary; Text production; Bilingualism; Grammar; Technology
License:	creativecommons.org/licenses/by-sa/4.0/ ; info:eu-repo/semantics/openAccess

Devices for information presentation in electronic dictionaries ; Inligtingsaanbiedingsinstrumente in elektroniese woordeboeke

Authors: Prinsloo, D.J. ; Heid, Ulrich ; Bothma, Theo ; Faaß, Gertrud

Published: 2023

Publisher: Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11740 https://ids-pub.bsz-bw.de/files/11740/Prinsloo_Heid_Devices_for_information_presentation_2012.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-117408 https://doi.org/10.5788/22-1-1009

Electronic dictionaries should support dictionary users by giving them guidance in text production and text reception, alongside a user-definable offer of lexicographic data for cognitive purposes. In this article, we sketch the principles of an interactive and dynamic electronic dictionary aimed at text production and text reception, guiding users in innovative ways, especially with respect to difficult, complicated or confusing issues. The lexicographer has to do a very careful analysis of the nature of the possible problems to suggest an optimal solution for a specific problem. We are of the opinion that there are numerous complex situations where users need more detailed support than is currently available in e-dictionaries, enabling them to make valid and correct choices. For highly complex situations, we suggest guidance through a decision tree-like device. We assume that the solutions proposed here are not specific to one language only but can, after careful analysis, be applied to e-dictionaries in different languages across the world.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Journal article
Format:	Online
DDC classification:	Language (400)
Keywords:	Electronic dictionary; Dictionary; Lexicography; Decision tree; User guidance; Kinship term
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Corpus-based identification and disambiguation of reading indicators for German nominalizations

Authors: Eberle, Kurt ; Faaß, Gertrud ; Heid, Ulrich

Published: 2023

Publisher: Liverpool : University of Liverpool ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11780 https://ids-pub.bsz-bw.de/files/11780/Faass_Corpus_based_identification_2009.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-117800

Corpus data is often structurally and lexically ambiguous; corpus extraction methodologies thus must be made aware of ambiguities. Therefore, given an extraction task, all relevant ambiguities must be identified. To resolve these ambiguities, contextual data responsible for one or another reading is to be considered. In the context of our present work, German -ung-nominalizations and their sortal readings are under examination. A number of these nominalizations may be read as an event or a result, depending on the semantic group they belong to. Here, we concentrate on nominalizations of verbs of saying (henceforth: "verba dicendi"), identify their context partners and their influence on the sortal reading of the nominalizations in question. We present a tool which calculates the sortal reading of such nominalizations and thus may improve not only corpus extraction, but also e.g. machine translation. Lastly, we describe successful attempts to identify the correct sortal reading, conclusions and future work.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Nominalization; German; Ambiguity; Corpus; Indicator; Implementation
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Design and application of a Gold Standard for morphological analysis: SMOR as an example of morphological evaluation

Authors: Faaß, Gertrud ; Heid, Ulrich ; Schmid, Helmut

Published: 2023

Publisher: Luxembourg : European Language Resources Association ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11876 https://ids-pub.bsz-bw.de/files/11876/Faass_Heid_Schmid_Design_and_application_2010.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-118764

This paper describes general requirements for evaluating and documenting NLP tools with a focus on morphological analysers and the design of a Gold Standard. It is argued that any evaluation must be measurable and documentation thereof must be made accessible for any user of the tool. The documentation must be such that it enables the user to compare different tools offering the same service; hence the descriptions must contain measurable values. A Gold Standard is a vital part of any measurable evaluation process; therefore, the corpus-based design of a Gold Standard, its creation and the problems that occur are reported upon here. Our project concentrates on SMOR, a morphological analyser for German that is to be offered as a web service. We not only utilize this analyser for designing the Gold Standard, but also evaluate the tool itself at the same time. Note that the project is ongoing; therefore, we cannot present final results.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Morphology; German; Corpus; Language analysis; Web services
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Part-of-Speech tagging of Northern Sotho: Disambiguating polysemous function words

Authors: Faaß, Gertrud ; Heid, Ulrich ; Taljard, Elsabe ; Prinsloo, Danie

Published: 2023

Publisher: Stroudsburg : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11881 https://ids-pub.bsz-bw.de/files/11881/Faass_Part_of_speech_tagging_2009.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-118813

Ambiguous function words are a major obstacle to part-of-speech (POS) tagging of Northern Sotho (Bantu, S 32). Many are highly polysemous and very frequent in texts, and their local context is not always distinctive. With certain taggers, this issue leads to comparatively poor results (between 88 and 92 % accuracy), especially when sizeable tagsets (over 100 tags) are used. We use the RF-tagger (Schmid and Laws, 2008), which is specifically designed for the annotation of fine-grained tagsets (e.g. including agreement information), and we restructure the 141 tags of the tagset proposed by Taljard et al. (2008) to fit the RF-tagger. This leads to over 94 % accuracy. In addition, error analysis shows which types of phenomena cause trouble in the POS-tagging of Northern Sotho.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Northern Sotho; Polysemy; Function word; Methodology; Bantu languages
License:	creativecommons.org/licenses/by/4.0/deed.de ; info:eu-repo/semantics/openAccess

Designing a noun guesser for part of speech tagging in Northern Sotho

Authors: Heid, Ulrich ; Prinsloo, Danie J. ; Faaß, Gertrud ; Taljard, Elsabé

Published: 2023

Publisher: London : Taylor & Francis ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS) [secondary publication]

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11924 https://ids-pub.bsz-bw.de/files/11924/Heid_Prinsloo_Faass_Designing_a_noun_guesser_2009.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-119242 https://doi.org/10.1080/02572117.2009.10587313

In this article, we describe an element of a suite of computational tools for assigning word-class tags (as a preparation for part of speech (POS) tagging) to word forms in unrestricted Northern Sotho texts. POS-tagging is a step towards a linguistic analysis of the texts, which in turn allows for advanced data extraction. The tool component described here identifies (and classifies) noun forms. Several types of linguistic knowledge are used to recognize nouns that are not contained in the noun lexicon of the system. These include the relationship between singular and plural noun prefixes, knowledge about noun derivation, and data about the co-occurrence of the candidate with concords, pronouns and adjectives in a local context. Our implementation is a symbolic, voting-based process: together, all tests determine whether a candidate is a noun; accuracy on unseen test data is around 92%.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Journal article
Format:	Online
DDC classification:	Language (400)
Keywords:	Pedi language; Computational linguistics; Part of speech; Noun; Data analysis
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

From <tiger2/> to ISOTiger – community driven developments for syntax annotation in SynAF

Authors: Bosch, Sonja ; Eckart, Kerstin ; Faaß, Gertrud ; Heid, Ulrich ; Lee, Kiyong ; Pareja-Lora, Antonio ; Pretorius, Laurette ; Romary, Laurent ; Witt, Andreas ; Zeldes, Amir ; Zipser, Florian

Published: 2023

Publisher: Tübingen : Universität Tübingen ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11925 https://ids-pub.bsz-bw.de/files/11925/Bosch_Eckart_Faass_ISOTiger_2014.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-119251

In 2010, ISO published a standard for syntactic annotation, ISO 24615:2010 (SynAF). Back then, the document specified a comprehensive reference model for the representation of syntactic annotations, but no accompanying XML serialisation. ISO’s subcommittee on language resource management (ISO TC 37/SC 4) is working on making the SynAF serialisation ISOTiger an additional part of the standard. This contribution addresses the current state of development of ISOTiger, along with a number of open issues on which we are seeking community feedback in order to ensure that ISOTiger becomes a useful extension to the SynAF reference model.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Syntax; Annotation; Standardization; Text technology
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Nachhaltige Dokumentation virtueller Forschungsumgebungen

Authors: Faaß, Gertrud ; Heid, Ulrich

Published: 2023

Publisher: Glückstadt : Werner Hülsbusch ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/12010 https://ids-pub.bsz-bw.de/files/12010/Faass_Heid_Nachhaltige_Dokumentation_2011.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-120106 https://doi.org/10.5281/zenodo.4134528

In recent years, an increasing number of virtual research environments have been offered in the field of Natural Language Processing (NLP). These should be documented in a sustainable way that also guarantees comparability for potential users. This paper thus describes constraints for the sustainability of NLP environments: the documentation must describe not only the software from the developer’s view, but also its evaluation according to a test suite, which is itself to be documented comprehensively. The paper also describes the possibility of automating the documentation processes by utilizing DocBook XML.

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Research; Documentation; Language processing; Web services; Natural language; Sustainability
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess
