Results for *

46 results found.

Showing results 1 to 25 of 46.

  1. Towards a treatment of register phenomena in HPSG
    Published: 2023
    Publisher: Frankfurt am Main : Universitätsbibliothek Frankfurt am Main ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    In this paper, we deal with register-driven variation from a probabilistic perspective, as proposed in Schäfer, Bildhauer, Pankratz, Müller (2022). We compare two approaches to analysing this variation within HPSG. On the one hand, we consider a multiple-grammar approach and combine it with the architecture proposed in the CoreGram project (Müller 2015), discussing its advantages and disadvantages. On the other hand, we take into account a single-grammar approach and argue that it appears to be superior due to its computational efficiency and cognitive plausibility.

    Source: BASE (German studies subset)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Head-driven phrase structure grammar; Phrase structure grammar; Grammar; Register
    License: creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  2. Semantische Suche mit Word Embeddings für ein mehrsprachiges Wörterbuchportal [Semantic search with word embeddings for a multilingual dictionary portal]
    Published: 2023
    Publisher: Potsdam : Zenodo ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    The Lehnwortportal Deutsch (LWPD) is an online information system on words borrowed from German into other languages. It is based on a growing number of lexicographic resources for various languages and offers a simple cross-resource search function. The poster presents an onomasiological search function for the LWPD that is currently under development.

    Source: BASE (German studies subset)
    Language: German
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Digital humanities; Multilingual dictionary; Programming; Loanword
    License: creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  3. Metadata formats for learner corpora: case study and discussion
    Published: 2023
    Publisher: Linköping : LiU Electronic Press ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    Metadata provides important information relevant both to finding and understanding corpus data. Meaningful linguistic data requires both reasonable annotations and documentation of these annotations. This documentation is part of the metadata of a dataset. While corpus documentation has often been provided in the form of accompanying publications, machine-readable metadata, both containing the bibliographic information and documenting the corpus data, has many advantages. Metadata standards allow for the development of common tools and interfaces. In this paper I want to add a new perspective from an archive’s point of view, look at the metadata provided for four learner corpora, and discuss the suitability of established standards for machine-readable metadata. I am aware that there is ongoing work towards metadata standards for learner corpora. However, I would like to keep the discussion going and add another point of view: increasing findability and reusability of learner corpora in an archiving context.

    Source: BASE (German studies subset)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Metadata; Corpus; Computational linguistics; Annotation; Documentation; Dataset; Archiving
    License: creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  4. RefCo and its checker: improving language documentation corpora’s reusability through a semi-automatic review process
    Published: 2023
    Publisher: Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    The QUEST (QUality ESTablished) project aims at ensuring the reusability of audio-visual datasets (Wamprechtshammer et al., 2022) by devising quality criteria and curating processes. RefCo (Reference Corpora) is an initiative within QUEST in collaboration with DoReCo (Documentation Reference Corpus, Paschen et al. (2020)) focusing on language documentation projects. Previously, Aznar and Seifart (2020) introduced a set of quality criteria dedicated to documenting fieldwork corpora. Based on these criteria, we establish a semi-automatic review process for existing and work-in-progress corpora, in particular for language documentation. The goal is to improve the quality of a corpus by increasing its reusability. A central part of this process is a template for machine-readable corpus documentation and automatic data verification based on this documentation. In addition to the documentation and automatic verification, the process involves a human review and potentially results in a RefCo certification of the corpus. For each of these steps, we provide guidelines and manuals. We describe the evaluation process in detail, highlight the current limits for automatic evaluation and how the manual review is organized accordingly.

     

    Source: BASE (German studies subset)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Corpus; Documentation; Dataset; Certification; Guideline; Language data; Spoken language; Annotation; Computational linguistics
    License: creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

  5. Konnektoren als Mittel des interaktiven Ausgleichs kommunikativer Wissensasymmetrien und der Gesprächsorganisation: das Beispiel von und [Connectors as a means of interactively levelling out communicative knowledge asymmetries and of organizing conversation: the example of und]
    Published: 2023
    Publisher: Munich : iudicium ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS) [secondary publication]

    This contribution aims to show how connectors can be used as linguistic means of realizing two kinds of conversational activity, namely intersubjective and conversation-organizing procedures. A speaker draws on intersubjective procedures in order to establish, in cooperation with the interlocutor, a shared knowledge background (common ground). Through conversation-organizing procedures, the speaker intervenes in the topical structure of the interaction. The contribution examines the realization of these two conversational procedures using the communicative genre of the autobiographical interview as an example. In my view, this genre is particularly well suited to such an analysis, because it is characterized by a relatively sharp separation of conversational roles, which makes the interaction easier to follow. Two parties take part in an autobiographical interview: the interviewee, who is regarded as the bearer of knowledge, and the interviewer, whose role as conversation leader is meant to facilitate the transfer of knowledge. The interviewer thus faces a twofold task: to level out the initial knowledge asymmetry and, at the same time, to organize the conversation. Using the conjunction und ('and') as an example, the contribution illustrates how the use of connectors can help accomplish these two communicative tasks.

    Source: BASE (German studies subset)
    Language: German
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Logical particle; Interaction; Communication; Conversation; Biographical interview; Interview; Biography
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  6. The “linguistic landscape” method as a tool in research and education of multilingualism: experiences from a project in the Baltic States
    Published: 2023
    Publisher: Uppsala : Acta Universitatis Upsaliensis ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    “Linguistic Landscapes” (LL) is a research method which has become increasingly popular in recent years. In this paper, we will first explain the method itself and discuss some of its fundamental assumptions. We will then recall the basic traits of multilingualism in the Baltic States, before presenting results from our project carried out together with a group of Master students of Philology in several medium-sized towns in the Baltic States, focussing on our home town of Rēzekne in the highly multilingual region of Latgale in Eastern Latvia. In the discussion of some of the results, we will introduce the concept of “Legal Hypercorrection” as a term for stricter compliance with language laws than necessary. The last part will report on the advantages of LL for teaching about multilingualism and for developing discussions on multilingualism among the general public.

    Source: BASE (German studies subset)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Linguistic landscape; Multilingualism; Baltic states; Rēzekne; Latgale; Education; Hypercorrection; Language policy
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  7. Kommunikative Abweichungen als Störungen in Sport-Interviews im Ukrainischen und Deutschen [Communicative deviations as disruptions in sports interviews in Ukrainian and German]
    Published: 2023
    Publisher: Bern : Peter Lang ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    The aim of this contribution is to identify and analyse the features of communication disruptions in sports interviews from the interviewees' point of view. The empirical basis consists of Ukrainian- and German-language video interviews from 2010 to 2019 that were either broadcast on television or produced for YouTube. The results of the study made it possible to identify the characteristic features of deviations as communication disruptions in sports interviews on three levels of the communicative genre: the external-structural, the internal-structural and the situational level. Both shared features of communication disruptions and differences between the Ukrainian- and German-language sports interviews were determined. The results show that the types of communication disruption in Ukrainian and German sports interviews are universal, but that they reflect national and cultural particularities arising from the features of both languages and of each linguistic culture.

    Source: BASE (German studies subset)
    Language: German
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Communication; Interview; Sport; Ukrainian; German; Communication disruption; Video interview
    License: creativecommons.org/licenses/by-nc-nd/4.0/deed.de ; info:eu-repo/semantics/openAccess

  8. Redewiedergabe in Heftromanen und Hochliteratur [Speech representation in dime novels and highbrow literature]
    Published: 2023
    Publisher: Paderborn : Zenodo ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    The study presented here examines the proportions of different forms of speech representation in a comparison of two types of literature from opposite ends of the spectrum: highbrow literature, defined as works that were shortlisted for literary prizes, and dime novels (Heftromane), mass-produced narrative works that are mostly sold through the magazine trade and were formerly referred to pejoratively as "novels of the lower class" (Nusser 1981). Our hypothesis is that these types of literature differ in their narrative style and that this is reflected in the forms of representation used. The study focuses on the dichotomy between direct and non-direct representation, which was already established in classical rhetoric.

    Source: BASE (German studies subset)
    Language: German
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Highbrow literature; Dime novel; Narrative technique; Annotation; Full text
    License: creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  9. To BERT or not to BERT – Comparing contextual embeddings in a deep learning architecture for the automatic recognition of four types of speech, thought and writing representation
    Published: 2023
    Publisher: Aachen : CEUR-WS ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    We present recognizers for four very different types of speech, thought and writing representation (STWR) for German texts. The implementation is based on deep learning with two different customized contextual embeddings, namely FLAIR embeddings and BERT embeddings. This paper gives an evaluation of our recognizers with a particular focus on the differences in performance we observed between those two embeddings. FLAIR performed best for direct STWR (F1=0.85), BERT for indirect (F1=0.76) and free indirect (F1=0.59) STWR. For reported STWR, the comparison was inconclusive, but BERT gave the best average results and best individual model (F1=0.60). Our best recognizers, our customized language embeddings and most of our test and training data are freely available and can be found via www.redewiedergabe.de or at github.com/redewiedergabe.

     

    Source: BASE (German studies subset)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Embedding; German; Test data; Text analysis
    License: creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  10. Towards a gold standard corpus for detecting valencies of Zulu verbs
    Published: 2023
    Publisher: Munich [et al.] : German Society for Computational Linguistics & Language Technology and Friedrich-Alexander-Universität Erlangen-Nürnberg ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    We report on a new project building a Natural Language Processing resource for Zulu by making use of resources already available. Combining tagging results with the results of morphological analysis semi-automatically, we expect to reduce the amount of manual work when generating a finely-grained gold standard corpus usable for training a tagger. From the tagged corpus, we plan to extract verb-argument pairs with the aim of compiling a verb valency lexicon for Zulu.

     

    Source: BASE (German studies subset)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Zulu language; Language processing; Corpus; Morphology; Vocabulary
    License: creativecommons.org/licenses/by-nc-sa/4.0/deed.de ; info:eu-repo/semantics/openAccess

  11. Practice Report. A blended learning approach to teaching NLP for a DH public
    Published: 2023
    Publisher: Aachen : CEUR-WS ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    This paper reports on current practice in a staged approach to the introduction of NLP principles and techniques for students of information science (IIM) and of international communication and translation (ICT) as part of their curricula. As most of these students have little familiarity with computer science or, in the case of IIM students, linguistics, we see them as comparable with students of the humanities. We follow a blended learning strategy with lectures, online materials, tutorials, and screencasts. In the first two terms, we focus on linguistics and its formalisation; NLP tools and applications are then introduced from the third term on. The lectures are combined with tutorials and, since the summer term 2017, with a set of screencasts.

    Source: BASE (German studies subset)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Language processing; Translation; Teaching method; Linguistics
    License: creativecommons.org/licenses/by/4.0/deed.de ; info:eu-repo/semantics/openAccess

  12. Towards an integrated E-Dictionary application – The case of an English to Zulu dictionary of possessives
    Published: 2023
    Publisher: Bozen : EURAC Research ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    This paper describes a first version of an integrated e-dictionary translating possessive constructions from English to Zulu. Zulu possessive constructions are difficult to learn for non-mother tongue speakers. When translating from English into Zulu, a speaker needs to be acquainted with the nominal classification of nouns indicating possession and possessor. Furthermore, (s)he needs to be informed about the morpho-syntactic rules associated with certain combinations of noun classes. Lastly, knowledge of morpho-phonetic changes is also required, because these influence the orthography of the output word forms. Our approach is a novel one in that we combine e-lexicography and natural language processing by developing a (web) interface that supports learners, as well as other users of the dictionary, in producing Zulu possessive constructions. The final dictionary that we intend to develop will contain several thousand nouns which users can combine as they wish. It will also translate single words and frequently used multiword expressions, and allow users to test their own translations. On request, information about the morpho-syntactic and morpho-phonetic rules applied by the system is displayed together with the translation. Our approach follows the function theory: the dictionary supports users in text production, at the same time fulfilling a cognitive function.

    Source: BASE (German studies subset)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Translation; Electronic dictionary; Zulu language; Possessive pronoun; Morphosyntax; Implementation
    License: creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

  13. Interactive, dynamic electronic dictionaries for text production
    Published: 2023
    Publisher: Ljubljana : Trojina, Institute for Applied Slovene Studies ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    An interactive, dynamic electronic dictionary aimed at text production should guide the user in innovative ways, especially in respect of difficult, complicated or confusing issues. This paper proposes a design for bilingual dictionaries intended to guide users in text production; we focus on complex phenomena of the interaction between lexis and grammar. It will be argued that a dictionary aimed at guiding the user in lexical selection should implement a type of “decision algorithm”. In addition, it should flag incorrect solutions and should warn against possible wrong generalisations of (foreign) language learners. Our proposals will be illustrated with examples from several languages, as the design principles are generally applicable. The copulative construction which is regarded as the most complicated grammatical structure in Northern Sotho will be analyzed in more detail and presented as a case in point.

     

    Source: BASE (German studies subset)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Electronic dictionary; Text production; Bilingualism; Grammar; Technology
    License: creativecommons.org/licenses/by-sa/4.0/ ; info:eu-repo/semantics/openAccess

  14. Building NLP resources for Dzongkha: A tagset and a tagged corpus
    Published: 2023
    Publisher: Beijing : Coling 2010 Organizing Committee ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    This paper describes the application of probabilistic part of speech taggers to the Dzongkha language. A tag set containing 66 tags is designed, which is based on the Penn Treebank. A training corpus of 40,247 tokens is utilized to train the model. Using the lexicon extracted from the training corpus and lexicon from the available word list, we used two statistical taggers for comparison reasons. The best result achieved was 93.1% accuracy in a 10-fold cross validation on the training set. The winning tagger was thereafter applied to annotate a 570,247 token corpus.

     

    Source: BASE (German studies subset)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Dzongkha; Corpus; Data; Language processing; Text-to-speech
    License: creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

  15. The verbal phrase of Northern Sotho: A morpho-syntactic perspective
    Published: 2023
    Publisher: Luxembourg : European Language Resources Association ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    So far, comprehensive grammar descriptions of Northern Sotho have only been available in the form of prescriptive books aiming at teaching the language. This paper describes parts of the first morpho-syntactic description of Northern Sotho from a computational perspective (Faaß, 2010a). Such a description is necessary for implementing rule-based, operational grammars. It is also essential for the annotation of training data to be utilised by statistical parsers. The work that we partially present here may hence provide a resource for computational processing of the language in order to proceed with producing linguistic representations beyond tagging, be it chunking or parsing. The paper begins by describing significant Northern Sotho verbal morpho-syntactics (section 2). It is shown that the topology of the verb can be depicted as a slot system which may form the basis for computational processing (section 3). Note that the implementation of the described rules (section 4) and also coverage tests are ongoing processes on which we will report in more detail at a later stage.

    Source: BASE (German studies subset)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Northern Sotho; Sotho language; Morphosyntax; Chunking; Verb phrase
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  16. Corpus-based identification and disambiguation of reading indicators for German nominalizations
    Published: 2023
    Publisher: Liverpool : University of Liverpool ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    Corpus data is often structurally and lexically ambiguous; corpus extraction methodologies thus must be made aware of ambiguities. Therefore, given an extraction task, all relevant ambiguities must be identified. To resolve these ambiguities, contextual data responsible for one or another reading is to be considered. In the context of our present work, German -ung-nominalizations and their sortal readings are under examination. A number of these nominalizations may be read as an event or a result, depending on the semantic group they belong to. Here, we concentrate on nominalizations of verbs of saying (henceforth: "verba dicendi"), identify their context partners and their influence on the sortal reading of the nominalizations in question. We present a tool which calculates the sortal reading of such nominalizations and thus may improve not only corpus extraction, but also e.g. machine translation. Lastly, we describe successful attempts to identify the correct sortal reading, conclusions and future work.

     

    Source: BASE (German studies subset)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Nominalization; German; Ambiguity; Corpus; Indicator; Implementation
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  17. Recent developments in the European Reference Corpus EuReCo
    Published: 2023
    Publisher: Louvain-la-Neuve : Presses universitaires de Louvain ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    This paper reports on recent developments within the European Reference Corpus EuReCo, an open initiative that aims at providing and using virtual and dynamically definable comparable corpora based on existing national, reference or other large corpora. Given the well-known shortcomings of other types of multilingual corpora such as parallel/translation corpora (shining-through effects, over-normalization, simplification, etc.) or web-based comparable corpora (covering only web material), EuReCo provides a unique linguistic resource offering new perspectives for fine-grained contrastive research on authentic cross-linguistic data, applications in translation studies and foreign language teaching and learning.

     

    Source: BASE (German studies subset)
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Corpus; Research data; Language data; Contrastive linguistics; Translation studies; Foreign language teaching; Foreign language learning
    License: rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  18. Design and application of a Gold Standard for morphological analysis: SMOR as an example of morphological evaluation
    Published: 2023
    Publisher: Luxembourg : European Language Resources Association ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

    This paper describes general requirements for evaluating and documenting NLP tools with a focus on morphological analysers and the design of a Gold Standard. It is argued that any evaluation must be measurable and documentation thereof must be made accessible for any user of the tool. The documentation must enable the user to compare different tools offering the same service; hence the descriptions must contain measurable values. A Gold Standard is a vital part of any measurable evaluation process; therefore, the corpus-based design of a Gold Standard, its creation and problems that occur are reported upon here. Our project concentrates on SMOR, a morphological analyser for German that is to be offered as a web-service. We not only utilize this analyser for designing the Gold Standard, but also evaluate the tool itself at the same time. Note that the project is ongoing; therefore, we cannot present final results.

    Source: BASE, German studies subset
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Morphology; German; Corpus; Language analysis; Web services
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  19. From <tiger2/> to ISOTiger – community driven developments for syntax annotation in SynAF
    Published: 2023
    Publisher: Tübingen : Universität Tübingen ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

     

    In 2010, ISO published a standard for syntactic annotation, ISO 24615:2010 (SynAF). Back then, the document specified a comprehensive reference model for the representation of syntactic annotations, but no accompanying XML serialisation. ISO’s subcommittee on language resource management (ISO TC 37/SC 4) is working on making the SynAF serialisation ISOTiger an additional part of the standard. This contribution addresses the current state of development of ISOTiger, along with a number of open issues on which we are seeking community feedback in order to ensure that ISOTiger becomes a useful extension to the SynAF reference model.

     

    Source: BASE, German studies subset
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Syntax; Annotation; Standardization; Text technology
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  20. Shallow context analysis for German idiom detection
    Published: 2023
    Publisher: Geneva : Zenodo ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

     

    In order to differentiate between figurative and literal usage of verb-noun combinations for the shared task on the disambiguation of German Verbal Idioms issued for KONVENS 2021, we apply and extend an approach originally developed for detecting idioms in a dataset consisting of random ngram samples. The classification is done by implementing a rather shallow, statistics-based pipeline without intensive preprocessing and examinations on the morphosyntactic and semantic level. We describe the overall approach, the differences between the original dataset and the dataset of the KONVENS task, provide experimental classification results, and analyse the individual contributions of our feature sets.

     

    Source: BASE, German studies subset
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Context analysis; German; Phraseology; Dataset; Automatic language analysis; Computational linguistics
    License:

    creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  21. Decomposing necessity — The Hausa exclusive particle sai as a window into the building blocks of modal meaning
    Published: 2023
    Publisher: London : University College London and Queen Mary University of London ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

     

    We discuss the modal uses of the Hausa exclusive particle sai (≈ only). We argue that the distribution of sai in modal environments provides evidence for the following claims on the composition of modal meaning that have been independently made in the literature: i) Future-oriented modality involves a prospective aspect operator that can be realized covertly in some languages (e.g. English, Kratzer 2012b) and overtly in others (e.g. Gitksan, Matthewson 2012, 2013). ii) Necessity interpretations arise from exhaustifying possibilities, i.e. an exhaustivity operator applying to existential modality (e.g. Kaufmann 2012 for the case of imperatives and Leffel 2012 for a relevant analysis of necessity meaning in Masalit). We show that future-oriented necessity in Hausa decomposes into EXH(◇(PROSP)), with sai contributing exhaustivity.

     

    Source: BASE, German studies subset
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Hausa language; Particle; Meaning; Modal particle; Modality; Aspect; Conditional
    License:

    creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

  22. Charting a landscape of loans. An e-lexicographical project on German lexical borrowings in Polish dialects
    Published: 2023
    Publisher: Alexandroupolis : Democritus University of Thrace ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

     

    This paper reports on an ongoing international project of compiling a freely accessible online Dictionary of German Loans in Polish Dialects. The dictionary will be the first comprehensive lexicographic compendium of its kind, serving as a complement to existing resources on German lexical loans in the literary or standard language. The empirical results obtained in the project will shed new light on the distribution of German loanwords among different dialects, also in comparison to the well-documented situation in written Polish. The dictionary will have a strong focus on the dialectal distribution of Polish dialectal variants for a given German etymon, accessible through interactive cartographic representations and corresponding search options. The editorial process is realized with dedicated collaborative web tools. The new resource will be published as an integrated part of an online information system for German lexical borrowings in other languages, the Lehnwortportal Deutsch, and is therefore highly cross-linked with other loanword dictionaries on Polish as well as Slavic and further European languages.

     

    Source: BASE, German studies subset
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Loanword; Lexicography; German; Polish; Borrowing; Dialect; Online dictionary; Dictionary; Dialectology; XML; Database
    License:

    creativecommons.org/licenses/by-sa/4.0/ ; info:eu-repo/semantics/openAccess

  23. Increasing CMDI’s semantic interoperability with schema.org
    Published: 2023
    Publisher: Luxembourg : European Language Resources Association ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

     

    The CLARIN Concept Registry (CCR) is the common semantic ground for most CMDI-based profiles to describe language-related resources in the CLARIN universe. While the CCR supports semantic interoperability within this universe, it does not extend beyond it. The flexibility of CMDI, however, allows users to use other term or concept registries when defining their metadata components. In this paper, we describe our use of schema.org, a light ontology used by many parties across disciplines.

     

    Source: BASE, German studies subset
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Semantics; Interoperability; Metadata; Vocabulary; Research; Text technology
    License:

    creativecommons.org/licenses/by-nc/4.0/deed.de ; info:eu-repo/semantics/openAccess

  24. Polish żeby under negation
    Published: 2023
    Publisher: Berlin [et al.] : Peter Lang ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS) [secondary publication]

     

    The paper addresses two patterns in the distribution of complement clauses headed by the complementizer żeby in Polish related to the presence of sentential negation. It is argued that żeby-clauses with an obligatory negation in the matrix clause, licensed by epistemic verbs, can be treated in terms of negative polarity, with żeby defined as an n-word. Structures with żeby-clauses and an obligatory negation in the embedded clause, licensed by verbs of fear, are argued to be an instance of negative complementation, with żeby specified as a negative complementizer. A uniform lexicalist analysis within the framework of HPSG is provided, employing tools developed to account for Negative Concord in Polish.

     

    Source: BASE, German studies subset
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Polish; Negation; Object clause; Complement; Complementizer; Negative polarity item; Head-driven phrase structure grammar
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

  25. Latgalīšu literaruo volūda izgleiteibā ; The Latgalian Language in Education
    Published: 2023
    Publisher: Rezekne : Rezekne Academy of Technologies ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

     

    2008. godā tyka veikts pietejums, kura golvonais mierkis beja raksturuot niulenejū latgalīšu volūdys lūmu izgleiteibys sistemā. Itys roksts prezeņtej byutiskuokūs pietejuma rezultatus. Pietejuma īrūsme sajimta nu „Mercator Education Centre“ (Merkatora izgleiteibys centra), kas dorbojās Nīderlaņdē Ļuvortā (frīzu volūdā — Ljouwert), Frīzejis proviņcis golvyspiļsātā. Piļneigs pietejuma izvārsums ar Merkatora izgleiteibys centra atbolstu publicāts izdavumu serejā „Regional Dossier Series“ (Regionalūs dosje sereja) angļu volūdā. Itys roksts golvonom kuortom dūmuots taidam adresatam, kas mozuok ir saisteits ar Eiropys volūdu izpietis institucejom i kam roksti angļu volūdā var saguoduot izpratnis voi atrasšonys gryuteibys. Partū pietejuma suokumā teik dūts seikuoks metožu i mierķu raksturuojums, paskaidrojūt pietejuma strukturu i rezultatu apkūpuojuma veidu, kai ari dūts puorskots par latgalīšu volūdys lūmu myusdīnu izgleiteibys sistemā. Sacynuojumūs ir īzeimātys nuokūtnis perspektivis i prīšklykumi dabuotūs rezultatu izmontuojumam. ; Our paper presents the most important results of a research carried out throughout 2008 in cooperation with the Mercator Education Centre, the Netherlands. A detailed account of our research has been published in English in the “Regional Dossier Series” of the Mercator Centre. We provide an overview of where the Latgalian language is today present in education, and conclude with an outlook into the future and some suggestions for the use of our results. In our paper we stress that both educational efforts, research and motivation campaigns are needed — for which educational staff, researchers, language policy makers, local authorities, and active organizations and individuals have to cooperate. As a major aim, Latgalian would strongly need support through additional presence in the prestigious domains of language use in society. 
    This is in particular important since some of our informants stressed that they have rather negative expectations for the future — according to some Latgalian ...

     

    Source: BASE, German studies subset
    Language: Latvian
    Media type: Conference publication
    Format: Online
    DDC classification: Language (400)
    Keywords: Latvian; Dialect; Education; Minority language; Language policy
    License:

    rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess