Search results

Contexts, patterns, interrelations - new ways of presenting multi-word expressions

Author(s): Steyer, Kathrin ; Brunner, Annelen

Published: 2015

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/3948 https://ids-pub.bsz-bw.de/files/3948/Steyer_Brunner_Context_patterns_interrelations_2014.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-39483

This contribution presents the newest version of our 'Wortverbindungsfelder' (fields of multi-word expressions), an experimental lexicographic resource that focuses on aspects of multi-word expressions (MWEs) that are rarely addressed in traditional descriptions: contexts, patterns and interrelations. The MWE fields use data from a very large corpus of written German (over 6 billion word forms) and are created in a strictly corpus-based way. In addition to traditional lexicographic descriptions, they include quantitative corpus data which is structured in new ways in order to show usage specifics. This way of looking at MWEs gives insight into the structure of language and is especially interesting for foreign language learners.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Germanic languages; German (430)
Keywords:	German; Word combination; Corpus
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Deep learning for free indirect representation

Author(s): Brunner, Annelen ; Tu, Ngoc Duyen Tanja ; Weimer, Lukas ; Jannidis, Fotis

Published: 2019

Publisher: Munich [et al.] : German Society for Computational Linguistics & Language Technology and Friedrich-Alexander-Universität Erlangen-Nürnberg

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/9315 https://ids-pub.bsz-bw.de/files/9315/Brunner_etal._Deep_learning_2019.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-93151

In this paper, we present our work in progress to automatically identify free indirect representation (FI), a type of thought representation used in literary texts. With a deep learning approach using contextual string embeddings, we achieve F1 scores between 0.45 and 0.5 (sentence-based evaluation for the FI category) on two very different German corpora, a clear improvement on earlier attempts at this task. We show how consistently marked direct speech can help in this task. In our evaluation, we also consider human inter-annotator scores and thus address measures of certainty for this difficult phenomenon.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	German; Indirect speech; Free indirect discourse; Automatic language analysis; Corpus
License:	creativecommons.org/licenses/by-nc-sa/4.0/deed.de ; info:eu-repo/semantics/openAccess

An XML Annotation Schema for speech, thought and writing representation

Author(s): Brunner, Annelen

Published: 2015

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/3949 https://ids-pub.bsz-bw.de/files/3949/Brunner_An_XML_annotation_schema_2014.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-39493

This contribution presents an XML Schema for annotating a high-level narratological category: speech, thought and writing representation (ST&WR). It focuses on two aspects: first, the original Schema is presented as an example of the challenge of encoding a narrative feature in a structured and flexible way; second, ways of adapting this Schema to TEI are considered in order to make it usable for other TEI-based projects.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Automatic language analysis; Annotation; Prose; Reported speech; Direct speech
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Corpus-driven study of multi-word expressions based on collocations from a very large corpus

Author(s): Brunner, Annelen ; Steyer, Kathrin

Published: 2015

Publisher: Birmingham : University of Birmingham

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4141 https://ids-pub.bsz-bw.de/files/4141/Brunner_Steyer_Corpus-driven_study_of_multi-word_expressions_based_on_collocations_from_a_very_large_corpus_2007.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-41414

We present a corpus-driven approach to the study of multi-word expressions, which constitute a significant part of […]. As a data basis, we use collocation profiles computed from DeReKo (Deutsches Referenzkorpus), the largest available collection of written German, which has approximately two billion word tokens and is located at the Institute for the German Language (IDS). We employ a strongly usage-based approach to multi-word expressions, which we think of as conventionalised patterns in language use that manifest themselves in recurrent syntagmatic patterns of words. They are defined by their distinct function in language. To find multi-word expressions, we allow ourselves to be guided by corpus data and statistical evidence as much as possible, making interpretative steps carefully and in a monitored fashion. We develop a procedure of interpretation that leads us from the evidence of collocation profiles to a collection of recurrent word patterns and finally to multi-word expressions. When building up a collection of multi-word expressions in this fashion, it becomes clear that the expressions can be defined on different levels of generalisation and are interrelated in various ways. This will be reflected in the documentation and presentation of the findings. We are planning to add annotation in a way that allows grouping the multi-word expressions according to different features and to add links between them to reflect their relationships, thus constructing a network of multi-word expressions.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	German; Collocation; Corpus; Language statistics
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Redewiedergabe in Heftromanen und Hochliteratur [Speech representation in dime novels and high literature]

Author(s): Brunner, Annelen ; Jannidis, Fotis ; Tu, Ngoc Duyen Tanja ; Weimer, Lukas

Published: 2023

Publisher: Paderborn : Zenodo ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11557 https://ids-pub.bsz-bw.de/files/11557/Brunner_Fotis_Tu_Redewiedergabe_2020.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-115571 https://doi.org/10.5281/zenodo.4621814

The study presented here examines the proportions of different forms of speech representation, comparing two types of literature from opposite ends of the spectrum: high literature, defined as works that were shortlisted for literary prizes, and dime novels (Heftromane), mass-produced narrative works that are mostly sold through magazine retailers and were formerly referred to disparagingly as "novels of the lower class" ("Romane der Unterschicht", Nusser 1981). Our hypothesis is that these types of literature differ in their narrative style and that this is reflected in the forms of representation they use. The study focuses on the dichotomy between direct and non-direct representation, a distinction already drawn in classical rhetoric.

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	High literature; Dime novel; Narrative technique; Annotation; Full text
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

To BERT or not to BERT – Comparing contextual embeddings in a deep learning architecture for the automatic recognition of four types of speech, thought and writing representation

Author(s): Brunner, Annelen ; Tu, Ngoc Duyen Tanja ; Weimer, Lukas ; Jannidis, Fotis

Published: 2023

Publisher: Aachen : CEUR-WS ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11561 https://ids-pub.bsz-bw.de/files/11561/Brunner_Tu_Comparing_contextual_embeddings_2020.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-115617

We present recognizers for four very different types of speech, thought and writing representation (STWR) for German texts. The implementation is based on deep learning with two different customized contextual embeddings, namely FLAIR embeddings and BERT embeddings. This paper gives an evaluation of our recognizers with a particular focus on the differences in performance we observed between those two embeddings. FLAIR performed best for direct STWR (F1=0.85), BERT for indirect (F1=0.76) and free indirect (F1=0.59) STWR. For reported STWR, the comparison was inconclusive, but BERT gave the best average results and best individual model (F1=0.60). Our best recognizers, our customized language embeddings and most of our test and training data are freely available and can be found via www.redewiedergabe.de or at github.com/redewiedergabe.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Embedding; German; Test data; Text analysis
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Projektvorstellung – Redewiedergabe. Eine literatur- und sprachwissenschaftliche Korpusanalyse [Project presentation: Speech representation. A literary and linguistic corpus analysis]

Author(s): Brunner, Annelen ; Engelberg, Stefan ; Jannidis, Fotis ; Tu, Ngoc Duyen Tanja ; Weimer, Lukas

Published: 2023

Publisher: Potsdam : Zenodo ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/12063 https://ids-pub.bsz-bw.de/files/12063/Brunner_Engelberg_Jannidis_Projektvorstellung_2018.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-120638 https://doi.org/10.5281/zenodo.3684897

The ongoing DFG project "Redewiedergabe" is a use case of quantitative linguistics and literary studies and investigates the phenomenon of speech representation ("Redewiedergabe") on the basis of large amounts of data. To this end, a corpus is being manually annotated with forms of speech representation, and methods for automatically recognizing the phenomenon are being developed. The goal is to answer research questions about the development of speech representation, particularly in the 19th century.

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Literary studies; Data analysis; Corpus; Narrative technique; Indirect speech
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

KoMuX - Der Kompositamuster-Explorer [KoMuX: The compound pattern explorer]

Author(s): Brunner, Annelen ; Hein, Katrin

Published: 2023

Publisher: Potsdam : Zenodo ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/12180 https://ids-pub.bsz-bw.de/files/12180/Brunner_Hein_KoMuX_2023.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-121809 https://doi.org/10.5281/zenodo.7688632

KoMuX, the Kompositamuster-Explorer (compound pattern explorer, www.owid.de/plus/komux), is a web application for searching more than 50,000 German nominal compounds for abstract or lexically partially specified patterns. Various visualizations help users grasp structures and relationships within the result set.

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Compounding <word formation>; German; Application system; Open Science; Data collection
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Automatic recognition of direct speech without quotation marks. A rule-based approach

Author(s): Tu, Ngoc Duyen Tanja ; Krug, Markus ; Brunner, Annelen

Published: 2019

Publisher: Frankfurt am Main : Zenodo

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/8770 https://ids-pub.bsz-bw.de/files/8770/Tu_Krug_Brunner_Automatic_recognition_of_direct_speech_2019.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-87708 https://doi.org/10.5281/zenodo.2600812

This paper describes a rule-based approach to detecting direct speech without the help of any quotation marks. Fictional and non-fictional texts were used as datasets. Our evaluation shows that the results appear stable across different datasets in the fictional domain and are comparable to the results achieved in related work.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Computational linguistics; Direct speech; Text mining; Natural language; Algorithm
License:	creativecommons.org/licenses/by-sa/4.0/ ; info:eu-repo/semantics/openAccess

Das Redewiedergabe-Korpus. Eine neue Ressource [The speech representation corpus: A new resource]

Author(s): Brunner, Annelen ; Weimer, Lukas ; Tu, Ngoc Duyen Tanja ; Engelberg, Stefan ; Jannidis, Fotis

Published: 2019

Publisher: Frankfurt am Main : Zenodo

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/8771 https://ids-pub.bsz-bw.de/files/8771/Brunner_Weimer_Tu_Engelberg_Jannidis_Das_Redewiedergabe_Korpus_2019.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-87710 https://doi.org/10.5281/zenodo.2600812

This contribution presents the Redewiedergabe corpus (RW corpus), a historical corpus of fictional and non-fictional texts with detailed manual annotation of forms of speech representation. The corpus is being created within an ongoing DFG project and is not yet complete; however, a beta release for the research community is planned for spring 2019, and the final release is scheduled for spring 2020. The RW corpus is a novel resource for research on speech representation that has not previously been available for German at this level of detail. It can serve both for quantitative linguistic and literary studies and as training material for machine learning.

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Corpus; Speech representation; Annotation; Automatic speech recognition; German
License:	creativecommons.org/licenses/by-sa/4.0/ ; info:eu-repo/semantics/openAccess
