Search results

A machine learning approach to pronoun resolution in spoken dialogue

Author: Strube, Michael ; Müller, Mark-Christoph

Published: 2022

Publisher: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11156 https://ids-pub.bsz-bw.de/files/11156/Strube_Mueller_A_machine_learning_approach_2022.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111560 https://doi.org/10.3115/1075096.1075118

We apply a decision tree based approach to pronoun resolution in spoken dialogue. Our system deals with pronouns with NP- and non-NP-antecedents. We present a set of features designed for pronoun resolution in spoken dialogue and determine the most promising features. We evaluate the system on twenty Switchboard dialogues and show that it compares well to Byron’s (2002) manually tuned system.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Machine learning; Pronoun; Dialogue; Spoken language; Decision tree; Corpus; Noun phrase
License:	creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

Multi-level annotation in MMAX

Author: Müller, Mark-Christoph ; Strube, Michael

Published: 2022

Publisher: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11159 https://ids-pub.bsz-bw.de/files/11159/Mueller_Strube_Multi_level_annotation_2003.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111596

We present a light-weight tool for the annotation of linguistic data on multiple levels. It is based on the simplification of annotations to sets of markables having attributes and standing in certain relations to each other. We describe the main features of the tool, emphasizing its simplicity, customizability and versatility.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Computational linguistics; Data; Corpus; Language data; Annotation
License:	creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

An API for discourse-level access to XML-encoded corpora

Author: Müller, Mark-Christoph ; Strube, Michael

Published: 2022

Publisher: Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11160 https://ids-pub.bsz-bw.de/files/11160/Mueller_Strube_An_API_for_discourse_level_access_2002.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111602

We describe a simple and efficient Java object model and application programming interface (API) for (possibly multi-modal) annotated natural language corpora. Corpora are represented as elements like Sentences, Turns, Utterances, Words, Gestures and Markables. The API allows linguists to access corpora in terms of these discourse-level elements, i.e. at a conceptual level they are familiar with, with the flexibility offered by a general purpose programming language. It is also a contribution to corpus standardization efforts because it is based on a straightforward and easily extensible data model which can serve as a target for conversion of different corpus formats.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	API; XML; Corpus; Natural language; Standardization; Data model; Software reuse
License:	creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

Applying co-training to reference resolution

Author: Müller, Mark-Christoph ; Rapp, Stefan ; Strube, Michael

Published: 2022

Publisher: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11164 https://ids-pub.bsz-bw.de/files/11164/Mueller_Rapp_Strube_Applying_co_training_2002.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111649 https://doi.org/10.3115/1073083.1073142

In this paper, we investigate the practical applicability of Co-Training for the task of building a classifier for reference resolution. We are concerned with the question if Co-Training can significantly reduce the amount of manual labeling work and still produce a classifier with an acceptable performance.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Computational linguistics; Corpus
License:	creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

Annotating anaphoric and bridging relations with MMAX

Author: Müller, Mark-Christoph ; Strube, Michael

Published: 2022

Publisher: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11165 https://ids-pub.bsz-bw.de/files/11165/Mueller_Annotating_anaphoric_and_bridging_relations_2001.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111657 https://doi.org/10.3115/1118078.1118090

We present a tool for the annotation of anaphoric and bridging relations in a corpus of written texts. Based on differences as well as similarities between these phenomena, we define an annotation scheme. We then implement the scheme within an annotation tool and demonstrate its use.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Annotation; Anaphora <Syntax>; Corpus; Computational linguistics; Written language; Data model; XML
License:	creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

Improving extractive dialogue summarization by utilizing human feedback

Author: Mieskes, Margot ; Müller, Christoph ; Strube, Michael

Published: 2022

Publisher: Anaheim [et al.] : ACTA Press ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS) [secondary publication]

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11412 https://ids-pub.bsz-bw.de/files/11412/Mieskes_Mueller_Improving_extractive_dialogue_2007.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-114120

Automatic summarization systems usually are trained and evaluated in a particular domain with fixed data sets. When such a system is to be applied to slightly different input, labor- and cost-intensive annotations have to be created to retrain the system. We deal with this problem by providing users with a GUI which allows them to correct automatically produced imperfect summaries. The corrected summary in turn is added to the pool of training data. The performance of the system is expected to improve as it adapts to the new domain.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Summary; Dialogue; Annotation; Graphical user interface; Machine learning; Computational linguistics; Digital humanities
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Reconstructing manual information extraction with DB-to-document backprojection: Experiments in the life science domain

Author: Müller, Mark-Christoph ; Ghosh, Sucheta ; Rey, Maja ; Wittig, Ulrike ; Müller, Wolfgang ; Strube, Michael

Published: 2022

Publisher: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11085 https://ids-pub.bsz-bw.de/files/11085/Mueller_Reconstructing_manual_information_extraction_2020.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-110854 https://doi.org/10.18653/v1/2020.sdp-1.9

We introduce a novel scientific document processing task for making previously inaccessible information in printed paper documents available to automatic processing. We describe our data set of scanned documents and data records from the biological database SABIO-RK, provide a definition of the task, and report findings from preliminary experiments. Rigorous evaluation proved challenging due to lack of gold-standard data and a difficult notion of correctness. Qualitative inspection of results, however, showed the feasibility and usefulness of the task.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Computational linguistics; Information extraction; Document; Experiment; Data analysis; Qualitative content analysis
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Transparent, efficient, and robust word embedding access with WOMBAT

Author: Müller, Mark-Christoph ; Strube, Michael

Published: 2022

Publisher: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11086 https://ids-pub.bsz-bw.de/files/11086/Mueller_Strube_Transparent_efficient_and_robust_2018.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-110862

We present WOMBAT, a Python tool which supports NLP practitioners in accessing word embeddings from code. WOMBAT addresses common research problems, including unified access, scaling, and robust and reproducible preprocessing. Code that uses WOMBAT for accessing word embeddings is not only cleaner, more readable, and easier to reuse, but also much more efficient than code using standard in-memory methods: a Python script using WOMBAT for evaluating seven large word embedding collections (8.7M embedding vectors in total) on a simple SemEval sentence similarity task involving 250 raw sentence pairs completes in under ten seconds end-to-end on a standard notebook computer.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Python; Automatic language analysis; Code; Computational linguistics
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Knowledge sources for bridging resolution in multi-party dialog

Author: Müller, Mark-Christoph ; Mieskes, Margot ; Strube, Michael

Published: 2022

Publisher: Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11102 https://ids-pub.bsz-bw.de/files/11102/Mueller_Mieskes_Knowledge_sources_2008.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111024

In this paper we investigate the coverage of the two knowledge sources WordNet and Wikipedia for the task of bridging resolution. We report on an annotation experiment which yielded pairs of bridging anaphors and their antecedents in spoken multi-party dialog. Manual inspection of the two knowledge sources showed that, with some interesting exceptions, Wikipedia is superior to WordNet when it comes to the coverage of information necessary to resolve the bridging anaphors in our data set. We further describe a simple procedure for the automatic extraction of the required knowledge from Wikipedia by means of an API, and discuss some of the implications of the procedure’s performance.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Dialogue; WordNet; Wikipedia; Spoken language; Information; Data set; Knowledge extraction; API; Discourse; Semantic Web; Lexicon
License:	creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess
