Search results

Adding Value to CMC Corpora: CLARINification and Part-of-speech Annotation of the Dortmund Chat Corpus

Author(s): Beißwenger, Michael ; Ehrhardt, Eric ; Horbach, Andrea ; Lüngen, Harald ; Steffen, Diana ; Storrer, Angelika

Published: 2015

Publisher: German Society for Computational Linguistics & Language Technology (GSCL)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4365 https://ids-pub.bsz-bw.de/files/4365/Bei%C3%9Fwenger_Ehrhardt_Horbach_L%C3%BCngen_Steffen_Storrer_Adding_value_to_CMC_corpora_2015.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-43654

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Corpus; Computer-mediated communication; German; Electronic forum; Annotation
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

The Morphosyntactic Annotation of DeReKo: Interpretation, Opportunities, and Pitfalls

Author(s): Belica, Cyril ; Kupietz, Marc ; Witt, Andreas ; Lüngen, Harald

Published: 2015

Publisher: Tübingen : Narr

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4489 https://ids-pub.bsz-bw.de/files/4489/Belica_Kupietz_Witt_Luengen_The_Morphosyntactic_Annotation_of_DeReKo_2011.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-44890

The paper discusses from various angles the morphosyntactic annotation of DeReKo, the Archive of General Reference Corpora of Contemporary Written German at the Institut für Deutsche Sprache (IDS), Mannheim. The paper is divided into two parts. The first part covers the practical and technical aspects of this endeavor. We present results from a recent evaluation of tools for the annotation of German text resources that have been applied to DeReKo. These tools include commercial products, especially Xerox' Finite State Tools and the Machinese products developed by the Finnish company Connexor Oy, as well as software for which academic licenses are available free of charge for academic institutions, e.g. Helmut Schmid's Tree Tagger. The second part focuses on the linguistic interpretability of the corpus annotations and more general methodological considerations concerning scientifically sound empirical linguistic research. The main challenge here is that unlike the texts themselves, the morphosyntactic annotations of DeReKo do not have the status of observed data; instead they constitute a theory and implementation-dependent interpretation. In addition, because of the enormous size of DeReKo, a systematic manual verification of the automatic annotations is not feasible. In consequence, the expected degree of inaccuracy is very high, particularly wherever linguistically challenging phenomena, such as lexical or grammatical variation, are concerned. Given these facts, a researcher using the annotations blindly will run the risk of not actually studying the language but rather the annotation tool or the theory behind it. The paper gives an overview of possible pitfalls and ways to circumvent them and discusses the opportunities offered by using annotations in corpus-based and corpus-driven grammatical research against the background of a scientifically sound methodology.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Chapter in an edited volume
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Corpus; Annotation; Written language
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

SusTEInability of linguistic resources through feature structures

Author(s): Witt, Andreas ; Rehm, Georg ; Hinrichs, Erhard ; Lehmberg, Timm ; Stegmann, Jens

Published: 2015

Publisher: Oxford : Oxford University Press

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4490 https://ids-pub.bsz-bw.de/files/4490/Witt_Rehm_Hinrichs_SusTEInability_of_Linguisitc_Resources_2009-1.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-44901

This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. The majority of corpora is annotated using markup languages that are based on the Annotation Graph framework, the upcoming Linguistic Annotation Format ISO standard, or according to tag sets defined by or based upon the TEI guidelines. A unified representation comprises the separation of conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for the representation of these annotations is a multi-rooted tree that again can be represented by the TEI and ISO tag set for feature structures. The mapping process and representational issues are discussed as well as the advantages and drawbacks associated with the use of the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Journal article
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Programming language; Annotation; Text Encoding Initiative (TEI)
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Guidance through the standards jungle for linguistic resources

Author(s): Stührenberg, Maik ; Werthmann, Antonina ; Witt, Andreas

Published: 2015

Publisher: Paris : European Language Resources Association (ELRA)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4494 https://ids-pub.bsz-bw.de/files/4494/Stuehrenberg_Werthmann_Witt_Guidance_through_the_standards_jungle_2012.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-44943

Research today is often performed in collaborative projects composed of project partners with different backgrounds and from different institutions and countries. Standards can be a crucial tool to help harmonize these differences and to create sustainable resources. However, choosing a standard depends on having enough information to evaluate and compare different annotation and metadata formats. In this paper we present ongoing work on an interactive, collaborative website that collects information on standards in the field of linguistics as a means to guide interested researchers.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Computational linguistics; Annotation
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Different Views on Markup

Author(s): Goecke, Daniela ; Lüngen, Harald ; Metzing, Dieter ; Stührenberg, Maik ; Witt, Andreas

Published: 2015

Publisher: Dordrecht : Springer

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4496 https://ids-pub.bsz-bw.de/files/4496/Goecke_Luengen_Metzing_Different_Views_on_Markup_2010.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-44966 https://doi.org/10.1007/978-90-481-3331-4_1

In this chapter, two different ways of grouping information represented in document markup are examined: annotation levels, referring to conceptual levels of description, and annotation layers, referring to the technical realisation of markup using e.g. document grammars. In many current XML annotation projects, multiple levels are integrated into one layer, often leading to the problem of having to deal with overlapping hierarchies. As a solution, we propose a framework for XML-based multiple, independent XML annotation layers for one text, based on an abstract representation of XML documents with logical predicates. Two realisations of the abstract representation are presented, a Prolog fact base format together with an application architecture, and a specification for XML native databases. We conclude with a discussion of projects that have currently adopted this framework.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Chapter in an edited volume
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Computational linguistics; XML; Annotation
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Integrated Linguistic Annotation Models and Their Application in the Domain of Antecedent Detection

Author(s): Witt, Andreas ; Stührenberg, Maik ; Goecke, Daniela ; Metzing, Dieter

Published: 2015

Publisher: Berlin/Heidelberg : Springer

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4507 https://ids-pub.bsz-bw.de/files/4507/Witt_Stuehrenberg_Goecke_Integrated_Linguistic_Annotation_Models_and_2011.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-45077

Seamless integration of various, often heterogeneous linguistic resources in terms of their output formats and a combined analysis of the respective annotation layers are crucial tasks for linguistic research. After a decade of concentration on the development of formats to structure single annotations for specific linguistic issues, in the last years a variety of specifications to store multiple annotations over the same primary data has been developed. The paper focuses on the integration of the knowledge resource logical document structure information into a text document to enhance the task of automatic anaphora resolution both for the task of candidate detection and antecedent selection. The paper investigates data structures necessary for knowledge integration and retrieval.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Chapter in an edited volume
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Computational linguistics; Annotation; Automatic language analysis
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Multi-Dimensional Markup: N-way relations as a generalisation over possible relations between annotation layers

Author(s): Lüngen, Harald ; Witt, Andreas

Published: 2015

Publisher: Oulu : University of Oulu

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4510 https://ids-pub.bsz-bw.de/files/4510/Luengen_Witt_Multi_Dimensional_Markup_N_way_relations_as_a_generalisation_over_possible_2008.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-45104

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Digital Humanities; Markup language; Annotation
License:	creativecommons.org/licenses/by-nc-nd/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

Corpus Masking: Legally Bypassing Licensing Restrictions for the Free Distribution of Text Collections

Author(s): Rehm, Georg ; Witt, Andreas ; Zinsmeister, Heike ; Dellert, Johannes

Published: 2015

Publisher: Urbana-Champaign : University of Illinois

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4514 https://ids-pub.bsz-bw.de/files/4514/Rehm_Witt_Zinsmeister_Corpus_Masking_Legally_Bypassing_Licensing_Restrictions_2007.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-45145

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Corpus; Markup language; Annotation
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Multidimensional markup and heterogeneous linguistic resources

Author(s): Stührenberg, Maik ; Witt, Andreas ; Goecke, Daniela ; Metzing, Dieter ; Schonefeld, Oliver

Published: 2016

Publisher: Stroudsburg : ACL

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4519 https://ids-pub.bsz-bw.de/files/4519/Stuehrenberg_Witt_Goecke_Multidimensional_markup_and_heterogeneous_linguistic_resources_2006.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-45197

The paper discusses two topics: firstly an approach of using multiple layers of annotation is sketched out. Regarding the XML representation this approach is similar to standoff annotation. A second topic is the use of heterogeneous linguistic resources (e.g., XML annotated documents, taggers, lexical nets) as a source for semiautomatic multi-dimensional markup to resolve typical linguistic issues, dealing with anaphora resolution as a case study.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Corpus; Multimodality; Annotation
License:	creativecommons.org/licenses/by-nc-nd/2.5/nl/ ; info:eu-repo/semantics/openAccess

Making CONCUR work

Author(s): Hilbert, Mirco ; Schonefeld, Oliver ; Witt, Andreas

Published: 2016

Publisher: Montreal : Extreme Markup Languages Conference

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4529 https://ids-pub.bsz-bw.de/files/4529/Hilbert_Schonefeld_Witt_Making_CONCUR_work_2005.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-45299

The SGML feature CONCUR allowed for a document to be simultaneously marked up in multiple conflicting hierarchical tagsets but validated and interpreted in one tagset at a time. Alas, CONCUR was rarely implemented, and XML does not address the problem of conflicting hierarchies at all. The MuLaX document syntax is a non-XML syntax that enables multiply-encoded hierarchies by distinguishing different “layers” in the hierarchy by adding a layer ID as a prefix to the element names. The IDs tie all the elements in a single hierarchy together in an “annotation layer”. Extraction of a single annotation layer results in a well-formed XML document, and each annotation layer may be associated with an XML schema. The MuLaX processing model works on the nodes of one annotation layer at a time through Xpath-like navigation. CONCUR lives!

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Computational linguistics; Markup language; Annotation
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Declarations of Relations, Differences and Transformations between Theory-specific Treebanks: A New Methodology

Author(s): Sasaki, Felix ; Witt, Andreas ; Metzing, Dieter

Published: 2016

Publisher: Växjö : Växjö University Press

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4544 https://ids-pub.bsz-bw.de/files/4544/Sasaki_Witt_Metzing_Declarations_of_Relations_Differences_and_Transformations_2003.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-45440

This paper deals with the problem of how to interrelate theory-specific treebanks and how to transform one treebank format to another. Currently, two approaches to achieve these goals can be differentiated. The first creates a mapping algorithm between treebank formats. Categories of a source format are transformed into a target format via a given set of general or language-specific mapping rules. The second relates treebanks via a transformation to a general model of linguistic categories, for example based on the EAGLES recommendations for syntactic annotations of corpora, or relying on the HPSG framework. This paper proposes a new methodology as a solution for these desiderata.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Corpus; Annotation; Method
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Meaning and interpretation of concurrent markup

Author(s): Witt, Andreas

Published: 2016

Publisher: Tübingen : ZDV Universität Tübingen

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4553 https://ids-pub.bsz-bw.de/files/4553/Witt_Meaning_and_interpretation_of_concurrent_markup_2002.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-45535

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Corpus; Annotation; Markup language
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Linguistische Informationsmodellierung mit XML

Author(s): Witt, Andreas

Published: 2016

Publisher: Wiesbaden : VS Verlag

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4556 https://ids-pub.bsz-bw.de/files/4556/Witt_Linguistische_Informationsmodellierung_mit_XML_2004.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-45567

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Chapter in an edited volume
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Computational linguistics; Markup language; Annotation; Automatic language analysis
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Linguistische Annotationen für die Analyse von Gliederungsstrukturen wissenschaftlicher Texte

Author(s): Lüngen, Harald ; Hebborn, Mariana

Published: 2016

Publisher: Frankfurt am Main : Campus

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4795 https://ids-pub.bsz-bw.de/files/4795/Luengen_Hebborn_Linguistische_Annotationen_fuer_die_Analyse_2012.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-47959

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Chapter in an edited volume
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Corpus; Annotation; Ontology <knowledge processing>
License:	creativecommons.org/licenses/by-nc-nd/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

IGGSA-STEPS: Shared Task on Source and Target Extraction from Political Speeches

Author(s): Ruppenhofer, Josef ; Struß, Julia Maria

Published: 2016

Publisher: Gesellschaft für Sprachtechnologie und Computerlinguistik : Regensburg

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4941 https://ids-pub.bsz-bw.de/files/4941/Ruppenhofer_Stru%C3%9F_Sonntag_Grindl_IGGSA-STEPS_Shared_Task_on_Source_and_Target_Extraction_from_Political_Speeches_2014.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-49410

Accurate opinion mining requires the exact identification of the source and target of an opinion. To evaluate diverse tools, the research community relies on the existence of a gold standard corpus covering this need. Since such a corpus is currently not available for German, the Interest Group on German Sentiment Analysis decided to create such a resource and make it available to the research community in the context of a shared task. In this paper, we describe the selection of textual sources, development of annotation guidelines, and first evaluation results in the creation of a gold standard corpus for the German language.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Journal article
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Corpus; Annotation; Parliamentary debate; Data Mining; Political language; Automatic language analysis
License:	creativecommons.org/licenses/by-nc-sa/3.0/de ; info:eu-repo/semantics/openAccess

A CUP of CoFee: A Large Collection of Feedback Utterances Provided with Communicative Function Annotations

Author(s): Prévot, Laurent ; Gorisch, Jan ; Bertrand, Roxane

Published: 2016

Publisher: Paris : European Language Resources Association (ELRA)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5041 https://ids-pub.bsz-bw.de/files/5041/Prevot_Gorisch_Bertrand_A_CUP_of_CoFee_2016.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-50414

There have been several attempts to annotate communicative functions to utterances of verbal feedback in English previously. Here, we suggest an annotation scheme for verbal and non-verbal feedback utterances in French including the categories base, attitude, previous and visual. The data comprises conversations, maptasks and negotiations from which we extracted ca. 13,000 candidate feedback utterances and gestures. 12 students were recruited for the annotation campaign of ca. 9,500 instances. Each instance was annotated by between 2 and 7 raters. The evaluation of the annotation agreement resulted in an average best-pair kappa of 0.6. While the base category with the values acknowledgement, evaluation, answer, elicit and other achieves good agreement, this is not the case for the other main categories. The data sets, which also include automatic extractions of lexical, positional and acoustic features, are freely available and will further be used for machine learning classification experiments to analyse the form-function relationship of feedback.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Pragmatics; Spoken language; Feedback; Automatic language analysis; Annotation
License:	creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

Discourse Level Opinion Relations: An Annotation Study

Author(s): Somasundaran, Swapna ; Ruppenhofer, Josef ; Wiebe, Janyce

Published: 2016

Publisher: Pittsburgh : University of Pittsburgh

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5172 https://ids-pub.bsz-bw.de/files/5172/Somasundaran_Ruppenhofer_Wiebe_Discourse_Level_Opinion_Relations_2008.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-51726

This work proposes opinion frames as a representation of discourse-level associations that arise from related opinion targets and which are common in task-oriented meeting dialogs. We define the opinion frames and explain their interpretation. Additionally we present an annotation scheme that realizes the opinion frames and via human annotation studies, we show that these can be reliably identified.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Automatic text analysis; Propositional attitude; Annotation
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Yes we can!? Annotating the senses of English modal verbs

Author(s): Ruppenhofer, Josef ; Rehbein, Ines

Published: 2016

Publisher: Paris : European Language Resources Association (ELRA)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5255 https://ids-pub.bsz-bw.de/files/5255/Ruppenhofer_Rehbein_Yes_we_can_2012.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-52557

This paper presents an annotation scheme for English modal verbs together with sense-annotated data from the news domain. We describe our annotation scheme and discuss problematic cases for modality annotation based on the inter-annotator agreement during the annotation. Furthermore, we present experiments on automatic sense tagging, showing that our annotations do provide a valuable training resource for NLP systems.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	English; Modal verb; Annotation
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

There’s no Data like More Data? Revisiting the Impact of Data Size on a Classification Task

Author(s): Rehbein, Ines ; Ruppenhofer, Josef

Published: 2016

Publisher: European Language Resources Association

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5293 https://ids-pub.bsz-bw.de/files/5293/Rehbein_Ruppenhofer_There%27s_no_Data_like_More_Data_2010.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-52935

In the paper we investigate the impact of data size on a Word Sense Disambiguation task (WSD). We question the assumption that the knowledge acquisition bottleneck, which is known as one of the major challenges for WSD, can be solved by simply obtaining more and more training data. Our case study on 1,000 manually annotated instances of the German verb drohen (threaten) shows that the best performance is not obtained when training on the full data set, but by carefully selecting new training instances with regard to their informativeness for the learning process (Active Learning). We present a thorough evaluation of the impact of different sampling methods on the data sets and propose an improved method for uncertainty sampling which dynamically adapts the selection of new instances to the learning progress of the classifier, resulting in more robust results during the initial stages of learning. A qualitative error analysis identifies problems for automatic WSD and discusses the reasons for the great gap in performance between human annotators and our automatic WSD system.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Document processing; Automatic language analysis; Annotation
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Bringing Active Learning to Life

Author(s): Rehbein, Ines ; Ruppenhofer, Josef ; Palmer, Alexis

Published: 2016

Publisher: Beijing : Tsinghua University Press

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5294 https://ids-pub.bsz-bw.de/files/5294/Rehbein_Ruppenhofer_Palmer_Bringing_Active_Learning_to_Life_2010.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-52945

Active learning has been applied to different NLP tasks, with the aim of limiting the amount of time and cost for human annotation. Most studies on active learning have only simulated the annotation scenario, using prelabelled gold standard data. We present the first active learning experiment for Word Sense Disambiguation with human annotators in a realistic environment, using fine-grained sense distinctions, and investigate whether AL can reduce annotation cost and boost classifier performance when applied to a real-world task.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Computational linguistics; Annotation
License:	creativecommons.org/licenses/by-nc-sa/3.0/de ; info:eu-repo/semantics/openAccess

Automatic Classification by Topic Domain for Meta Data Generation, Web Corpus Evaluation, and Corpus Comparison

Author(s): Schäfer, Roland ; Bildhauer, Felix

Published: 2016

Publisher: Berlin : Association for Computational Linguistics

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5297 https://ids-pub.bsz-bw.de/files/5297/Schaefer_Bildhauer_Automatic_Classification_by_Topic_Domain_2016.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-52979

In this paper, we describe preliminary results from an ongoing experiment wherein we classify two large unstructured text corpora—a web corpus and a newspaper corpus—by topic domain (or subject area). Our primary goal is to develop a method that allows for the reliable annotation of large crawled web corpora with meta data required by many corpus linguists. We are especially interested in designing an annotation scheme whose categories are both intuitively interpretable by linguists and firmly rooted in the distribution of lexical material in the documents. Since we use data from a web corpus and a more traditional corpus, we also contribute to the important field of corpus comparison and corpus evaluation. Technically, we use (unsupervised) topic modeling to automatically induce topic distributions over gold standard corpora that were manually annotated for 13 coarse-grained topic domains. In a second step, we apply supervised machine learning to learn the manually annotated topic domains using the previously induced topics as features. We achieve around 70% accuracy in 10-fold cross validations. An analysis of the errors clearly indicates, however, that a revised classification scheme and larger gold standard corpora will likely lead to a substantial increase in accuracy.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Corpus; Text linguistics; Annotation
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation

Author(s): Rehbein, Ines ; Ruppenhofer, Josef ; Sporleder, Caroline

Published: 2016

Publisher: The Association for Computational Linguistics and The Asian Federation of Natural Language Processing

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5308 https://ids-pub.bsz-bw.de/files/5308/Rehbein_Ruppenhofer_Sporleder_Assessing_the_benefits_of_partial_pre-labeling_for_frame-semantic_annotation_2009.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-53087

In this paper, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. While we found no conclusive evidence that it can speed up human annotation, automatic pre-annotation does increase its overall quality.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Keywords:	Frame semantics; Automatic language analysis; Annotation
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

STTS goes Kiez – Experiments on Annotating and Tagging Urban Youth Language

Author(s): Rehbein, Ines ; Schalowski, Sören

Published: 2016

Publisher: Regensburg : GSCL

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5439 https://ids-pub.bsz-bw.de/files/5439/Rehbein_Schalowski_STTS_goes_Kiez_Experiments_on_Annotating_and_Tagging_Urban_Youth_Language_2013.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-54390

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Journal article
Format:	Online
DDC classification:	Language (400)
Keywords:	Corpus; Youth language; Natural language processing; Annotation; Spoken language
License:	creativecommons.org/licenses/by-sa/4.0/deed.de ; info:eu-repo/semantics/openAccess

Towards a syntactically motivated analysis of modifiers in German

Author(s): Rehbein, Ines ; Hirschmann, Hagen

Published: 2016

Publisher: Hildesheim : Universitätsverlag Hildesheim

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5597 https://ids-pub.bsz-bw.de/files/5597/Rehbein_Hirschmann_Towards_a_syntactically_motivated_analysis_of_modifiers_in_German_2014.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-55975

The Stuttgart-Tübingen Tagset (STTS) is a widely used POS annotation scheme for German which provides 54 different tags for the analysis on the part of speech level. The tagset, however, does not distinguish between adverbs and different types of particles used for expressing modality, intensity, graduation, or to mark the focus of the sentence. In the paper, we present an extension to the STTS which provides tags for a more fine-grained analysis of modification, based on a syntactic perspective on parts of speech. We argue that the new classification not only enables us to do corpus-based linguistic studies on modification, but also improves statistical parsing. We give proof of concept by training a data-driven dependency parser on data from the TiGer treebank, providing the parser a) with the original STTS tags and b) with the new tags. Results show an improved labelled accuracy for the new, syntactically motivated classification.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Annotation; Automatic language analysis; Corpus
License:	creativecommons.org/licenses/by/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

POS error detection in automatically annotated corpora

Author(s): Rehbein, Ines

Published: 2016

Publisher: Stroudsburg, PA : ACL

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5598 https://ids-pub.bsz-bw.de/files/5598/Rehbein_POS_error_detection_in_automatically_annotated_corpora_2014.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-55986

Recent work on error detection has shown that the quality of manually annotated corpora can be substantially improved by applying consistency checks to the data and automatically identifying incorrectly labelled instances. These methods, however, cannot be used for automatically annotated corpora where errors are systematic and cannot easily be identified by looking at the variance in the data. This paper targets the detection of POS errors in automatically annotated corpora, so-called silver standards, showing that by combining different measures sensitive to annotation quality we can identify a large part of the errors and obtain a substantial increase in accuracy.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Corpus; Automatic language analysis; Annotation
License:	creativecommons.org/licenses/by/3.0/de/deed.de ; info:eu-repo/semantics/openAccess
