A machine learning approach to pronoun resolution in spoken dialogue
We apply a decision tree based approach to pronoun resolution in spoken dialogue. Our system deals with pronouns with NP- and non-NP-antecedents. We present a set of features designed for pronoun resolution in spoken dialogue and determine the most...
mehr
Volltext:
|
|
Zitierfähiger Link:
|
|
We apply a decision tree based approach to pronoun resolution in spoken dialogue. Our system deals with pronouns with NP- and non-NP-antecedents. We present a set of features designed for pronoun resolution in spoken dialogue and determine the most promising features. We evaluate the system on twenty Switchboard dialogues and show that it compares well to Byron’s (2002) manually tuned system.
|
Export in Literaturverwaltung |
|
Multi-level annotation in MMAX
We present a light-weight tool for the annotation of linguistic data on multiple levels. It is based on the simplification of annotations to sets of markables having attributes and standing in certain relations to each other. We describe the main...
mehr
Volltext:
|
|
Zitierfähiger Link:
|
|
We present a light-weight tool for the annotation of linguistic data on multiple levels. It is based on the simplification of annotations to sets of markables having attributes and standing in certain relations to each other. We describe the main features of the tool, emphasizing its simplicity, customizability and versatility
|
Export in Literaturverwaltung |
|
An API for discourse-level access to XML-encoded corpora
We describe a simple and efficient Java object model and application programming interface (API) for (possibly multi-modal) annotated natural language corpora. Corpora are represented as elements like Sentences, Turns, Utterances, Words, Gestures and...
mehr
Volltext:
|
|
Zitierfähiger Link:
|
|
We describe a simple and efficient Java object model and application programming interface (API) for (possibly multi-modal) annotated natural language corpora. Corpora are represented as elements like Sentences, Turns, Utterances, Words, Gestures and Markables. The API allows linguists to access corpora in terms of these discourse-level elements, i.e. at a conceptual level they are familiar with, with the flexibility offered by a general purpose programming language. It is also a contribution to corpus standardization efforts because it is based on a straightforward and easily extensible data model which can serve as a target for conversion of different corpus formats.
|
Export in Literaturverwaltung |
|
Applying co-training to reference resolution
In this paper, we investigate the practical applicability of Co-Training for the task of building a classifier for reference resolution. We are concerned with the question if Co-Training can significantly reduce the amount of manual labeling work and...
mehr
Volltext:
|
|
Zitierfähiger Link:
|
|
In this paper, we investigate the practical applicability of Co-Training for the task of building a classifier for reference resolution. We are concerned with the question if Co-Training can significantly reduce the amount of manual labeling work and still produce a classifier with an acceptable performance.
|
Export in Literaturverwaltung |
|
Annotating anaphoric and bridging relations with MMAX
We present a tool for the annotation of anaphoric and bridging relations in a corpus of written texts. Based on differences as well as similarities between these phenomena, we define an annotation scheme. We then implement the scheme within an...
mehr
Volltext:
|
|
Zitierfähiger Link:
|
|
We present a tool for the annotation of anaphoric and bridging relations in a corpus of written texts. Based on differences as well as similarities between these phenomena, we define an annotation scheme. We then implement the scheme within an annotation tool and demonstrate its use.
|
Export in Literaturverwaltung |
|
Improving extractive dialogue summarization by utilizing human feedback
Automatic summarization systems usually are trained and evaluated in a particular domain with fixed data sets. When such a system is to be applied to slightly different input, labor- and cost-intensive annotations have to be created to retrain the...
mehr
Volltext:
|
|
Zitierfähiger Link:
|
|
Automatic summarization systems usually are trained and evaluated in a particular domain with fixed data sets. When such a system is to be applied to slightly different input, labor- and cost-intensive annotations have to be created to retrain the system. We deal with this problem by providing users with a GUI which allows them to correct automatically produced imperfect summaries. The corrected summary in turn is added to the pool of training data. The performance of the system is expected to improve as it adapts to the new domain.
|
Export in Literaturverwaltung |
|
Reconstructing manual information extraction with DB-to-document backprojection: Experiments in the life science domain
We introduce a novel scientific document processing task for making previously inaccessible information in printed paper documents available to automatic processing. We describe our data set of scanned documents and data records from the biological...
mehr
Volltext:
|
|
Zitierfähiger Link:
|
|
We introduce a novel scientific document processing task for making previously inaccessible information in printed paper documents available to automatic processing. We describe our data set of scanned documents and data records from the biological database SABIO-RK, provide a definition of the task, and report findings from preliminary experiments. Rigorous evaluation proved challenging due to lack of gold-standard data and a difficult notion of correctness. Qualitative inspection of results, however, showed the feasibility and usefulness of the task.
|
Export in Literaturverwaltung |
|
Transparent, efficient, and robust word embedding access with WOMBAT
We present WOMBAT, a Python tool which supports NLP practitioners in accessing word embeddings from code. WOMBAT addresses common research problems, including unified access, scaling, and robust and reproducible preprocessing. Code that uses WOMBAT...
mehr
Volltext:
|
|
Zitierfähiger Link:
|
|
We present WOMBAT, a Python tool which supports NLP practitioners in accessing word embeddings from code. WOMBAT addresses common research problems, including unified access, scaling, and robust and reproducible preprocessing. Code that uses WOMBAT for accessing word embeddings is not only cleaner, more readable, and easier to reuse, but also much more efficient than code using standard in-memory methods: a Python script using WOMBAT for evaluating seven large word embedding collections (8.7M embedding vectors in total) on a simple SemEval sentence similarity task involving 250 raw sentence pairs completes in under ten seconds end-to-end on a standard notebook computer.
|
Export in Literaturverwaltung |
|
Knowledge sources for bridging resolution in multi-party dialog
In this paper we investigate the coverage of the two knowledge sources WordNet and Wikipedia for the task of bridging resolution. We report on an annotation experiment which yielded pairs of bridging anaphors and their antecedents in spoken...
mehr
Volltext:
|
|
Zitierfähiger Link:
|
|
In this paper we investigate the coverage of the two knowledge sources WordNet and Wikipedia for the task of bridging resolution. We report on an annotation experiment which yielded pairs of bridging anaphors and their antecedents in spoken multi-party dialog. Manual inspection of the two knowledge sources showed that, with some interesting exceptions, Wikipedia is superior to WordNet when it comes to the coverage of information necessary to resolve the bridging anaphors in our data set. We further describe a simple procedure for the automatic extraction of the required knowledge from Wikipedia by means of an API, and discuss some of the implications of the procedure’s performance.
|
Export in Literaturverwaltung |
|