Feature-based encoding and querying language resources with character semantics
In this paper we discuss the explicit representation of character features pertaining to written language resources, which we argue is critically necessary for the long-term archiving of language data. Much of the focus in creating and preserving language resources is at the level of the corpus itself; however, it is generally accepted that long-term interpretation of these resources requires more than a best-practice data format. In particular, where language resources are created in linguistic fieldwork, and especially for minority languages, the need to preserve not only the resource itself but also the additional metadata that allows the resource to be accurately interpreted in the future is becoming a topic of research in its own right. In this paper we extend earlier work on semantically based character decomposition to include the representation of character properties in a variety of models, and a mechanism for exploiting these properties through queries.
A BLARK extension for temporal annotation mining
The Basic Language Resource Kit (BLARK) proposed by Krauwer is designed for the creation of initial textual resources. There are a number of toolkits for the development of spoken language resources and systems, but few tools for second-level resources, that is, resources which are the result of processing primary-level speech resources such as speech recordings. Typically, processing of this kind in phonetics is done manually, with the aid of spreadsheets and multi-purpose statistics software. We propose a Basic Language and Speech Kit (BLAST) as an extension to BLARK and suggest a strategy for integrating the kit into the Natural Language Toolkit (NLTK). The prototype kit is evaluated in an application examining temporal properties of spoken Brazilian Portuguese.
CoGesT: a formal transcription system for conversational gesture
In order to create reusable and sustainable multimodal resources, a transcription model for hand and arm gestures in conversation is needed. We argue that the transcription systems developed so far for sign language transcription and psychological analysis are not suitable for the linguistic analysis of conversational gesture. Such a model must adhere to a strict form-function distinction and be both computationally explicit and compatible with descriptive notations, such as feature structures, used in other areas of computational and descriptive linguistics. We describe the development and evaluation of a suitable formal model using a feature-based transcription system, concentrating as a first step on arm gestures, within the context of the development of an annotated video resource and gesture lexicon.
Consistent storage of metadata in inference lexica: the MetaLex approach
With MetaLex we introduce a framework for metadata management in which information can be inferred from different areas of metadata coding, such as metadata for catalogue descriptions, linguistic levels, or tiers. This is done for consistency and efficiency in metadata recording, and it applies the same inference techniques that are used for lexical inference. For this purpose we motivate the need for metadata descriptions on all document levels, describe the different structures of metadata, use existing metadata recommendations on different levels of annotation, and show a use case of metadata inference.
Annotation driven concordancing: the PAX toolkit
We describe PAX, "Portable Audio Concordance System", a proof-of-concept prototype of a multipurpose, multilingual audio concordance toolkit. The primary goal is to support efficient grammar and lexicon construction in the documentation of unwritten languages; the languages currently included are Ega, Anyi, and Koulango (Ivory Coast), with additional samples in German and English. The approach combines methods from corpus linguistics, annotation theory and practice, phonetics, and lexicography.
A multi-view hyperlexicon resource for speech and language system development
New generations of integrated multimodal speech and language systems with dictation, readback or talking face facilities require multiple sources of lexical information for development and evaluation. Recent advances in hyperlexicon development offer new perspectives for creating such resources, which are at the same time practically useful, computationally feasible, and theoretically well-founded. We describe the specification, three-level lexical document design principles, and implementation of a MARTIF document structure and several presentation structures for a terminological lexicon, including both on-demand access and full hypertext lexicon compilation. The underlying resource is a relational lexical database with SQL querying and access via a CGI internet interface. This resource is mapped onto the hypergraph structure which defines the macrostructure of the hyperlexicon.
The computational semantics of characters
In this paper we present a new approach to the computational semantics of characters, which fills this gap: the orthographic projection of linguistic information, analogous to phonetic interpretation. We consider a number of use cases prior to discussion of three different perspectives. Adopting a holistic view of semantics, we discover that there are properties at this lower level which require similar specification to that at more well-studied levels, and which can coherently extend computational linguistic models to the domain of orthography.