OWIDplusLIVE. Day-to-day collection, exploration, analysis, and visualization of N-Gram frequencies in German (online press) language
With OWIDplusLIVE, we would like to introduce the EURALEX community to two resources that provide analytical access to data updated daily: frequency data and N-grams, with the previous day as the reference point.
Multi-level annotation in MMAX
We present a light-weight tool for the annotation of linguistic data on multiple levels. It is based on the simplification of annotations to sets of markables that have attributes and stand in certain relations to each other. We describe the main features of the tool, emphasizing its simplicity, customizability, and versatility.
Building NLP resources for Dzongkha: A tagset and a tagged corpus
This paper describes the application of probabilistic part-of-speech taggers to the Dzongkha language. A tag set of 66 tags, based on the Penn Treebank, was designed. A training corpus of 40,247 tokens was used to train the model. Using a lexicon extracted from the training corpus together with a lexicon from an available word list, we compared two statistical taggers. The best result was 93.1% accuracy in a 10-fold cross-validation on the training set. The winning tagger was then applied to annotate a 570,247-token corpus.
The CLARIN infrastructure as an interoperable language technology platform for SSH and beyond
CLARIN is a European Research Infrastructure Consortium that develops and provides a federated and interoperable platform to support scientists in the Social Sciences and Humanities in carrying out language-related research. This contribution provides an overview of the entire infrastructure with a particular focus on tool interoperability, ease of access to research data, tools and services, the importance of sharing knowledge within and across (national) communities, and community building. By taking FAIR principles into account from the very beginning, CLARIN has become a successful example of a research infrastructure that is actively used by its members. The benefits CLARIN members reap from their infrastructure secure a future for their common good that is both sustainable and attractive to partners beyond the original target groups.