Search results

Datenbank für Gesprochenes Deutsch (DGD)

Author: Schmidt, Thomas

Published: 2016

Publisher: Duisburg : Nisaba

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5683 https://ids-pub.bsz-bw.de/files/5683/Schmidt_Datenbank_fuer_Gesprochenes_Deutsch_DGD_2016.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-56837

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Corpus; Database; Spoken language
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Addressing Cha(lle)nges in Long-Term Archiving of Large Corpora

Authors: Arnold, Denis ; Fisseni, Bernhard ; Kamocki, Paweł ; Schonefeld, Oliver ; Kupietz, Marc ; Schmidt, Thomas

Published: 2020

Publisher: Paris : European Language Resources Association

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/9812 https://ids-pub.bsz-bw.de/files/9812/Arnold_Fisseni_Kamocki_et_al_Challenges_in_Long_Term_Archiving_2020.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-98129

This paper addresses long-term archiving of large corpora. It focuses on three aspects specific to language resources: (1) the removal of resources for legal reasons, (2) the versioning of (unchanged) objects in constantly growing resources, especially where objects can be part of multiple releases and also of different collections, and (3) the conversion of data to new formats for digital preservation. The paper explains why language resources may have to be changed and why formats may need to be converted. As a solution, it suggests an intermediate proxy object called a signpost. The approach is illustrated with the corpora of the Leibniz Institute for the German Language in Mannheim, namely the German Reference Corpus (DeReKo) and the Archive for Spoken German (AGD).

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Corpus; Long-term archiving; Right of use; File format
License:	creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

Using Full Text Indices for Querying Spoken Language Data

Authors: Frick, Elena ; Schmidt, Thomas

Published: 2020

Publisher: Paris : European Language Resources Association

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/9814 https://ids-pub.bsz-bw.de/files/9814/Frick_Schmidt_Using_full_text_indices_for_querying_SLD_2020.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-98143

As part of the ZuMult project, we are currently modelling a backend architecture that should provide query access to corpora from the Archive of Spoken German (AGD) at the Leibniz-Institute for the German Language (IDS). We are exploring how to reuse existing search engine frameworks that provide full text indices and allow corpora to be queried with one of the corpus query languages (QLs) established and actively used in the corpus research community. For this purpose, we tested MTAS, an open source Lucene-based search engine for querying text with multilevel annotations. We applied MTAS to three oral corpora stored in the TEI-based ISO standard for transcriptions of spoken language (ISO 24624:2016). These corpora differ from the corpus data that MTAS was developed for, because they include interactions with two or more speakers and are enriched, inter alia, with timeline-based annotations. In this contribution, we report our test results and address issues that arise when search frameworks originally developed for querying written corpora are transferred to the field of spoken language.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Corpus; Query; Spoken language; Text Encoding Initiative; Computational linguistics
License:	creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess
