Search results

User, who art thou? User profiling for oral corpus platforms

Author: Fandrych, Christian

Published: 2016

Publisher: Institut für Deutsche Sprache, Bibliothek, Mannheim

Source:	Union catalogues
Contributors:	Frick, Elena (author); Hedeland, Hanna (author); Iliash, Anna (author); Jettka, Daniel (author); Meißner, Cordula (author); Schmidt, Thomas (author); Wallner, Franziska (author); Weigert, Kathrin (author); Westpfahl, Swantje (author); Calzolari, Nicoletta (editor); Choukri, Khalid (editor); Declerck, Thierry (editor); Goggi, Sara (editor); Grobelnik, Marko (editor); Maegaard, Bente (editor); Mariani, Joseph (editor); Mazo, Helene (editor); Moreno, Asunción (editor); Odijk, Jan (editor); Piperidis, Stelios (editor)
Language:	English
Media type:	Book (monograph)
Format:	Online
Other identifiers:	urn:nbn:de:bsz:mh39-50774
DDC classification:	Language (400)
Subject headings:	German; Corpus <linguistics>; Spoken language; User research
Other subject headings:	oral corpus platform; user survey
Extent:	Online resource
Note(s):	In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia. - Paris : European Language Resources Association (ELRA), 2016, pp. 280-287, ISBN 978-2-9517408-9-1

Accessing spoken language corpora: an overview of current approaches

Authors: Batinić, Josip ; Frick, Elena ; Schmidt, Thomas

Published: 2021

Publisher: Edinburgh : Edinburgh University Press ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10769 https://ids-pub.bsz-bw.de/files/10769/Batinic_Accessing_spoken_language_corpora_2021.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-107690 https://doi.org/10.3366/cor.2021.0229

In this paper, we present an overview of freely available web applications providing online access to spoken language corpora. We explore and discuss various solutions with which the corpus providers and corpus platform developers address the needs of researchers who are working with spoken language. The paper aims to contribute to the long-overdue exchange and discussion of methods and best practices in the design of online access to spoken language corpora.

Source:	BASE subject subset for German studies
Language:	English
Media type:	Journal article
Format:	Online
DDC classification:	Language (400)
Subject headings:	Spoken language; Corpus; Research data; Database
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Querying Interaction Structure: Approaches to Overlap in Spoken Language Corpora

Authors: Frick, Elena ; Helmer, Henrike ; Schmidt, Thomas

Published: 2022

Publisher: Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11105 https://ids-pub.bsz-bw.de/files/11105/Frick_Helmer_Schmidt_Querying_Interaction_Structure_2022.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111054

In this paper, we address two problems in indexing and querying spoken language corpora with overlapping speaker contributions. First, we look into how token distance and token precedence can be measured when multiple primary data streams are available and when transcriptions are tokenized but not synchronized with the sound at the level of individual tokens. We propose and experiment with a speaker-based search mode that enables any speaker’s transcription tier to be the basic tokenization layer, whereby the contributions of other speakers are mapped to this given tier. Second, we address two distinct methods by which speaker overlaps can be captured in the TEI-based ISO Standard for Spoken Language Transcriptions (ISO 24624:2016) and how they can be queried by MTAS, an open-source Lucene-based search engine for querying text with multilevel annotations. We illustrate the problems, introduce possible solutions and discuss their benefits and drawbacks.

Source:	BASE subject subset for German studies
Language:	English
Media type:	Contribution to an edited volume
Format:	Online
DDC classification:	Language (400)
Subject headings:	German; Corpus; Spoken language; Turn-taking; Token; Query language; Search engine
License:	creativecommons.org/licenses/by-nc/4.0/deed.de ; info:eu-repo/semantics/openAccess

Querying Repetitions in Spoken Language Corpora

Authors: Frick, Elena ; Helmer, Henrike ; Lemmenmeier-Batinić, Dolores

Published: 2024

Publisher: Vienna : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/12828 https://ids-pub.bsz-bw.de/files/12828/Frick_Helmer_Lemmenmeier_Batinic_querying_2024.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-128287

In this paper, we present a tool for searching repetitions in interaction corpora. Our approach, based on MTAS technology, uses common search token indices to retrieve repetitions from spoken language transcripts dynamically. The CQP Query Language and a graphical user interface menu with extensive settings, specially designed for conversation analysis researchers, make it possible to find repetitions of complex linguistic forms in various pragmatic contexts. Furthermore, the web application enables searching for repetition constructions that may contain synonyms and hyp(er)onyms coming from GermaNet or from custom-defined word lists uploaded to the tool.

Source:	BASE subject subset for German studies
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Subject headings:	Repetition; Spoken language; Pragmatics; Synonym; Homonym; Conversation analysis; Corpus
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

KorAP: the new corpus analysis platform at IDS Mannheim

Authors: Bański, Piotr ; Bingel, Joachim ; Diewald, Nils ; Frick, Elena ; Hanl, Michael ; Kupietz, Marc ; Pȩzik, Piotr ; Schnober, Carsten ; Witt, Andreas

Published: 2014

Publisher: Poznań : Uniwersytet im. Adama Mickiewicza w Poznaniu

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/3261 https://ids-pub.bsz-bw.de/files/3261/Banski_KorAP_2013.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-32617

The KorAP project (“Korpusanalyseplattform der nächsten Generation”, “Corpus-analysis platform of the next generation”), carried out at the Institut für Deutsche Sprache (IDS) in Mannheim, Germany, has as its goal the development of a modern, state-of-the-art corpus-analysis platform, capable of handling very large corpora and opening up perspectives for innovative linguistic research. The platform will facilitate new linguistic findings by making it possible to manage and analyse extremely large amounts of primary data and annotations, while at the same time allowing an undistorted view of the primary un-annotated text, and thus fully satisfying expectations associated with a scientific tool. The project started in July 2011 and is funded till June 2014. The demo presentation in December will be the first version following a preliminary feature freeze, and will open the alpha testing phase of the project.

Source:	BASE subject subset for German studies
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Germanic languages; German (430)
Subject headings:	Corpus
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Robust corpus architecture: a new look at virtual collections and data access

Authors: Bański, Piotr ; Frick, Elena ; Hanl, Michael ; Kupietz, Marc ; Schnober, Carsten ; Witt, Andreas

Published: 2015

Publisher: Lancaster : UCREL

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4485 https://ids-pub.bsz-bw.de/files/4485/Ba%C5%84ski_Frick_Hanl_Robust_corpus_architecture_2013.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-44855

Source:	BASE subject subset for German studies
Language:	English
Media type:	Contribution to an edited volume
Format:	Online
DDC classification:	Linguistics (410)
Subject headings:	Corpus
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

The New IDS Corpus Analysis Platform: Challenges and Prospects

Authors: Bański, Piotr ; Fischer, Peter M. ; Frick, Elena ; Ketzan, Erik ; Kupietz, Marc ; Schnober, Carsten ; Schonefeld, Oliver ; Witt, Andreas

Published: 2015

Publisher: Paris : European Language Resources Association (ELRA)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/4497 https://ids-pub.bsz-bw.de/files/4497/Banski_Fischer_Frick_The_New_IDS_Corpus_Analysis_Platform_2012.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-44974

The present article describes the first stage of the KorAP project, launched recently at the Institut für Deutsche Sprache (IDS) in Mannheim, Germany. The aim of this project is to develop an innovative corpus analysis platform to tackle the increasing demands of modern linguistic research. The platform will facilitate new linguistic findings by making it possible to manage and analyse primary data and annotations in the petabyte range, while at the same time allowing an undistorted view of the primary linguistic data, and thus fully satisfying the demands of a scientific tool. An additional important aim of the project is to make corpus data as openly accessible as possible in light of unavoidable legal restrictions, for instance through support for distributed virtual corpora, user-defined annotations and adaptable user interfaces, as well as interfaces and sandboxes for user-supplied analysis applications. We discuss our motivation for undertaking this endeavour and the challenges it faces. Next, we outline our software implementation plan and describe development to date.

Source:	BASE subject subset for German studies
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Subject headings:	Corpus
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Corpus Query Lingua Franca (CQLF)

Authors: Bański, Piotr ; Frick, Elena ; Witt, Andreas

Published: 2016

Publisher: Paris : European Language Resources Association (ELRA)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5040 https://ids-pub.bsz-bw.de/files/5040/Banksi_Frick_Witt_Corpus_Query_Lingua_Franca_2016.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-50405

The present paper describes Corpus Query Lingua Franca (ISO CQLF), a specification designed at ISO Technical Committee 37 Subcommittee 4 “Language resource management” for the purpose of facilitating the comparison of properties of corpus query languages. We give an overview of the motivation for this endeavour and present its aims and its general architecture. CQLF is intended as a multi-part specification; here, we concentrate on the basic metamodel, which provides a frame that the other parts fit into.

Source:	BASE subject subset for German studies
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Subject headings:	Corpus; Query language
License:	creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

User, who art thou? User profiling for oral corpus platforms

Authors: Fandrych, Christian ; Frick, Elena ; Hedeland, Hanna ; Iliash, Anna ; Jettka, Daniel ; Meißner, Cordula ; Schmidt, Thomas ; Wallner, Franziska ; Weigert, Kathrin ; Westpfahl, Swantje

Published: 2016

Publisher: Paris : European Language Resources Association (ELRA)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/5077 https://ids-pub.bsz-bw.de/files/5077/Fandrych_et_al_User_who_art_thou_2016.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-50774

This contribution presents the background, design and results of a study of users of three oral corpus platforms in Germany. Roughly 5,000 registered users of the Database for Spoken German (DGD), the GeWiss corpus and the corpora of the Hamburg Centre for Language Corpora (HZSK) were asked to participate in a user survey. This quantitative approach was complemented by qualitative interviews with selected users. We briefly introduce the corpus resources involved in the study in section 2. Section 3 describes the methods employed in the user studies. Section 4 summarizes results of the studies, focusing on selected key topics. Section 5 attempts a generalization of these results to larger contexts.

Source:	BASE subject subset for German studies
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Linguistics (410)
Subject headings:	German; Corpus; Spoken language; User research
License:	creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

Using Full Text Indices for Querying Spoken Language Data

Authors: Frick, Elena ; Schmidt, Thomas

Published: 2020

Publisher: Paris : European Language Resources Association

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/9814 https://ids-pub.bsz-bw.de/files/9814/Frick_Schmidt_Using_full_text_indices_for_querying_SLD_2020.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-98143

As part of the ZuMult project, we are currently modelling a backend architecture that should provide query access to corpora from the Archive of Spoken German (AGD) at the Leibniz Institute for the German Language (IDS). We are exploring how to reuse existing search engine frameworks that provide full text indices and make it possible to query corpora with one of the corpus query languages (QLs) established and actively used in the corpus research community. For this purpose, we tested MTAS, an open-source Lucene-based search engine for querying text with multilevel annotations. We applied MTAS to three oral corpora stored in the TEI-based ISO standard for transcriptions of spoken language (ISO 24624:2016). These corpora differ from the corpus data that MTAS was developed for, because they include interactions with two or more speakers and are enriched, inter alia, with timeline-based annotations. In this contribution, we report our test results and address issues that arise when search frameworks originally developed for querying written corpora are transferred to the field of spoken language.

Source:	BASE subject subset for German studies
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Subject headings:	Corpus; Query; Spoken language; Text Encoding Initiative; Computational linguistics
License:	creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess
