Suchergebnisse

Das Gesamtkonzept des Deutschen Referenzkorpus DeReKo. Vom Design bis zur Verwendung und darüber hinaus

Autor*in: Kupietz, Marc ; Lüngen, Harald ; Diewald, Nils

Erschienen: 2023

Verlag: Berlin/Boston : de Gruyter ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS) [Zweitveröffentlichung]

Das Deutsche Referenzkorpus DeReKo dient als eine empirische Grundlage für die germanistische Linguistik. In diesem Beitrag geben wir einen Überblick über Grundlagen und Neuigkeiten zu DeReKo und seine Verwendungsmöglichkeiten sowie einen Einblick in... mehr

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11595 https://ids-pub.bsz-bw.de/files/11595/Kupietz_Luengen_Diewald_Das_Gesamtkonzept_des_DeReKo_2023.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-115951 https://doi.org/10.1515/9783111085708-002

Das Deutsche Referenzkorpus DeReKo dient als eine empirische Grundlage für die germanistische Linguistik. In diesem Beitrag geben wir einen Überblick über Grundlagen und Neuigkeiten zu DeReKo und seine Verwendungsmöglichkeiten sowie einen Einblick in seine strategische Gesamtkonzeption, die zum Ziel hat, DeReKo trotz begrenzter Ressourcen für einerseits möglichst viele und andererseits auch für innovative und anspruchsvolle Anwendungen nutzbar zu machen. Insbesondere erläutern wir dabei Strategien zur Aufbereitung sehr großer Korpora mit notwendigerweise heuristischen Verfahren und Herausforderungen, die sich auf dem Weg zur linguistischen Erschließung solcher Korpora stellen.

Export in Literaturverwaltung

RIS-Format
BibTeX-Format

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Deutsch
Medientyp:	Aufsatz aus einem Sammelband
Format:	Online
DDC Klassifikation:	Germanische Sprachen; Deutsch (430)
Schlagworte:	Korpus; Empirische Linguistik; Germanistik; Datenaufbereitung; Sprachdaten; Heuristik; Forschungsdaten; Kontrastive Linguistik
Lizenz:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Recent developments in the European Reference Corpus EuReCo

Autor*in: Kupietz, Marc ; Diewald, Nils ; Trawiński, Beata ; Cosma, Ruxandra ; Cristea, Dan ; Tufiş, Dan ; Váradi, Tamás ; Wöllstein, Angelika

Erschienen: 2023

Verlag: Louvain-la-Neuve : Presses universitaires de Louvain ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

This paper reports on recent developments within the European Reference Corpus EuReCo, an open initiative that aims at providing and using virtual and dynamically definable comparable corpora based on existing national, reference or other large... mehr

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11829 https://ids-pub.bsz-bw.de/files/11829/Kupietz_Diewald_Recent_developments_in_EuReCo_2020.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-118291

This paper reports on recent developments within the European Reference Corpus EuReCo, an open initiative that aims at providing and using virtual and dynamically definable comparable corpora based on existing national, reference or other large corpora. Given the well-known shortcomings of other types of multilingual corpora such as parallel/translation corpora (shining-through effects, over-normalization, simplification, etc.) or web-based comparable corpora (covering only web material), EuReCo provides a unique linguistic resource offering new perspectives for fine-grained contrastive research on authentic cross-linguistic data, applications in translation studies and foreign language teaching and learning.

Export in Literaturverwaltung

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Konferenzveröffentlichung
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	Korpus; Forschungsdaten; Sprachdaten; Kontrastive Linguistik; Übersetzungswissenschaft; Fremdsprachenunterricht; Fremdsprachenlernen
Lizenz:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

News from the International Comparable Corpus. First launch of ICC written

Autor*in: Kupietz, Marc ; Barbaresi, Adrien ; Čermáková, Anna ; Czachor, Małgorzata ; Diewald, Nils ; Ebeling, Jarle ; Górski, Rafał L. ; Margaretha, Eliza ; Kirk, John ; Křen, Michal ; Lüngen, Harald ; Oksefjell Ebeling, Signe ; Ó Meachair, Mícheál ; Pisetta, Ines ; Uí Dhonnchadha, Elaine ; Vogel, Friedemann ; Wilm, Rebecca ; Xu, Jiajin ; Yaddehige, Rameela

Erschienen: 2023

Verlag: Mannheim : IDS-Verlag; Leibniz-Institut für Deutsche Sprache (IDS)

The International Comparable Corpus (ICC) (Kirk/Čermáková 2017; Čermáková et al. 2021) is an open initiative which aims to improve the empirical basis for contrastive linguistics by compiling comparable corpora for many languages and making them as... mehr

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/12183 https://ids-pub.bsz-bw.de/files/12183/Kupietz_Barbaresi_Cermakova_ua_News_2023.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-121830 https://doi.org/10.14618/f8rt-m155

The International Comparable Corpus (ICC) (Kirk/Čermáková 2017; Čermáková et al. 2021) is an open initiative which aims to improve the empirical basis for contrastive linguistics by compiling comparable corpora for many languages and making them as freely available as possible as well as providing tools with which they can easily be queried and analysed. In this contribution we present the first release of written language parts of the ICC which includes corpora for Chinese, Czech, English, German, Irish (partly), and Norwegian. Each of the released corpora contains 400k words distributed over 14 different text categories according to the ICC specifications. Our poster covers the design basics of the ICC, its TEI encoding, a demonstration of using the ICC via different query tools, and an outlook on future plans. Similar to the European Reference Corpus EuReCo (Kupietz et al. 2020), ICC follows the approach of reusing existing linguistic resources wherever possible in order to cover as many languages as possible with realistic effort in as short a time as possible. In contrast to EuReCo, however, comparable corpus pairs are not defined dynamically in the usage phase, but the compositions of the corpora are fixed in the ICC design. The approaches are thus complementary in this respect. The design principles and composition of the ICC are based on those of the International Corpus of English (ICE) (Greenbaum (ed.) 1996), with the deviation that the ICC includes the additional text category blog post and excludes spoken legal texts (see Čermáková et al. 2021 for details). ICC’s fixed-design approach has the advantage that all single-language corpora in the ICC have the same composition with respect to the selected text types and that this guarantees that the selected broad spectrum of potential influencing variables for linguistic variation is always represented. The disadvantage, however, is that this can only be achieved for quite small corpora and that the generalisability of comparative findings based on the ...

Export in Literaturverwaltung

RIS-Format
BibTeX-Format

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Aufsatz aus einem Sammelband
Format:	Online
DDC Klassifikation:	Sprache (400)
Lizenz:	creativecommons.org/licenses/by-sa/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

Applying the newly extended European reference corpus EuReCo. Pilot studies of light-verb constructions in German, Romanian, Hungarian and Polish

Autor*in: Bański, Piotr ; Diewald, Nils ; Kupietz, Marc ; Trawiński, Beata

Erschienen: 2023

Verlag: Mannheim : IDS-Verlag

It is well known that the distribution of lexical and grammatical patterns is size- and register-sensitive (Biber 1986, and later publications). This fact alone presents a challenge to many corpus-oriented linguistic studies focusing on a single... mehr

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/12289 https://ids-pub.bsz-bw.de/files/12289/Banski_etal._Applying_the_newly_ext._europ._reference_2023.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-122898 https://doi.org/10.14618/f8rt-m155

It is well known that the distribution of lexical and grammatical patterns is size- and register-sensitive (Biber 1986, and later publications). This fact alone presents a challenge to many corpus-oriented linguistic studies focusing on a single language. When it comes to cross-linguistic studies using corpora, the challenge becomes even greater due to the lack of high-quality multilingual corpora (Kupietz et al. 2020; Kupietz/Trawiński 2022), which are comparable with respect to the size and the register. That was the motivation for the creation of the European Reference Corpus EuReCo, an initiative started in 2013 at the Leibniz Institute for the German Language (IDS) together with several European partners (Kupietz et al. 2020). EuReCo is an emerging federated corpus, with large virtual comparable corpora across various languages and with an infrastructure supporting contrastive research. The core of the infrastructure is KorAP (Diewald et al. 2016), a scalable open-source platform supporting the analysis and visualisation of properties of texts annotated by multiple and potentially conflicting information layers, and supporting several corpus query languages. Until recently, EuReCo consisted of three monolingual subparts: the German Reference Corpus DeReKo (Kupietz et al. 2018), the Reference Corpus of Contemporary Romanian Language (Barbu Mititelu/Tufiş/Irimia 2018), and the Hungarian National Corpus (Váradi 2002). The goal of the present submission is twofold. On the one hand, it reports about the new component of EuReCo: a sample of the National Corpus of Polish (Przepiórkowski et al. 2010). On the other hand, it presents the results of a new pilot study using the newly extended EuReCo. This pilot study investigates selected Polish collocations involving light verbs and their prepositional / nominal complements (Fig. 1) and extends the collocation analyses of German, Romanian and Hungarian (Fig. 2) discussed in Kupietz/Trawiński (2022).

Export in Literaturverwaltung

RIS-Format
BibTeX-Format

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Aufsatz aus einem Sammelband
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	Korpus; Kontrastive Linguistik
Lizenz:	creativecommons.org/licenses/by-sa/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

Filtern nach

Aktive Filter

Kategorien:

Bereich

Quelle

Format

Beteiligt

Medientyp

Sprache

Jahr

Letzte Suchanfragen

Ergebnisse für *

Das Gesamtkonzept des Deutschen Referenzkorpus DeReKo. Vom Design bis zur Verwendung und darüber hinaus

Recent developments in the European Reference Corpus EuReCo

News from the International Comparable Corpus. First launch of ICC written

Applying the newly extended European reference corpus EuReCo. Pilot studies of light-verb constructions in German, Romanian, Hungarian and Polish

Kontakt

Partner