Suchergebnisse

Datensatz Genitiv- und von-Attribute

Autor*in: Kopf, Kristin ; Bildhauer, Felix

Erschienen: 2021

Volltext:	https://d-nb.info/1246269880/34 https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10723
Zitierfähiger Link:	https://doi.org/10.14618/genitivvondb https://nbn-resolving.org/urn:nbn:de:bsz:mh39-107238

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Deutsch
Medientyp:	Unbestimmt
Format:	Online
DDC Klassifikation:	Deutsche Grammatik (435)
Lizenz:	kostenfrei

Multiple fronting vs. VP fronting in German

Autor*in: Bildhauer, Felix ; Cook, Philippa

Erschienen: 2011

Volltext:	https://d-nb.info/1206794461/34 http://edoc.hu-berlin.de/18452/2018
Zitierfähiger Link:	https://doi.org/10.18452/1366 https://nbn-resolving.org/urn:nbn:de:kobv:11-100186372

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Unbestimmt
Format:	Online
DDC Klassifikation:	Germanische Sprachen; Deutsch (430)
Lizenz:	kostenfrei

Datensatz Genitiv- und von-Attribute

Autor*in: Kopf, Kristin ; Bildhauer, Felix

Erschienen: 2021

Verlag: Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10723 https://ids-pub.bsz-bw.de/files/10723/Kopf_Bildhauer_Datensatz_Genitiv_und_von_Attribute_2021.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-107238 https://doi.org/10.14618/genitivvondb

The dataset contains 16,604 corpus attestations of noun phrases with genitive and von attributes (die Ideen zahlreicher Kinder, die Ideen von zahlreichen Kindern); the genitive attributes may appear prenominally or postnominally (Mannheims Sehenswürdigkeiten, die Sehenswürdigkeiten Mannheims). Each attestation is annotated with country, decade, and medium. Further annotations cover the head and/or attribute lemma (e.g. name type, inflection class), the phrase as a whole (e.g. definiteness, case), and the attribute phrase (e.g. case distinction, length). Numerous special cases are annotated as well (e.g. genitive with an uninflected adjective, as in Gebäck Mannheimer Bäckereien, and phrases with an adjectivally inflected attribute noun, as in die Ideen Jugendlicher, die Ideen von Jugendlichen).

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Deutsch
Medientyp:	Unbestimmt
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	Grammis; Grammatik; Datensatz; Genitivattribut; Korpus; Nominalphrase
Lizenz:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Datensatz attributive dass-Sätze und zu-Infinitive

Autor*in: Bildhauer, Felix ; Weber, Thilo

Erschienen: 2023

Verlag: Mannheim : Leibniz-Institut für Deutsche Sprache

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11610 https://ids-pub.bsz-bw.de/files/11610/Bildhauer_Weber_Datensatz_attributive_dass_Saetze_2023.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-116107 https://doi.org/10.14618/attributsatzdb

The dataset contains 10,113 corpus attestations of constructions in which a noun occurs with a dass clause or a zu infinitive (das Versprechen, dass man sich irgendwann wiedersieht vs. das Versprechen, sich irgendwann wiederzusehen). The data were drawn from: 1. the Korpusgrammatik-Untersuchungskorpus (Bubenhofer et al. 2014), based on the Deutsches Referenzkorpus DeReKo (Kupietz et al. 2010, 2018), release 2017-II; 2. the "Forum" subcorpus of the DECOW16B web corpus (Schäfer & Bildhauer 2012).

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Deutsch
Medientyp:	Unbestimmt
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	Datensatz; Korpus; Grammatik; Grammis
Lizenz:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Web corpus construction

Autor*in: Schäfer, Roland

Verlag: Morgan & Claypool Publishers, [San Rafael, California]

Berlin: Humboldt-Universität zu Berlin, Universitätsbibliothek, Jacob-und-Wilhelm-Grimm-Zentrum

Standort:

Humboldt-Universität zu Berlin, Universitätsbibliothek, Jacob-und-Wilhelm-Grimm-Zentrum

Fernleihe:

uneingeschränkte Fernleihe, Kopie und Ausleihe

Link zum Verbundkatalog:

Kooperativer Bibliotheksverbund Berlin-Brandenburg (KOBV)

Quelle:	Verbundkataloge
Beteiligt:	Bildhauer, Felix (Verfasser)
Sprache:	Englisch
Medientyp:	Ebook
Format:	Online
ISBN:	9781608459841
Weitere Identifier:	doi: 10.2200/S00508ED1V01Y201305HLT022
RVK Klassifikation:	ES 900
Schriftenreihe:	Synthesis lectures on human language technologies ; #22
Umfang:	1 Online-Ressource (xv, 129 Seiten), Diagramme

Web corpus construction

Autor*in: Schäfer, Roland

Erschienen: 2013

Verlag: Morgan & Claypool, [San Rafael, Calif.]

Berlin: Freie Universität Berlin, Universitätsbibliothek

Standort:

Freie Universität Berlin, Universitätsbibliothek

Fernleihe:

uneingeschränkte Fernleihe, Kopie und Ausleihe

Link zum Verbundkatalog:

Kooperativer Bibliotheksverbund Berlin-Brandenburg (KOBV)

Quelle:	Verbundkataloge
Beteiligt:	Bildhauer, Felix (Verfasser)
Sprache:	Englisch
Medientyp:	Buch (Monographie)
ISBN:	9781608459834
RVK Klassifikation:	ES 900
Schriftenreihe:	Synthesis lectures on human language technologies ; 22
Umfang:	129 S., graph. Darst.

Multiple fronting vs. VP fronting in German

Autor*in: Bildhauer, Felix

Erschienen: 2011

Verlag: Humboldt-Universität zu Berlin, Berlin

Zugang:

Resolving-System

Langzeitarchivierung Nationalbibliothek

Verlag (kostenfrei)

Quelle:	Verbundkataloge
Beteiligt:	Cook, Philippa (Verfasser)
Sprache:	Englisch
Medientyp:	Buch (Monographie)
Format:	Online
Weitere Identifier:	doi: 10.18452/1366 urn: urn:nbn:de:kobv:11-100186372
Umfang:	Online-Ressource

Mehrfache Vorfeldbesetzung und Informationsstruktur: Eine Bestandsaufnahme

Autor*in: Bildhauer, Felix

Zugang:

Resolving-System

Langzeitarchivierung Nationalbibliothek

Verlag (kostenfrei)

Quelle:	Verbundkataloge
Sprache:	Deutsch
Medientyp:	Aufsatz aus einer Zeitschrift
Format:	Online
Weitere Identifier:	urn: urn:nbn:de:bsz:mh39-41735
Übergeordneter Titel:	In: Deutsche Sprache; Berlin : E. Schmidt, 2003-; Online-Ressource
DDC Klassifikation:	Germanische Sprachen; Deutsch (430)
Schlagworte:	Deutsch; Syntax; Korpus <Linguistik>; Topikalisierung
Umfang:	Online-Ressource

Web Corpus Construction

Autor*in: Schäfer, Roland

Verlag: Morgan & Claypool Publishers, [San Rafael]

Potsdam: Universität Potsdam, Universitätsbibliothek

Standort:

Universität Potsdam, Universitätsbibliothek

Fernleihe:

uneingeschränkte Fernleihe, Kopie und Ausleihe

Link zum Verbundkatalog:

Gemeinsamer Bibliotheksverbund (GBV)

The World Wide Web constitutes the largest existing source of texts written in a great variety of languages. A feasible and sound way of exploiting this data for linguistic research is to compile a static corpus for a given language. There are several advantages of this approach: (i) Working with such corpora obviates the problems encountered when using Internet search engines in quantitative linguistic research (such as non-transparent ranking algorithms). (ii) Creating a corpus from web data is virtually free. (iii) The size of corpora compiled from the WWW may exceed by several orders of magnitude the size of language resources offered elsewhere. (iv) The data is locally available to the user, and it can be linguistically post-processed and queried with the tools preferred by her/him. This book addresses the main practical tasks in the creation of web corpora up to giga-token size. Among these tasks are the sampling process (i.e., web crawling) and the usual cleanups including boilerplate removal and removal of duplicated content. Linguistic processing and problems with linguistic processing coming from the different kinds of noise in web corpora are also covered. Finally, the authors show how web corpora can be evaluated and compared to other corpora (such as traditionally compiled corpora).

1. Web corpora -- 2. Data collection -- 2.1 Introduction -- 2.2 The structure of the web -- 2.2.1 General properties -- 2.2.2 Accessibility and stability of web pages -- 2.2.3 What's in a (national) top level domain? -- 2.2.4 Problematic segments of the web -- 2.3 Crawling basics -- 2.3.1 Introduction -- 2.3.2 Corpus construction from search engine results -- 2.3.3 Crawlers and crawler performance -- 2.3.4 Configuration details and politeness -- 2.3.5 Seed URL generation -- 2.4 More on crawling strategies -- 2.4.1 Introduction -- 2.4.2 Biases and the pagerank -- 2.4.3 Focused crawling -- 3. Post-processing -- 3.1 Introduction -- 3.2 Basic cleanups -- 3.2.1 HTML stripping -- 3.2.2 Character references and entities -- 3.2.3 Character sets and conversion -- 3.2.4 Further normalization -- 3.3 Boilerplate removal -- 3.3.1 Introduction to boilerplate -- 3.3.2 Feature extraction -- 3.3.3 Choice of the machine learning method -- 3.4 Language identification -- 3.5 Duplicate detection -- 3.5.1 Types of duplication -- 3.5.2 Perfect duplicates and hashing -- 3.5.3 Near duplicates, Jaccard coefficients, and shingling -- 4. Linguistic processing -- 4.1 Introduction -- 4.2 Basics of tokenization, part-of-speech tagging, and lemmatization -- 4.2.1 Tokenization -- 4.2.2 Part-of-speech tagging -- 4.2.3 Lemmatization -- 4.3 Linguistic post-processing of noisy data -- 4.3.1 Introduction -- 4.3.2 Treatment of noisy data -- 4.4 Tokenizing web texts -- 4.4.1 Example: missing whitespace -- 4.4.2 Example: emoticons -- 4.5 POS tagging and lemmatization of web texts -- 4.5.1 Tracing back errors in POS tagging -- 4.6 Orthographic normalization -- 4.7 Software for linguistic post-processing -- 5. Corpus evaluation and comparison -- 5.1 Introduction -- 5.2 Rough quality check -- 5.2.1 Word and sentence lengths -- 5.2.2 Duplication -- 5.3 Measuring corpus similarity -- 5.3.1 Inspecting frequency lists -- 5.3.2 Hypothesis testing with χ² -- 5.3.3 Hypothesis testing with Spearman's rank correlation -- 5.3.4 Using test statistics without hypothesis testing -- 5.4 Comparing keywords -- 5.4.1 Keyword extraction with χ² -- 5.4.2 Keyword extraction using the ratio of relative frequencies -- 5.4.3 Variants and refinements -- 5.5 Extrinsic evaluation -- 5.6 Corpus composition -- 5.6.1 Estimating corpus composition -- 5.6.2 Measuring corpus composition -- 5.6.3 Interpreting corpus composition -- 5.7 Summary -- Bibliography -- Authors' biographies

Quelle:	Verbundkataloge
Beteiligt:	Bildhauer, Felix (VerfasserIn)
Sprache:	Englisch
Medientyp:	Ebook
Format:	Online
ISBN:	9781608459841
Weitere Identifier:	doi: 10.2200/S00508ED1V01Y201305HLT022
RVK Klassifikation:	ES 900
Schriftenreihe:	Synthesis Lectures on Human Language Technologies ; #22
Schlagworte:	Web search engines; Computational linguistics; Corpora (Linguistics)
Umfang:	1 Online-Ressource (222 Seiten), Illustrationen
Bemerkung(en):	Description based upon print version of record. Also available in print.

Web corpus construction

Autor*in: Schäfer, Roland

Verlag: Morgan & Claypool Publishers, [San Rafael, Calif.]

Bremen: Staats- und Universitätsbibliothek Bremen

Standort:

Staats- und Universitätsbibliothek Bremen

Signatur:

a asl 117.9/566

Fernleihe:

uneingeschränkte Fernleihe, Kopie und Ausleihe

Link zum Verbundkatalog:

Gemeinsamer Bibliotheksverbund (GBV)

Hannover: Technische Informationsbibliothek (TIB) / Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek

Standort:

Technische Informationsbibliothek (TIB) / Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek

Signatur:

CK/640/1165

Fernleihe:

uneingeschränkte Fernleihe, Kopie und Ausleihe

Link zum Verbundkatalog:

Gemeinsamer Bibliotheksverbund (GBV)

Hildesheim: Universitätsbibliothek Hildesheim

Standort:

Universitätsbibliothek Hildesheim

Signatur:

CSC 727 : S11

Fernleihe:

uneingeschränkte Fernleihe, Kopie und Ausleihe

Link zum Verbundkatalog:

Gemeinsamer Bibliotheksverbund (GBV)

Lüneburg: Leuphana Universität Lüneburg, Medien- und Informationszentrum, Universitätsbibliothek

Standort:

Leuphana Universität Lüneburg, Medien- und Informationszentrum, Universitätsbibliothek

Signatur:

Ling 138.028

Fernleihe:

uneingeschränkte Fernleihe, Kopie und Ausleihe

Link zum Verbundkatalog:

Gemeinsamer Bibliotheksverbund (GBV)

Quelle:	Verbundkataloge
Beteiligt:	Bildhauer, Felix (VerfasserIn)
Sprache:	Englisch
Medientyp:	Buch (Monographie)
Format:	Druck
ISBN:	9781608459834
RVK Klassifikation:	ES 900
Schriftenreihe:	Synthesis lectures on human language technologies ; 22
Schlagworte:	Korpus <Linguistik>; Internet
Umfang:	XV, 129 Seiten
Bemerkung(en):	Bibliography p. 111-128

Web Corpus Construction

Autor*in: Schäfer, Roland

Verlag: Morgan & Claypool Publishers, [San Rafael]

Potsdam: Universität Potsdam, Universitätsbibliothek

Standort:

Universität Potsdam, Universitätsbibliothek

Fernleihe:

keine Fernleihe

Link zum Verbundkatalog:

Gemeinsamer Bibliotheksverbund (GBV)

The World Wide Web constitutes the largest existing source of texts written in a great variety of languages. A feasible and sound way of exploiting this data for linguistic research is to compile a static corpus for a given language. There are several advantages of this approach: (i) Working with such corpora obviates the problems encountered when using Internet search engines in quantitative linguistic research (such as non-transparent ranking algorithms). (ii) Creating a corpus from web data is virtually free. (iii) The size of corpora compiled from the WWW may exceed by several orders of magnitude the size of language resources offered elsewhere. (iv) The data is locally available to the user, and it can be linguistically post-processed and queried with the tools preferred by her/him. This book addresses the main practical tasks in the creation of web corpora up to giga-token size. Among these tasks are the sampling process (i.e., web crawling) and the usual cleanups including boilerplate removal and removal of duplicated content. Linguistic processing and problems with linguistic processing coming from the different kinds of noise in web corpora are also covered. Finally, the authors show how web corpora can be evaluated and compared to other corpora (such as traditionally compiled corpora).

1. Web corpora -- 2. Data collection -- 2.1 Introduction -- 2.2 The structure of the web -- 2.2.1 General properties -- 2.2.2 Accessibility and stability of web pages -- 2.2.3 What's in a (national) top level domain? -- 2.2.4 Problematic segments of the web -- 2.3 Crawling basics -- 2.3.1 Introduction -- 2.3.2 Corpus construction from search engine results -- 2.3.3 Crawlers and crawler performance -- 2.3.4 Configuration details and politeness -- 2.3.5 Seed URL generation -- 2.4 More on crawling strategies -- 2.4.1 Introduction -- 2.4.2 Biases and the pagerank -- 2.4.3 Focused crawling -- 3. Post-processing -- 3.1 Introduction -- 3.2 Basic cleanups -- 3.2.1 HTML stripping -- 3.2.2 Character references and entities -- 3.2.3 Character sets and conversion -- 3.2.4 Further normalization -- 3.3 Boilerplate removal -- 3.3.1 Introduction to boilerplate -- 3.3.2 Feature extraction -- 3.3.3 Choice of the machine learning method -- 3.4 Language identification -- 3.5 Duplicate detection -- 3.5.1 Types of duplication -- 3.5.2 Perfect duplicates and hashing -- 3.5.3 Near duplicates, Jaccard coefficients, and shingling -- 4. Linguistic processing -- 4.1 Introduction -- 4.2 Basics of tokenization, part-of-speech tagging, and lemmatization -- 4.2.1 Tokenization -- 4.2.2 Part-of-speech tagging -- 4.2.3 Lemmatization -- 4.3 Linguistic post-processing of noisy data -- 4.3.1 Introduction -- 4.3.2 Treatment of noisy data -- 4.4 Tokenizing web texts -- 4.4.1 Example: missing whitespace -- 4.4.2 Example: emoticons -- 4.5 POS tagging and lemmatization of web texts -- 4.5.1 Tracing back errors in POS tagging -- 4.6 Orthographic normalization -- 4.7 Software for linguistic post-processing -- 5. Corpus evaluation and comparison -- 5.1 Introduction -- 5.2 Rough quality check -- 5.2.1 Word and sentence lengths -- 5.2.2 Duplication -- 5.3 Measuring corpus similarity -- 5.3.1 Inspecting frequency lists -- 5.3.2 Hypothesis testing with χ² -- 5.3.3 Hypothesis testing with Spearman's rank correlation -- 5.3.4 Using test statistics without hypothesis testing -- 5.4 Comparing keywords -- 5.4.1 Keyword extraction with χ² -- 5.4.2 Keyword extraction using the ratio of relative frequencies -- 5.4.3 Variants and refinements -- 5.5 Extrinsic evaluation -- 5.6 Corpus composition -- 5.6.1 Estimating corpus composition -- 5.6.2 Measuring corpus composition -- 5.6.3 Interpreting corpus composition -- 5.7 Summary -- Bibliography -- Authors' biographies

Quelle:	Verbundkataloge
Beteiligt:	Bildhauer, Felix (VerfasserIn)
Sprache:	Englisch
Medientyp:	Ebook
Format:	Online
ISBN:	9781608459841
Weitere Identifier:	doi: 10.2200/S00508ED1V01Y201305HLT022
RVK Klassifikation:	ES 900
Schriftenreihe:	Synthesis Lectures on Human Language Technologies ; #22
Schlagworte:	Web search engines; Computational linguistics; Corpora (Linguistics)
Umfang:	1 Online-Ressource (222 Seiten), Illustrationen
Bemerkung(en):	Description based upon print version of record. Also available in print.

Mehrfache Vorfeldbesetzung und Informationsstruktur

Eine Bestandsaufnahme

Autor*in: Bildhauer, Felix

Erschienen: 2011

Mannheim: Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek

Standort:

Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek

Fernleihe:

keine Fernleihe

Link zum Verbundkatalog:

Südwestdeutscher Bibliotheksverbund (SWB)

Quelle:	Leibniz-Institut für Deutsche Sprache, Bibliothek
Sprache:	Deutsch
Medientyp:	Aufsatz aus einer Zeitschrift
Format:	Druck
Übergeordneter Titel:	In: Deutsche Sprache; Berlin : E. Schmidt, 1973; 39(2011), 4, Seite 362-379

Starke und schwache Adjektivflexion in neuem korpuslinguistischen Licht

Autor*in: Bildhauer, Felix

Bremen: Staats- und Universitätsbibliothek Bremen

Standort:

Staats- und Universitätsbibliothek Bremen

Link zum Verbundkatalog:

Gemeinsamer Bibliotheksverbund (GBV)

Göttingen: Niedersächsische Staats- und Universitätsbibliothek Göttingen

Standort:

Niedersächsische Staats- und Universitätsbibliothek Göttingen

Fernleihe:

keine Fernleihe

Link zum Verbundkatalog:

Gemeinsamer Bibliotheksverbund (GBV)

Kiel: Universitätsbibliothek Kiel, Zentralbibliothek

Standort:

Universitätsbibliothek Kiel, Zentralbibliothek

Link zum Verbundkatalog:

Gemeinsamer Bibliotheksverbund (GBV)

Mannheim: Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek

Standort:

Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek

Fernleihe:

keine Fernleihe

Link zum Verbundkatalog:

Südwestdeutscher Bibliotheksverbund (SWB)

Quelle:	Leibniz-Institut für Deutsche Sprache, Bibliothek
Beteiligt:	Fuß, Eric (VerfasserIn); Hansen-Morath, Sandra (VerfasserIn); Münzberg, Franziska (VerfasserIn)
Sprache:	Deutsch
Medientyp:	Aufsatz aus einem Sammelband
Format:	Druck
Übergeordneter Titel:	Enthalten in: Institut für Deutsche Sprache (54. : 2018 : Mannheim); Neues vom heutigen Deutsch; Berlin : De Gruyter, 2019; (2019), Seite [293]-312; XX, 364 Seiten

Alternation von zu- und dass-Komplementen

Kontrolle, Korpus und Grammatik

Autor*in: Brandt, Patrick

Göttingen: Niedersächsische Staats- und Universitätsbibliothek Göttingen

Standort:

Niedersächsische Staats- und Universitätsbibliothek Göttingen

Link zum Verbundkatalog:

Gemeinsamer Bibliotheksverbund (GBV)

Mannheim: Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek

Standort:

Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek

Fernleihe:

keine Fernleihe

Link zum Verbundkatalog:

Südwestdeutscher Bibliotheksverbund (SWB)

Quelle:	Leibniz-Institut für Deutsche Sprache, Bibliothek
Beteiligt:	Bildhauer, Felix (MitwirkendeR)
Sprache:	Deutsch
Medientyp:	Aufsatz aus einem Sammelband
Format:	Druck
Übergeordneter Titel:	Enthalten in: Grammatik im Korpus; Tübingen : Narr Francke Attempto, 2019; (2019), Seite [211]-297; 357 Seiten

Web corpus construction

Autor*in: Schäfer, Roland

Erschienen: 2013

Verlag: Morgan & Claypool, [San Rafael, Calif.]

Freiburg/Breisgau: Universitätsbibliothek Freiburg

Standort:

Universität Freiburg, Romanisches Seminar, Bibliothek

Signatur:

Frei 23: S 7 SCHÄ/1

Fernleihe:

keine Ausleihe von Bänden, nur Papierkopien werden versandt

Link zum Verbundkatalog:

Südwestdeutscher Bibliotheksverbund (SWB)

Karlsruhe: Karlsruher Institut für Technologie, KIT-Bibliothek

Standort:

KIT-Bibliothek, Bibliothek der Fakultät für Informatik

Signatur:

2014 272

Fernleihe:

uneingeschränkte Fernleihe, Kopie und Ausleihe

Link zum Verbundkatalog:

Südwestdeutscher Bibliotheksverbund (SWB)

Mannheim: Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek

Standort:

Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek

Fernleihe:

keine Fernleihe

Link zum Verbundkatalog:

Südwestdeutscher Bibliotheksverbund (SWB)

Saarbrücken: Saarländische Universitäts- und Landesbibliothek

Standort:

Universität des Saarlandes, Campusbibliothek für Informatik und Mathematik, Fachrichtung Sprachwissenschaft und Sprachtechnologie, Bibliothek Computerlinguistik, Phonetik und Sprachtechnologie

Signatur:

LIN CORP 6122

Fernleihe:

keine Ausleihe von Bänden, nur Papierkopien werden versandt

Link zum Verbundkatalog:

Südwestdeutscher Bibliotheksverbund (SWB)

Tübingen: Universitätsbibliothek der Eberhard Karls Universität

Standort:

Brechtbau-Bibliothek

Signatur:

GD 900.454-22

Fernleihe:

uneingeschränkte Fernleihe, Kopie und Ausleihe

Link zum Verbundkatalog:

Südwestdeutscher Bibliotheksverbund (SWB)

Quelle:	Leibniz-Institut für Deutsche Sprache, Bibliothek
Beteiligt:	Bildhauer, Felix
Sprache:	Englisch
Medientyp:	Buch (Monographie)
Format:	Druck
ISBN:	9781608459834
Schriftenreihe:	Synthesis lectures on human language technologies ; 22
Schlagworte:	Korpus <Linguistik>; Internet
Umfang:	XV, 129 S., graph. Darst.

Web corpus construction

Autor*in: Schäfer, Roland

Erschienen: 2013

Verlag: Morgan & Claypool, [San Rafael, Calif.]

München: Universitätsbibliothek der LMU München

Standort:

Universitätsbibliothek der LMU München

Fernleihe:

uneingeschränkte Fernleihe, Kopie und Ausleihe

Link zum Verbundkatalog:

Bibliotheksverbund Bayern (BVB)

Quelle:	Verbundkataloge
Beteiligt:	Bildhauer, Felix (Verfasser)
Sprache:	Englisch
Medientyp:	Buch (Monographie)
ISBN:	9781608459834
RVK Klassifikation:	ES 900
Schriftenreihe:	Synthesis lectures on human language technologies ; 22
Umfang:	129 S., graph. Darst.

Web corpus construction

Autor*in: Schäfer, Roland

Verlag: Morgan & Claypool Publishers, [San Rafael, California]

Erlangen: Universitätsbibliothek Erlangen-Nürnberg, Hauptbibliothek

Standort:

Universitätsbibliothek Erlangen-Nürnberg, Hauptbibliothek

Fernleihe:

uneingeschränkte Fernleihe, Kopie und Ausleihe

Link zum Verbundkatalog:

Bibliotheksverbund Bayern (BVB)

Regensburg: Universitätsbibliothek Regensburg

Standort:

Universitätsbibliothek Regensburg

Fernleihe:

uneingeschränkte Fernleihe, Kopie und Ausleihe

Link zum Verbundkatalog:

Bibliotheksverbund Bayern (BVB)

Quelle:	Verbundkataloge
Beteiligt:	Bildhauer, Felix (Verfasser)
Sprache:	Englisch
Medientyp:	Ebook
Format:	Online
ISBN:	9781608459841
Weitere Identifier:	doi: 10.2200/S00508ED1V01Y201305HLT022
RVK Klassifikation:	ES 900
Schriftenreihe:	Synthesis lectures on human language technologies ; #22
Umfang:	1 Online-Ressource (xv, 129 Seiten), Diagramme

Einleitung

Autor*in: Konopka, Marek

Verlag: University Publishing, Heidelberg

Mannheim: Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek

Standort:

Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek

Fernleihe:

keine Fernleihe

Link zum Verbundkatalog:

Südwestdeutscher Bibliotheksverbund (SWB)

Quelle:	Leibniz-Institut für Deutsche Sprache, Bibliothek
Beteiligt:	Brandt, Patrick (VerfasserIn); Münzberg, Franziska (VerfasserIn); Hansen-Morath, Sandra (VerfasserIn); Bildhauer, Felix (VerfasserIn)
Sprache:	Deutsch
Medientyp:	Buch (Monographie)
Format:	Druck
ISBN:	9783968220321
Schriftenreihe:	Bausteine einer Korpusgrammatik des Deutschen ; Band 1
Schlagworte:	Deutsch; Grammatik; Open Data; Statistische Analyse; Korpus <Linguistik>
Umfang:	157 Seiten

Einleitung

Autor*in: Konopka, Marek

Verlag: University Publishing, Heidelberg

Zugang:

Verlag (Kostenfrei)

Bremen: Staats- und Universitätsbibliothek Bremen

Standort:

Staats- und Universitätsbibliothek Bremen

Fernleihe:

keine Fernleihe

Link zum Verbundkatalog:

Gemeinsamer Bibliotheksverbund (GBV)

Hamburg: Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky

Standort:

Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky

Fernleihe:

keine Fernleihe

Link zum Verbundkatalog:

Gemeinsamer Bibliotheksverbund (GBV)

Mannheim: Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek

Standort:

Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek

Fernleihe:

keine Fernleihe

Link zum Verbundkatalog:

Südwestdeutscher Bibliotheksverbund (SWB)

Quelle:	Leibniz-Institut für Deutsche Sprache, Bibliothek
Beteiligt:	Brandt, Patrick (VerfasserIn); Münzberg, Franziska (VerfasserIn); Hansen, Sandra (VerfasserIn); Bildhauer, Felix (VerfasserIn)
Sprache:	Deutsch
Medientyp:	Ebook
Format:	Online
ISBN:	9783968220338
Schriftenreihe:	Bausteine einer Korpusgrammatik des Deutschen ; Band 1
Schlagworte:	Deutsch; Grammatik; Open Data; Statistische Analyse; Korpus <Linguistik>
Umfang:	1 Online-Ressource

Data point selection for genre-aware parsing

Autor*in: Rehbein, Ines ; Bildhauer, Felix

Erschienen: 2018

Verlag: Prague : Charles University

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/7119 https://ids-pub.bsz-bw.de/files/7119/Rehbein_Bildhauer_Data_point_selection_for_genre_aware_parsing_2017.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-71193

In the NLP literature, adapting a parser to new text with properties different from the training data is commonly referred to as domain adaptation. In practice, however, the differences between texts from different sources often reflect a mixture of domain and genre properties, and it is by no means clear what impact each of those has on statistical parsing. In this paper, we investigate how differences between articles in a newspaper corpus relate to the concepts of genre and domain and how they influence parsing performance of a transition-based dependency parser. We do this by applying various similarity measures for data point selection and testing their adequacy for creating genre-aware parsing models.

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Konferenzveröffentlichung
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	Syntaktische Analyse; Automatische Sprachanalyse; Textsorte; Korpus; Sprachstatistik
Lizenz:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Data point selection for genre-aware parsing

Autor*in: Rehbein, Ines ; Bildhauer, Felix

Erschienen: 2018

Verlag: Stroudsburg PA, USA : The Association for Computational Linguistics

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/8000 https://ids-pub.bsz-bw.de/files/8000/Rehbein_Bildhauer_Data_point_selection_2017.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-80007

In the NLP literature, adapting a parser to new text with properties different from the training data is commonly referred to as domain adaptation. In practice, however, the differences between texts from different sources often reflect a mixture of domain and genre properties, and it is by no means clear what impact each of those has on statistical parsing. In this paper, we investigate how differences between articles in a newspaper corpus relate to the concepts of genre and domain and how they influence parsing performance of a transition-based dependency parser. We do this by applying various similarity measures for data point selection and testing their adequacy for creating genre-aware parsing models.

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Konferenzveröffentlichung
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	Parsing; Korpus; Textsorte
Lizenz:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Starke und schwache Adjektivflexion in neuem korpuslinguistischen Licht

Autor*in: Bildhauer, Felix ; Fuß, Eric ; Hansen-Morath, Sandra ; Münzberg, Franziska

Erschienen: 2019

Verlag: Berlin [u.a.] : de Gruyter

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/8515 https://ids-pub.bsz-bw.de/files/8515/Bildhauer_Fuss_u_a_Starke_und_schwache_Adjektivflexion_2019.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-85159 https://doi.org/10.1515/9783110622591-014

In adjective sequences without a determiner ('in neuem korpuslinguistisch-em/-en Licht') and in combinations of a pronominal adjective and an attributive adjective ('mancher ausbildend-er/-e Betrieb'), usage varies between parallel and alternating inflection (Parallel- und Wechselflexion); this variation is influenced by a complex interplay of grammatical and extra-grammatical factors. On the basis of an exploratory corpus study, the paper first identifies the relevant factors and estimates their effect sizes. It then shows that, contrary to previous assumptions, there is no general tendency toward weak inflection after pronominal adjectives; rather, with the exception of the dative singular masculine/neuter context, a diachronic spread of parallel inflection (strong/strong) can be observed.

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Article in an edited volume
Format:	Online
DDC classification:	Germanic languages; German (430)
Keywords:	Inflection; Adjective; Corpus; German; Grammar
License:	creativecommons.org/licenses/by-nc-nd/4.0/deed.de ; info:eu-repo/semantics/openAccess

Fugenelemente im Korpus: Regelhaftigkeit und Variation

Author: Hansen, Sandra ; Bildhauer, Felix ; Konopka, Marek

Published: 2022

Publisher: Paderborn : Wilhelm Fink ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11064 https://ids-pub.bsz-bw.de/files/11064/Hansen_Bildhauer_Konopka_Fugenelemente_im_Korpus_2022.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-110641

This contribution presents corpus-linguistic approaches to variation in the occurrence of the linking element in compounds consisting of two nouns (Arbeit|s|weg). The qualitative preliminary study shows that, contrary to some indications from previous corpus studies, linking after a first constituent ending in a vowel (Bühne|n|spiel, See|ufer) can be systematized linguistically to a very large extent. The main study then focuses on the highly variable linking after a first constituent ending in a consonant (Arbeit|s|weg vs. Heimat|art). It statistically models the influence of factors whose importance previous research could only assume but not test. It also introduces new influencing factors and provides clear indications that the variation is governed on a case-by-case basis to a greater extent than previously assumed.

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Article in an edited volume
Format:	Online
DDC classification:	Germanic languages; German (430)
Keywords:	Linking element; Corpus; Language variety; Composition <word formation>; Compound; Nominal compound; Case study
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Towards a treatment of register phenomena in HPSG

Author: Machicao y Priemer, Antonio ; Müller, Stefan ; Schäfer, Roland ; Bildhauer, Felix

Published: 2023

Publisher: Frankfurt am Main : Universitätsbibliothek Frankfurt am Main ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11447 https://ids-pub.bsz-bw.de/files/11447/Machicao_y_Priemer_Towards_a_treatment_2022.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-114471 https://doi.org/10.21248/hpsg.2022.5

In this paper, we deal with register-driven variation from a probabilistic perspective, as proposed in Schäfer, Bildhauer, Pankratz, Müller (2022). We compare two approaches to analysing this variation within HPSG. On the one hand, we consider a multiple-grammar approach and combine it with the architecture proposed in the CoreGram project (Müller 2015), discussing its advantages and disadvantages. On the other hand, we take into account a single-grammar approach and argue that it appears to be superior due to its computational efficiency and cognitive plausibility.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Head-driven phrase structure grammar; Phrase structure grammar; Grammar; Register
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Proceedings of the 12th Web as Corpus Workshop (ACL SIGWAC). Language Resources and Evaluation Conference (LREC 2020), Marseille, 11–16 May 2020

Author: Barbaresi, Adrien ; Bildhauer, Felix ; Schäfer, Roland ; Stemle, Egon

Published: 2023

Publisher: Paris : European Language Resources Association ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11827 https://ids-pub.bsz-bw.de/files/11827/Barbaresi_Proceedings_of_the_12th_web_as_corpus_workshop_2020.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-118271

The 12th Web as Corpus workshop (WAC-XII) looks at the past, present, and future of web corpora given the fact that large web corpora are nowadays provided mostly by a few major initiatives and companies, and the diversity of the early years appears to have faded slightly. Also, we acknowledge the fact that alternative sources of data (such as data from Twitter and similar platforms) have emerged, some of them only available to large companies and their affiliates, such as linguistic data from social media and other forms of the deep web. At the same time, gathering interesting and relevant web data (web crawling) is becoming an ever more intricate task as the nature of the data offered on the web changes (for example the death of forums in favour of more closed platforms).

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Book (monograph)
Format:	Online
DDC classification:	Language (400)
Keywords:	Corpus; Computational linguistics; Research data
License:	creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess
