Search results

Web 1T 5-gram, 10 European Languages Version 1

Author:

Published: [2009]

Publisher: Linguistic Data Consortium, [Philadelphia, Pennsylvania]

Mannheim: Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek

Location:

Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek

Call number:

on order

Interlibrary loan:

no interlibrary loan

Link to union catalog:

Südwestdeutscher Bibliotheksverbund (SWB)

Web 1T 5-gram, 10 European Languages Version 1 was created by Google, Inc. It consists of word n-grams and their observed frequency counts for ten European languages: Czech, Dutch, French, German, Italian, Polish, Portuguese, Romanian, Spanish and Swedish. The n-grams range in length from unigrams (single words) to five-grams. The counts were generated from approximately one hundred billion word tokens of text per language, or approximately one trillion tokens in total. The n-grams were extracted from publicly accessible web pages collected between October and December 2008. The data set contains only n-grams that appeared at least 40 times in the processed sentences; less frequent n-grams were discarded. Although the aim was to collect pages in the target languages only, some text in other languages is likely to remain in the final data. The dataset is intended for statistical language modeling, including machine translation and speech recognition, among other uses.

Data

The input encoding of each document was detected automatically, and all text was converted to UTF-8. Statistics for the entire release:

File size (entire corpus): approximately 27.9 GB of bzip2-compressed text files
Total number of tokens: 1,306,807,412,486
Total number of sentences: 150,727,365,731
Total number of unigrams: 95,998,281
Total number of bigrams: 646,439,858
Total number of trigrams: 1,312,972,925
Total number of fourgrams: 1,396,154,236
Total number of fivegrams: 1,149,361,413
Total number of n-grams: 4,600,926,713
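As a rough illustration of how counts like these are used, the following Python sketch does two things. It re-adds the per-order n-gram totals from the statistics to confirm that they sum to the stated 4,600,926,713. It also parses count lines and computes a maximum-likelihood conditional probability, P(w_n | w_1..w_{n-1}) = c(w_1..w_n) / c(w_1..w_{n-1}). The line layout (space-separated tokens, a tab, then the count) is an assumption borrowed from the English Web 1T 5-gram release and should be checked against the dataset documentation. The helper names and the toy German counts are hypothetical.

```python
from collections import Counter

# Distinct n-grams per order, copied from the release statistics.
NGRAM_TOTALS = {
    1: 95_998_281,
    2: 646_439_858,
    3: 1_312_972_925,
    4: 1_396_154_236,
    5: 1_149_361_413,
}
STATED_TOTAL = 4_600_926_713

# Release cutoff: n-grams seen fewer than 40 times were discarded.
MIN_COUNT = 40


def parse_line(line):
    """Split one count line into (token tuple, count).

    ASSUMPTION: 'w1 w2 ... wn<TAB>count', as in the English Web 1T release;
    verify against the LDC2009T25 documentation before relying on it.
    """
    ngram, count = line.rstrip("\n").split("\t")
    return tuple(ngram.split(" ")), int(count)


def load_counts(lines):
    """Collect counts, re-applying the cutoff (only matters for toy data)."""
    counts = Counter()
    for line in lines:
        ngram, count = parse_line(line)
        if count >= MIN_COUNT:
            counts[ngram] += count
    return counts


def mle(counts, ngram):
    """P(w_n | w_1..w_{n-1}) = c(w_1..w_n) / c(w_1..w_{n-1}); 0.0 if history unseen."""
    history_count = counts.get(ngram[:-1], 0)
    return counts.get(ngram, 0) / history_count if history_count else 0.0


if __name__ == "__main__":
    assert sum(NGRAM_TOTALS.values()) == STATED_TOTAL
    toy = ["Guten Tag\t120\n", "Guten\t300\n", "Guten Abend\t30\n"]
    counts = load_counts(toy)
    print(mle(counts, ("Guten", "Tag")))    # 0.4
    print(mle(counts, ("Guten", "Abend")))  # 0.0 (below the cutoff, dropped)
```

Because of the 40-occurrence cutoff, the counts of the observed continuations of a history generally sum to less than the history's own count. These relative frequencies therefore do not form a complete distribution, and practical language models add smoothing or back-off on top of them.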

Notes on content:

Publisher (request form)

Dataset documentation

Source:	Leibniz-Institut für Deutsche Sprache, Bibliothek
Contributors:	Brants, Thorsten; Franz, Alex
Language:	Swedish; Spanish; Romanian; Portuguese; Polish; Dutch; Italian; French; German; Czech
Media type:	Book (monograph); data carrier
ISBN:	1585635251; 9781585635252
Subjects:	Swedish language; Spanish language; Romanian language; Portuguese language; Polish language; Dutch language; Italian language; French language; German language; Czech language; Linguistics; Computational linguistics; Machine translating; Automatic speech recognition; Czech language -- Data processing; Dutch language -- Data processing; French language -- Data processing; German language -- Data processing; Linguistics -- Statistical methods; Polish language -- Data processing; Portuguese language -- Data processing; Spanish language -- Data processing; Swedish language -- Data processing; Databases; Excerpts
Extent:	1 online resource
Note(s):	LDC number: LDC2009T25. Data samples are available on the LDC website.

Der Begriff der Unzählbarkeit bei deutschen Nomina mit dem Blick auf das Tschechische

Author: Vomáčková, Olga

Published: 2009

Publisher: Univ. Palackého, Pedagogická Fak., Olomouc

Berlin: Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Unter den Linden

Location:

Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Unter den Linden

Interlibrary loan:

unrestricted interlibrary loan, copies and lending

Link to union catalog:

Gemeinsamer Bibliotheksverbund (GBV)

Source:	Union catalogs
Language:	German
Media type:	Book (monograph)
Format:	Print
ISBN:	9788024422589
Edition:	1. vyd. (1st ed.)
Series:	Ediční řada - Monografie
Subjects:	German language; Czech language
Extent:	117 p.

Der Begriff der Unzählbarkeit bei deutschen Nomina mit dem Blick auf das Tschechische

Author: Vomáčková, Olga

Published: 2009

Publisher: Univ. Palackého, Pedagogická Fak., Olomouc

Berlin: Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Potsdamer Straße

Location:

Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Potsdamer Straße

Call number:

3 A 169151

Interlibrary loan:

unrestricted interlibrary loan, copies and lending

Link to union catalog:

Gemeinsamer Bibliotheksverbund (GBV)

Source:	Union catalogs
Language:	German
Media type:	Book (monograph)
Format:	Print
ISBN:	9788024422589
Edition:	1. vyd. (1st ed.)
Series:	Ediční řada - Monografie
Subjects:	German language; Czech language
Extent:	117 p.
