Organizing corpora at the Stanford Literary Lab. Balancing simplicity and flexibility in metadata management

Authors: McClure, David ; Algee-Hewitt, Mark ; Douris, Steele ; Fredner, Erik ; Walser, Hannah

Published: 2017

Publisher: Mannheim : Institut für Deutsche Sprache

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/6261 https://ids-pub.bsz-bw.de/files/6261/McClure_etal_Organizing_corpora_2017.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-62617

This article describes a series of ongoing efforts at the Stanford Literary Lab to manage a large collection of literary corpora (~40 billion words). This work is marked by a tension between two competing requirements: the corpora need to be merged into higher-order collections that can be analyzed as units, but it is also necessary to preserve granular access to the original metadata and relational organization of each individual corpus. We describe a set of data management practices that try to accommodate both requirements: Apache Spark is used to index data as Parquet tables on an HPC cluster at Stanford. Crucially, the approach distinguishes between what we call “canonical” and “combined” corpora, a variation on the well-established notion of a “virtual corpus” (Kupietz et al., 2014; Jakubíček et al., 2014; van Uytvanck, 2010).
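
The distinction between "canonical" and "combined" corpora can be illustrated with a minimal sketch. It is not the authors' implementation: it deliberately uses plain Python dictionaries in place of Spark and Parquet, and every name in it (`CanonicalCorpus`, `CombinedRow`, `combine`, `lookup_original`, the corpus names, and the sample rows) is a hypothetical placeholder. The assumption is that each canonical corpus keeps its original metadata fields, while the combined corpus holds only a small shared schema plus a pointer back to the source row.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class CanonicalCorpus:
    """One source corpus, stored with its original metadata fields untouched."""
    name: str
    rows: List[Dict[str, Any]]  # original per-text metadata, schema differs per corpus
    id_field: str               # name of the corpus-specific identifier field
    field_map: Dict[str, str]   # shared field name -> original field name


@dataclass(frozen=True)
class CombinedRow:
    """A row of the merged view: shared fields plus a pointer to the source row."""
    corpus: str
    source_id: str
    title: str
    year: int


def combine(corpora: List[CanonicalCorpus]) -> List[CombinedRow]:
    """Project every canonical row onto the shared schema (hypothetical: title, year)."""
    combined = []
    for corpus in corpora:
        for row in corpus.rows:
            combined.append(CombinedRow(
                corpus=corpus.name,
                source_id=str(row[corpus.id_field]),
                title=row[corpus.field_map["title"]],
                year=int(row[corpus.field_map["year"]]),
            ))
    return combined


def lookup_original(corpora: List[CanonicalCorpus], row: CombinedRow) -> Dict[str, Any]:
    """Follow the pointer in a combined row back to the full original metadata."""
    by_name = {c.name: c for c in corpora}
    corpus = by_name[row.corpus]
    for original in corpus.rows:
        if str(original[corpus.id_field]) == row.source_id:
            return original
    raise KeyError(f"{row.source_id} not found in {row.corpus}")


# Invented sample data: two corpora with deliberately different schemas.
corpora = [
    CanonicalCorpus(
        name="corpus_a",
        rows=[
            {"doc_id": "a-001", "full_title": "Example Novel One", "pub_year": "1850", "genre": "novel"},
            {"doc_id": "a-002", "full_title": "Example Novel Two", "pub_year": "1872", "genre": "novel"},
        ],
        id_field="doc_id",
        field_map={"title": "full_title", "year": "pub_year"},
    ),
    CanonicalCorpus(
        name="corpus_b",
        rows=[{"uid": 17, "name": "Example Poem", "date": 1901, "publisher_city": "London"}],
        id_field="uid",
        field_map={"title": "name", "year": "date"},
    ),
]

combined = combine(corpora)                 # analyzable as a single unit
years = [row.year for row in combined]      # one query across both corpora
original = lookup_original(corpora, combined[2])  # granular metadata is still reachable
```

Here the combined view supports a single query across both corpora, while `lookup_original` recovers fields such as `publisher_city` that exist in only one source. In a Spark setting the same idea would presumably map to one Parquet table per canonical corpus and a further table for the combined view, but that mapping is an assumption rather than something the abstract specifies.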

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Corpus; English; Text technology; Data management; Metadata
License:	creativecommons.org/licenses/by-nc-nd/3.0/de/deed.de ; info:eu-repo/semantics/openAccess