Search results

Techniken und Praktiken der Verdatung

Author: Bender, Michael ; Bubenhofer, Noah ; Dreesen, Philipp ; Georgi, Christopher ; Rüdiger, Jan Oliver ; Vogel, Friedemann

Published: 2022

Publisher: Berlin/Boston : de Gruyter ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11076 https://ids-pub.bsz-bw.de/files/11076/Bender_Techniken_und_Praktiken_der_Verdatung_2022.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-110766 https://doi.org/10.1515/9783110721447-007

Questions of datafication (Verdatung) are an integral part of digital discourse analysis, not preliminary work. Unlike the evaluation of language and communication that is not digitally represented, the analysis of digital and digitized discourses necessarily presupposes technical procedures and practices, algorithms, and software that constitute the object of study as a digital datum. The following sections briefly describe recurring aspects of these datafication techniques and practices, in particular with regard to collection and transformation (Section 2), corpus compilation (Section 3), annotation (Section 4), and approaches to analytical data exploration (Section 5). The conclusion (Section 6) summarizes the relevance of datafication work for the analysis process.

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Article from an edited volume
Format:	Online
DDC classification:	Language (400)
Keywords:	Discourse analysis; Discourse; Communication; Algorithm; Software; Data collection; Data transformation; Corpus; Annotation
License:	creativecommons.org/licenses/by-nd/4.0/deed.de ; info:eu-repo/semantics/openAccess

LRTwiki: enriching the likelihood ratio test with encyclopedic information for the extraction of relevant terms

Author: Jakob, Niklas ; Müller, Mark-Christoph ; Gurevych, Iryna

Published: 2022

Publisher: Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11090 https://ids-pub.bsz-bw.de/files/11090/Jakob_Mueller_Gurevych_LRTwiki_2009.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-110906

This paper introduces LRTwiki, an improved variant of the Likelihood Ratio Test (LRT). The central idea of LRTwiki is to employ a comprehensive domain-specific knowledge source as additional “on-topic” data sets, and to modify the calculation of the LRT algorithm to take advantage of this new information. The knowledge source is created on the basis of Wikipedia articles. We evaluate on two related tasks, product feature extraction and keyphrase extraction, and find that LRTwiki yields a significant improvement over the original LRT in both.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Library and information sciences (020); Language (400)
Keywords:	Likelihood ratio test; Encyclopedia; Information extraction; Data set; Algorithm; Wikipedia; Error analysis
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Flexible UIMA components for information retrieval research

Author: Müller, Christof ; Zesch, Torsten ; Müller, Mark-Christoph ; Bernhard, Delphine ; Ignatova, Kateryna ; Gurevych, Iryna ; Mühlhäuser, Max

Published: 2022

Publisher: Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11096 https://ids-pub.bsz-bw.de/files/11096/Mueller_Zesch_Flexible_UIMA_components_2008.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-110969

In this paper, we present a suite of flexible UIMA-based components for information retrieval research which have been successfully used (and re-used) in several projects in different application domains. Implementing the whole system as UIMA components is beneficial for configuration management, component reuse, implementation costs, analysis and visualization.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400); Library and information sciences (020)
Keywords:	Information retrieval; Configuration management; Information extraction; Data set; Research; Algorithm; Automatic language analysis; Information management
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Matrix and double-array representations for efficient finite state tokenization

Author: Diewald, Nils

Published: 2022

Publisher: Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11109 https://ids-pub.bsz-bw.de/files/11109/Diewald_Matrix_and_double_array_representations_2022.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111091

This paper presents an algorithm and an implementation for efficient tokenization of texts of space-delimited languages based on a deterministic finite state automaton. Two representations of the underlying data structure are presented and a model implementation for German is compared with state-of-the-art approaches. The presented solution is faster than other tools while maintaining comparable quality.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Algorithm; Finite state space; Data structure; German; Corpus
License:	creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

Beyond the stars: exploiting free-text user reviews to improve the accuracy of movie recommendations

Author: Jakob, Niklas ; Weber, Stefan Hagen ; Müller, Mark-Christoph ; Gurevych, Iryna

Published: 2022

Publisher: New York : Association for Computing Machinery ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS) [secondary publication]

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11139 https://ids-pub.bsz-bw.de/files/11139/Jakob_Weber_Mueller_Beyond_the_stars_2009.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111390 https://doi.org/10.1145/1651461.1651473

In this paper we show that the extraction of opinions from free-text reviews can improve the accuracy of movie recommendations. We present three approaches to extract movie aspects as opinion targets and use them as features for collaborative filtering. Each of these approaches requires a different amount of manual interaction. We collected a data set of reviews with corresponding ordinal (star) ratings of several thousand movies to evaluate the different features for collaborative filtering. During our evaluation we employ a state-of-the-art collaborative filtering engine for the recommendations and compare its performance with and without the features representing user preferences mined from the users' free-text reviews. The opinion-mining-based features perform significantly better than the baseline, which is based on star ratings and genre information only.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Review; Film; Recommendation; Collaborative filtering; Data set; User; Automatic language analysis; Text analysis; Database; Data mining; Algorithm; Recommender system
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess
