Suchergebnisse

A pragmatic approach to XML interoperability – the Component Metadata Infrastructure (CMDI)

Autor*in: Broeder, Daan ; Schonefeld, Oliver ; Trippel, Thorsten ; Van Uytvanck, Dieter ; Witt, Andreas

Erschienen: 2022

Verlag: Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

XML has been designed for creating structured documents, but the information that is encoded in these structures are, by definition, out of scope for XML. Additional sources, normally not easily interpretable by computers, such as documentation are... mehr

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10875 https://ids-pub.bsz-bw.de/files/10875/Broeder_A_pragmatic_approach_2011.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-108756 https://doi.org/10.4242/BalisageVol7.Broeder01

XML has been designed for creating structured documents, but the information that is encoded in these structures are, by definition, out of scope for XML. Additional sources, normally not easily interpretable by computers, such as documentation are needed to determine the intention of specific tags in a tag-set. The Component Metadata Infrastructure (CMDI) takes a rather pragmatic approach to foster interoperability between XML instances in the domain of metadata descriptions for language resources. This paper gives an overview of this approach.

Export in Literaturverwaltung

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Konferenzveröffentlichung
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	XML; Metadaten; Repository; Datenmanagement; Computerlinguistik
Lizenz:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Word sense alignment and disambiguation for historical encyclopedias

Autor*in: Hagen, Thora ; Jannidis, Fotis ; Witt, Andreas

Erschienen: 2022

Verlag: Gießen : Graphen & Netzwerke; AG des Verbandes Digital Humanities im deutschsprachigen Raum e.V. ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

This paper will address the challenge of creating a knowledge graph from a corpus of historical encyclopedias with a special focus on word sense alignment (WSA) and disambiguation (WSD). More precisely, we examine WSA and WSD approaches based on... mehr

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10983 https://ids-pub.bsz-bw.de/files/10983/Hagen_Jannidis_Witt_Word_sense_alignment_2021.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-109834

This paper will address the challenge of creating a knowledge graph from a corpus of historical encyclopedias with a special focus on word sense alignment (WSA) and disambiguation (WSD). More precisely, we examine WSA and WSD approaches based on article similarity to link messy historical data, utilizing Wikipedia as aground-truth component – as the lack of a critical overlap in content paired with the amount of variation between and within the encyclopedias does not allow for choosing a ”baseline” encyclopedia to align the others to. Additionally, we are comparing the disambiguation performance of conservative methods like the Lesk algorithm to more recent approaches, i.e. using language models to disambiguate senses.

Export in Literaturverwaltung

RIS-Format
BibTeX-Format

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Aufsatz aus einem Sammelband
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	Semasiologie; Enzyklopädie; Wissensgraph; Korpus; Wikipedia; Computerlinguistik
Lizenz:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Was darf die sprachwissenschaftliche Forschung? Juristische Fragen bei der Arbeit mit Sprachdaten

Autor*in: Kamocki, Pawel ; Witt, Andreas

Erschienen: 2022

Verlag: Paderborn : Wilhelm Fink ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS) [Zweitveröffentlichung]

Sich in der Linguistik mit rechtlichen Themen beschäftigen zu müssen, ist auf den ersten Blick überraschend. Da jedoch in den Sprachwissenschaften empirisch gearbeitet wird und Sprachdaten, insbesondere Texte und Ton- und Videoaufnahmen sowie... mehr

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11067 https://ids-pub.bsz-bw.de/files/11067/Kamocki_Witt_Was_darf_die_sprachwissenschaftliche_Forschung_2022.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-110671

Sich in der Linguistik mit rechtlichen Themen beschäftigen zu müssen, ist auf den ersten Blick überraschend. Da jedoch in den Sprachwissenschaften empirisch gearbeitet wird und Sprachdaten, insbesondere Texte und Ton- und Videoaufnahmen sowie Transkripte gesprochener Sprache, in den letzten Jahren auch verstärkt Sprachdaten internetbasierter Kommunikation, als Basis für die linguistische Forschung dienen, müssen rechtliche Rahmenbedingungen für jede Art von Datennutzung beachtet werden. Natürlich arbeiten auch andere Wissenschaften, wie z. B. die Astronomie oder die Meteorologie, empirisch. Jedoch gibt es einen grundsätzlichen Unterschied der empirischen Basis: Im Gegensatz zu Temperaturen, die gemessen, oder Konstellationen von Himmelskörpern, die beobachtet werden, basieren Sprachdaten auf schriftlichen, mündlichen oder gebärdeten Äußerungen von Menschen, wodurch sich juristisch begründete Beschränkungen ihrer Nutzung ergeben.

Export in Literaturverwaltung

RIS-Format
BibTeX-Format

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Deutsch
Medientyp:	Aufsatz aus einem Sammelband
Format:	Online
DDC Klassifikation:	Recht (340); Sprache (400)
Schlagworte:	Datenschutz; Digital Humanities; Forschungsdaten; Recht; Personenbezogene Daten; Forschung; Sprachdaten; Geistiges Eigentum; Urheberrecht; Wissenschaft; Creative Commons; Open Access
Lizenz:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Ethical issues in language resources and language technology – Tentative taxonomy

Autor*in: Kamocki, Paweł ; Witt, Andreas

Erschienen: 2022

Verlag: Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Ethical issues in Language Resources and Language Technology are often invoked, but rarely discussed. This is at least partly because little work has been done to systematize ethical issues and principles applicable in the fields of Language... mehr

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11114 https://ids-pub.bsz-bw.de/files/11114/Kamocki_Witt_Ethical_issues_2022.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111146

Ethical issues in Language Resources and Language Technology are often invoked, but rarely discussed. This is at least partly because little work has been done to systematize ethical issues and principles applicable in the fields of Language Resources and Language Technology. This paper provides an overview of ethical issues that arise at different stages of Language Resources and Language Technology development, from the conception phase through the construction phase to the use phase. Based on this overview, the authors propose a tentative taxonomy of ethical issues in Language Resources and Language Technology, built around five principles: Privacy, Property, Equality, Transparency and Freedom. The authors hope that this tentative taxonomy will facilitate ethical assessment of projects in the field of Language Resources and Language Technology, and structure the discussion on ethical issues in this domain, which may eventually lead to the adoption of a universally accepted Code of Ethics of the Language Resources and Language Technology community.

Export in Literaturverwaltung

RIS-Format
BibTeX-Format

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Aufsatz aus einem Sammelband
Format:	Online
DDC Klassifikation:	Sprache (400); Ethik (170)
Schlagworte:	Ethik; Privatsphäre; Eigentum; Gleichheit; Transparenz; Freiheit; Daten; Sprachdaten
Lizenz:	creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

Language matters. The European research infrastructure CLARIN, today and tomorrow

Autor*in: de Jong, Franciska ; Van Uytvanck, Dieter ; Frontini, Francesca ; van den Bosch, Antal ; Fišer, Darja ; Witt, Andreas

Erschienen: 2022

Verlag: Berlin/Boston : de Gruyter ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

CLARIN stands for “Common Language Resources and Technology Infrastructure”. In 2012 CLARIN ERIC was established as a legal entity with the mission to create and maintain a digital infrastructure to support the sharing, use, and sustainability of... mehr

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11285 https://ids-pub.bsz-bw.de/files/11285/de_Jong_Van_Uytvanck_Language_matters_2022.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-112858 https://doi.org/10.1515/9783110767377-002

CLARIN stands for “Common Language Resources and Technology Infrastructure”. In 2012 CLARIN ERIC was established as a legal entity with the mission to create and maintain a digital infrastructure to support the sharing, use, and sustainability of language data (in written, spoken, or multimodal form) available through repositories from all over Europe, in support of research in the humanities and social sciences and beyond. Since 2016 CLARIN has had the status of Landmark research infrastructure and currently it provides easy and sustainable access to digital language data and also offers advanced tools to discover, explore, exploit, annotate, analyse, or combine such datasets, wherever they are located. This is enabled through a networked federation of centres: language data repositories, service centres, and knowledge centres with single sign-on access for all members of the academic community in all participating countries. In addition, CLARIN offers open access facilities for other interested communities of use, both inside and outside of academia. Tools and data from different centres are interoperable, so that data collections can be combined and tools from different sources can be chained to perform operations at different levels of complexity. The strategic agenda adopted by CLARIN and the activities undertaken are rooted in a strong commitment to the Open Science paradigm and the FAIR data principles. This also enables CLARIN to express its added value for the European Research Area and to act as a key driver of innovation and contributor to the increasing number of industry programmes running on data-driven processes and the digitalization of society at large.

Export in Literaturverwaltung

RIS-Format
BibTeX-Format

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Aufsatz aus einem Sammelband
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	Infrastruktur; Sprachdaten; Datensatz; Open Access; Open Science; FAIR data principles; Innovation; Digitalisierung; SSH
Lizenz:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Preface

Autor*in: Fišer, Darja ; Witt, Andreas

Erschienen: 2022

Verlag: Berlin/Boston : de Gruyter ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11291 https://ids-pub.bsz-bw.de/files/11291/Fiser_Witt_Preface_2022.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-112912 https://doi.org/10.1515/9783110767377-202

Export in Literaturverwaltung

RIS-Format
BibTeX-Format

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Aufsatz aus einem Sammelband
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	Technische Infrastruktur; Sprachdaten; Geisteswissenschaften; Sozialwissenschaften; Digital Humanities
Lizenz:	creativecommons.org/licenses/by-sa/4.0/ ; info:eu-repo/semantics/openAccess

The future is now. The digital transformation in the German linguistics community and the key role of the IDS

Autor*in: Witt, Andreas ; Kamocki, Paweł

Erschienen: 2022

Verlag: Budapest : Nyelvtudományi Kutatóközpont / Hungarian Research Centre for Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS) [Zweitveröffentlichung]

The Leibniz-Institute for the German Language (IDS) was established in Mannheim in 1964. Since then, it has been at the forefront of innovation in German linguistics as a hub for digital language data. This chapter presents various lessons learnt... mehr

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11300 https://ids-pub.bsz-bw.de/files/11300/Witt_Kamocki_The_future_is_now_2022.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-113008

The Leibniz-Institute for the German Language (IDS) was established in Mannheim in 1964. Since then, it has been at the forefront of innovation in German linguistics as a hub for digital language data. This chapter presents various lessons learnt from over five decades of work by the IDS, ranging from the importance of sustainability, through its strong technical base and FAIR principles, to the IDS’ role in national and international cooperation projects and its expertise on legal and ethical issues related to language resources and language technology.

Export in Literaturverwaltung

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Konferenzveröffentlichung
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	Digitalisierung; Deutsch; Leibniz-Institut für Deutsche Sprache (IDS); Digitale Daten; Sprachdaten; Nachhaltigkeit; FAIR data principles; Ethik; Kooperation
Lizenz:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Filtern nach

Aktive Filter

Kategorien:

Bereich

Quelle

Format

Beteiligt

Medientyp

Sprache

Jahr

Letzte Suchanfragen

Ergebnisse für *

A pragmatic approach to XML interoperability – the Component Metadata Infrastructure (CMDI)

Word sense alignment and disambiguation for historical encyclopedias

Was darf die sprachwissenschaftliche Forschung? Juristische Fragen bei der Arbeit mit Sprachdaten

Ethical issues in language resources and language technology – Tentative taxonomy

Language matters. The European research infrastructure CLARIN, today and tomorrow

Preface

The future is now. The digital transformation in the German linguistics community and the key role of the IDS

Kontakt

Partner