Search results

Computer-assisted language learning with grammars. A case study on Latin learning

Author(s): Lange, Herbert

Published: 2022

Publisher: Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11222 https://ids-pub.bsz-bw.de/files/11222/Lange_Computer_assisted_language_learning_2018.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-112226

Learning new languages is highly relevant in today’s society, with its globalized economy and the freedom to move abroad for work, study, or other reasons. In this context, new methods of teaching and learning languages with the help of modern technology are becoming more relevant alongside traditional language classes. This work presents a new approach that combines a traditional language class with a modern computer-based approach to teaching. As a concrete example, a web application to help teach and learn Latin was developed.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Unspecified
Format:	Online
DDC classification:	Language (400)
Keywords:	Computer-assisted method; Foreign language learning; Computational linguistics; Natural language; Latin; Foreign language teaching
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

CorpusExplorer ; Eine Software zur korpuspragmatischen Analyse

Author(s): Rüdiger, Jan Oliver

Published: 2022

Citable link:	https://doi.org/10.17170/kobra-202202085725

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Dissertation
Format:	Online
DDC classification:	Germanic languages; German (430)
Keywords:	Corpus linguistics; Corpus pragmatics; CorpusExplorer; Text mining; NLP; NLProc; Data mining; Natural Language Processing; Linguistics; Computational linguistics; swd:Software; swd:Open Source; swd:Computerlinguistik; swd:Korpus Linguistik; swd:Methode
License:	creativecommons.org/licenses/by-sa/4.0/ ; open access

Proceedings of the workshop on language technology resources and tools for digital humanities (LT4DH), December 11-16, 2016, Osaka, Japan

Author(s): Hinrichs, Erhard ; Hinrichs, Marie ; Trippel, Thorsten

Published: 2022

Publisher: Stroudsburg : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10854 https://ids-pub.bsz-bw.de/files/10854/Hinrichs_Proceedings_of_the_workshop_on_language_technology_resources_2016.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-108541

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Computational linguistics; Digital humanities
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Just for the record, CMDI should be about semantic interoperability

Author(s): Trippel, Thorsten ; Zinn, Claus

Published: 2022

Publisher: Utrecht : CLARIN ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10855 https://ids-pub.bsz-bw.de/files/10855/Trippel_Zinn_Just_for_the_record_2016.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-108552

The Component MetaData Infrastructure (CMDI) provides a lego-brick framework for the creation, use and re-use of self-defined metadata formats. The design of CMDI can be a force for good, but history shows that it has often been misunderstood or badly executed. Consequently, it has led the community towards the dark ages of metadata clutter rather than the bright side of semantic interoperability. In this abstract, we report on the condition of CMDI but also outline an agenda to make the CMDI world a better place to use, share and profit from metadata.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Metadata; Computational linguistics; Data management
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Towards automatic quality assessment of component metadata

Author(s): Trippel, Thorsten ; Broeder, Daan ; Durco, Matej ; Ohren, Oddrun

Published: 2022

Publisher: Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10861 https://ids-pub.bsz-bw.de/files/10861/Trippel_Towards_automatic_quality_assessment_2014.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-108619

Measuring the quality of metadata is only possible by assessing the quality of the underlying schema and the metadata instance. We propose some factors that are measurable automatically for metadata according to the CMD framework, taking into account the variability of schemas that can be defined in this framework. The factors include, among others, the number of elements, the (re-)use of reusable components, and the number of filled-in elements. The resulting score can serve as an indicator of the overall quality of the CMD instance, used for feedback to metadata providers or to provide an overview of the overall quality of metadata within a repository. The score is independent of specific schemas and generalizable. An overall assessment of harvested metadata is provided in the form of statistical summaries and the distribution, based on a corpus of harvested metadata. The score is implemented in XQuery and can be used in tools, editors and repositories.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Metadata; Data quality; Document server; Data management; Computational linguistics
License:	creativecommons.org/licenses/by-nc/4.0/ ; info:eu-repo/semantics/openAccess

A metadata editor to support the description of linguistic resources

Author(s): Dima, Emanuel ; Hinrichs, Erhard ; Hoppermann, Christina ; Trippel, Thorsten ; Zinn, Claus

Published: 2022

Publisher: Paris : European Language Resources Association ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10863 https://ids-pub.bsz-bw.de/files/10863/Dima_Hinrichs_A_metadata_editor_2012.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-108631

Creating and maintaining metadata for various kinds of resources requires appropriate tools to assist the user. The paper presents the metadata editor ProFormA for the creation and editing of CMDI (Component Metadata Infrastructure) metadata in web forms. This editor supports a number of CMDI profiles currently being provided for different types of resources. Since the editor is based on XForms and server-side processing, users can create and modify CMDI files in their standard browser without the need for further processing. Large parts of ProFormA are implemented as web services in order to reuse them in other contexts and programs.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Metadata; Editor; Server; Web services; Computational linguistics
License:	creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

Standardizing a component metadata infrastructure

Author(s): Broeder, Daan ; van Uytvanck, Dieter ; Gavrilidou, Maria ; Trippel, Thorsten ; Windhouwer, Menzo

Published: 2022

Publisher: Paris : European Language Resources Association ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10865 https://ids-pub.bsz-bw.de/files/10865/Broeder_Standardizing_a_component_metadata_infrastructure_2012.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-108659

This paper describes the status of the standardization efforts of a Component Metadata approach for describing Language Resources with metadata. Different linguistic and Language & Technology communities such as CLARIN, META-SHARE and NaLiDa use this component approach and see its standardization as a matter for cooperation that has the possibility to create a large interoperable domain of joint metadata. Starting with an overview of the component metadata approach together with the related semantic interoperability tools and services, such as the ISOcat data category registry and the relation registry, we explain the standardization plan and efforts for component metadata within ISO TC37/SC4. Finally, we present information about uptake and plans of the use of component metadata within the three mentioned linguistic and L&T communities.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Standardization; Metadata; Infrastructure; Data management; Computational linguistics
License:	creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

Proceedings of the workshop describing language resources with metadata: towards flexibility and interoperability in the documentation of language resources. LREC 2012, May 22, 2012, Istanbul, Turkey.

Author(s): Arranz, Victoria ; Broeder, Daan ; Gaiffe, Bertrand ; Gavrilidou, Maria ; Monachini, Monica ; Trippel, Thorsten

Published: 2022

Publisher: Paris : European Language Resources Association ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10866 https://ids-pub.bsz-bw.de/files/10866/Arranz_Broeder_Describing_LRs_with_metadata_2012.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-108662

The current state of the art for metadata provision allows for a very flexible approach, catering for the needs of different archives and communities, referring to common data category registries that describe the meaning of a data category at least to authors of metadata. Component models for metadata provision are, for example, used by CLARIN and META-SHARE, but there is also an increased flexibility in other metadata schemas such as Dublin Core, which is usually not seen as appropriate for meaningful description of language resources. Making resources available to others and putting them to a second use in other projects has never been more widely accepted as a sensible, efficient way to avoid wasting effort and resources. However, when it comes to the details, there is still a vast number of problems. This workshop has aimed at being a forum to address issues and challenges in the concrete work with metadata for LRs, not restricted to a single initiative for archiving LRs. It has allowed for exchange and discussion, and we hope that the reader finds the articles compiled here interesting and useful.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Metadata; Standardization; Research; Computational linguistics; Data management
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

CMDI: a component metadata infrastructure

Author(s): Broeder, Daan ; Windhouwer, Menzo ; van Uytvanck, Dieter ; Goosen, Twan ; Trippel, Thorsten

Published: 2022

Publisher: Paris : European Language Resources Association ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10867 https://ids-pub.bsz-bw.de/files/10867/Broeder_CMDI_a_component_metadata_infrastructure_2012.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-108677

The paper’s purpose is to give an overview of the work on the Component Metadata Infrastructure (CMDI) that was implemented in the CLARIN research infrastructure. It explains the underlying schema and the accompanying tools and services. It also describes the status and impact of the CMDI developments done within the CLARIN project and past and future collaborations with other projects.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Metadata; Research; Infrastructure; Computational linguistics; Data management
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Komponenten-basierte Metadatenschemata und Facetten-basierte Suche. Ein flexibler und universeller Ansatz

Author(s): Barkey, Reinhild ; Hinrichs, Erhard ; Hoppermann, Christina ; Trippel, Thorsten ; Zinn, Claus

Published: 2022

Publisher: Boizenburg : Werner Hülsbusch ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10874 https://ids-pub.bsz-bw.de/files/10874/Barkey_Komponenten_basierte_Metadatenschemata_2011.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-108748 https://doi.org/10.5281/zenodo.4134504

Describing the content of various kinds of research data with metadata requires more than bibliographic data fields, which alone are not sufficient for this purpose. To properly account for research data, other metadata fields are required, often specific to a given research data set. Consequently, metadata profiles adapted to different types of resources need to be created. These are defined by building blocks, called components, that can be shared across profiles. Research data described in this way can be harvested, for example, using OAI-PMH. The resulting metadata collection can then be explored via a unified interface using faceted browsers. The described application is in the area of linguistic data, but our approach is also applicable to other domains.

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Metadata; Research data; Research; Bibliographic data; Data management; Computational linguistics
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

A pragmatic approach to XML interoperability – the Component Metadata Infrastructure (CMDI)

Author(s): Broeder, Daan ; Schonefeld, Oliver ; Trippel, Thorsten ; Van Uytvanck, Dieter ; Witt, Andreas

Published: 2022

Publisher: Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10875 https://ids-pub.bsz-bw.de/files/10875/Broeder_A_pragmatic_approach_2011.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-108756 https://doi.org/10.4242/BalisageVol7.Broeder01

XML has been designed for creating structured documents, but the information that is encoded in these structures is, by definition, out of scope for XML. Additional sources that are normally not easily interpretable by computers, such as documentation, are needed to determine the intention of specific tags in a tag set. The Component Metadata Infrastructure (CMDI) takes a rather pragmatic approach to foster interoperability between XML instances in the domain of metadata descriptions for language resources. This paper gives an overview of this approach.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	XML; Metadata; Repository; Data management; Computational linguistics
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Trailblazing through forests of resources in linguistics

Author(s): Barkey, Reinhild ; Hinrichs, Erhard ; Hoppermann, Christina ; Trippel, Thorsten ; Zinn, Claus

Published: 2022

Publisher: Stanford : Stanford University Library ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10904 https://ids-pub.bsz-bw.de/files/10904/Barkey_Hinrichs_Trailblazing_through_forests_2011.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-109046

Like many other sciences, linguistics faces the challenge of growing into increasingly complex subfields, each with its own separate or overarching branches. While linguists are certainly aware of the overall structure of the research field, they cannot follow all developments outside their own subfields. It is thus important to help specialists and newcomers alike to bushwhack through evolved or unknown territory of linguistic data. A considerable amount of research data in linguistics is described with metadata. While studies described and published in archived journals and conference proceedings receive a quite homogeneous set of metadata tags (e.g., author, title, publisher), this does not hold for the empirical data and analyses that underlie such studies. Moreover, lexicons, grammars, experimental data, and other types of resources come in different forms; to make things worse, their description in terms of metadata is also not uniform, if it exists at all. These problems are well known, and there are now a number of international initiatives (e.g., CLARIN, FlareNet, MetaNet, DARIAH) to build infrastructures for managing linguistic resources. The NaLiDa project, funded by the German Research Foundation, aims at facilitating the management of and access to linguistic resources originating from German research institutions. In cooperation with the German SFB 833 research center, we are developing a combination of faceted and full-text search to give integrated access through heterogeneous metadata sets. Our approach is supported by a central registry for metadata field descriptors, and a component repository for structured groups of data categories as larger building blocks.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Digital humanities; Research data; Metadata; Data management; Computational linguistics
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Word sense alignment and disambiguation for historical encyclopedias

Author(s): Hagen, Thora ; Jannidis, Fotis ; Witt, Andreas

Published: 2022

Publisher: Gießen : Graphen & Netzwerke; AG des Verbandes Digital Humanities im deutschsprachigen Raum e.V. ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10983 https://ids-pub.bsz-bw.de/files/10983/Hagen_Jannidis_Witt_Word_sense_alignment_2021.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-109834

This paper will address the challenge of creating a knowledge graph from a corpus of historical encyclopedias with a special focus on word sense alignment (WSA) and disambiguation (WSD). More precisely, we examine WSA and WSD approaches based on article similarity to link messy historical data, utilizing Wikipedia as a ground-truth component, as the lack of a critical overlap in content paired with the amount of variation between and within the encyclopedias does not allow for choosing a “baseline” encyclopedia to align the others to. Additionally, we are comparing the disambiguation performance of conservative methods like the Lesk algorithm to more recent approaches, i.e. using language models to disambiguate senses.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Article in an edited volume
Format:	Online
DDC classification:	Language (400)
Keywords:	Semasiology; Encyclopedia; Knowledge graph; Corpus; Wikipedia; Computational linguistics
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

„Korpora in der germanistischen Sprachwissenschaft – mündlich, schriftlich, multimedial“; Bericht von der 58. Jahrestagung des Leibniz-Instituts für Deutsche Sprache (als Online-Konferenz), 15. - 17. März 2022

Author(s): Frick, Elena ; Helmer, Henrike

Published: 2022

Publisher: Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11028 https://ids-pub.bsz-bw.de/files/11028/Frick_Helmer_Korpora_in_der_germanistischen_Sprachwissenschaft_2022.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-110285 https://doi.org/10.14618/sr-2-2022-frick

Source:	BASE Fachausschnitt Germanistik
Language:	German
Media type:	Journal article
Format:	Online
DDC classification:	Germanic languages; German (430)
Keywords:	German; Corpus; Text analysis; Discourse analysis; Computational linguistics
License:	creativecommons.org/licenses/by-sa/3.0/de/deed.de ; info:eu-repo/semantics/openAccess

Interoperable language resources

Author(s): Declerck, Thierry ; Ide, Nancy ; Trippel, Thorsten

Published: 2022

Publisher: Bonn : Institut für Kommunikationswissenschaften der Universität Bonn ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11075 https://ids-pub.bsz-bw.de/files/11075/Declerck_Interoperable_language_resources_2007.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-110757

In this contribution we present some work of the European R&D project “LIRICS” and of the ISO/TC 37/SC 4 committee related to the interoperability and re-use of language resources. We introduce some basic mechanisms of the standardization work in ISO and describe in more detail the general approach to the annotation of language data within ISO.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Journal article
Format:	Online
DDC classification:	Language (400)
Keywords:	Interoperability; Standardization; Annotation; Language data; ISO standard; Computational linguistics
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Word-level alignment of paper documents with their electronic full-text counterparts

Author(s): Müller, Mark-Christoph ; Ghosh, Sucheta ; Wittig, Ulrike ; Rey, Maja

Published: 2022

Publisher: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11083 https://ids-pub.bsz-bw.de/files/11083/Mueller_Ghosh_Word_level_alignment_2021.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-110839 https://doi.org/10.18653/v1/2021.bionlp-1.19

We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score of 85.01 in the basic setup and up to 86.63 when using pre- and post-processing. Potential areas of application are manual database curation (incl. document triage) and biomedical expression OCR.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Computational linguistics; Full text; Optical character recognition; XML; Alignment
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

pyMMAX2: Deep access to MMAX2 projects from Python

Author(s): Müller, Mark-Christoph

Published: 2022

Publisher: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11084 https://ids-pub.bsz-bw.de/files/11084/Mueller_pyMMAX2_2020.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-110848

pyMMAX2 is an API for processing MMAX2 stand-off annotation data in Python. It provides a lightweight basis for the development of code which opens up the Java- and XML-based ecosystem of MMAX2 for more recent, Python-based NLP and data science methods. While pyMMAX2 is pure Python, and most functionality is implemented from scratch, the API re-uses the complex implementation of the essential business logic for MMAX2 annotation schemes by interfacing with the original MMAX2 Java libraries. pyMMAX2 is available for download at github.com/nlpAThits/pyMMAX2.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Computational linguistics; Python; API; XML; Natural language processing; Data science
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Reconstructing manual information extraction with DB-to-document backprojection: Experiments in the life science domain

Author(s): Müller, Mark-Christoph ; Ghosh, Sucheta ; Rey, Maja ; Wittig, Ulrike ; Müller, Wolfgang ; Strube, Michael

Published: 2022

Publisher: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11085 https://ids-pub.bsz-bw.de/files/11085/Mueller_Reconstructing_manual_information_extraction_2020.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-110854 https://doi.org/10.18653/v1/2020.sdp-1.9

We introduce a novel scientific document processing task for making previously inaccessible information in printed paper documents available to automatic processing. We describe our data set of scanned documents and data records from the biological database SABIO-RK, provide a definition of the task, and report findings from preliminary experiments. Rigorous evaluation proved challenging due to lack of gold-standard data and a difficult notion of correctness. Qualitative inspection of results, however, showed the feasibility and usefulness of the task.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Computational linguistics; Information extraction; Document; Experiment; Data analysis; Qualitative content analysis
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Transparent, efficient, and robust word embedding access with WOMBAT

Author(s): Müller, Mark-Christoph ; Strube, Michael

Published: 2022

Publisher: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11086 https://ids-pub.bsz-bw.de/files/11086/Mueller_Strube_Transparent_efficient_and_robust_2018.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-110862

We present WOMBAT, a Python tool which supports NLP practitioners in accessing word embeddings from code. WOMBAT addresses common research problems, including unified access, scaling, and robust and reproducible preprocessing. Code that uses WOMBAT for accessing word embeddings is not only cleaner, more readable, and easier to reuse, but also much more efficient than code using standard in-memory methods: a Python script using WOMBAT for evaluating seven large word embedding collections (8.7M embedding vectors in total) on a simple SemEval sentence similarity task involving 250 raw sentence pairs completes in under ten seconds end-to-end on a standard notebook computer.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Python; Automatic language analysis; Code; Computational linguistics
License:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

Automatic detection of nonreferential it in spoken multi-party dialog

Author: Müller, Mark-Christoph

Published: 2022

Publisher: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11104 https://ids-pub.bsz-bw.de/files/11104/Mueller_Automatic_detection_of_nonreferential_it_2006.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111048

We present an implemented machine learning system for the automatic detection of nonreferential it in spoken dialog. The system builds on shallow features extracted from dialog transcripts. Our experiments indicate a level of performance that makes the system usable as a preprocessing filter for a coreference resolution system. We also report results of an annotation study dealing with the classification of it by naive subjects.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	it; Dialog; Spoken language; Machine learning; Transcript; Corpus; Automatic classification; Computational linguistics
License:	creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

Semantic author name disambiguation with word embeddings

Author: Müller, Mark-Christoph

Published: 2022

Publisher: Cham : Springer ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS) [secondary publication]

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11135 https://ids-pub.bsz-bw.de/files/11135/Mueller_Semantic_author_name_disambiguation_2017.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111355 https://doi.org/10.1007/978-3-319-67008-9_24

We present a supervised machine learning system for author name disambiguation (AND) which tackles semantic similarity between publication titles by means of word embeddings. Word embeddings are integrated as external components, which keeps the model small and efficient, while allowing for easy extensibility and domain adaptation. Initial experiments show that word embeddings can improve the Recall and F score of the binary classification sub-task of AND. Results for the clustering sub-task are less clear, but also promising and overall show the feasibility of the approach.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Machine learning; Publication; Deep learning; Semantics; Computational linguistics
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

On the contribution of word-level semantics to practical author name disambiguation

Author: Müller, Mark-Christoph

Published: 2022

Publisher: New York : Association for Computing Machinery ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS) [secondary publication]

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11137 https://ids-pub.bsz-bw.de/files/11137/Mueller_On_the_contribution_2018.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111370 https://doi.org/10.1145/3197026.3203912

We demonstrate the utility of word embedding-based semantic similarity methods for Author Name Disambiguation.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Semantics; Author; Digital library; Machine learning; Computational linguistics
License:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

A flexible stand-off data model with query language for multi-level annotation

Author: Müller, Mark-Christoph

Published: 2022

Publisher: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11153 https://ids-pub.bsz-bw.de/files/11153/Mueller_A_flexible_stand_off_data_model_2005.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111537 https://doi.org/10.3115/1225753.1225781

We present an implemented XML data model and a new, simplified query language for multi-level annotated corpora. The new query language involves automatic conversion of queries into the underlying, more complicated MMAXQL query language. It supports queries for sequential and hierarchical, but also associative (e.g. coreferential) relations. The simplified query language has been designed with non-expert users in mind.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Data model; Query language; XML; Corpus; Computational linguistics
License:	creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

Multi-level annotation in MMAX

Authors: Müller, Mark-Christoph ; Strube, Michael

Published: 2022

Publisher: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11159 https://ids-pub.bsz-bw.de/files/11159/Mueller_Strube_Multi_level_annotation_2003.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111596

We present a light-weight tool for the annotation of linguistic data on multiple levels. It is based on the simplification of annotations to sets of markables having attributes and standing in certain relations to each other. We describe the main features of the tool, emphasizing its simplicity, customizability and versatility.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Computational linguistics; Data; Corpus; Language data; Annotation
License:	creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

Applying co-training to reference resolution

Authors: Müller, Mark-Christoph ; Rapp, Stefan ; Strube, Michael

Published: 2022

Publisher: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Full text:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11164 https://ids-pub.bsz-bw.de/files/11164/Mueller_Rapp_Strube_Applying_co_training_2002.pdf
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111649 https://doi.org/10.3115/1073083.1073142

In this paper, we investigate the practical applicability of Co-Training for the task of building a classifier for reference resolution. We are concerned with the question of whether Co-Training can significantly reduce the amount of manual labeling work and still produce a classifier with acceptable performance.

Source:	BASE Fachausschnitt Germanistik
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Language (400)
Keywords:	Computational linguistics; Corpus
License:	creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess


