Suchergebnisse

A pragmatic approach to XML interoperability – the Component Metadata Infrastructure (CMDI)

Autor*in: Broeder, Daan ; Schonefeld, Oliver ; Trippel, Thorsten ; Van Uytvanck, Dieter ; Witt, Andreas

Erschienen: 2022

Verlag: Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

XML has been designed for creating structured documents, but the information that is encoded in these structures are, by definition, out of scope for XML. Additional sources, normally not easily interpretable by computers, such as documentation are... mehr

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/10875 https://ids-pub.bsz-bw.de/files/10875/Broeder_A_pragmatic_approach_2011.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-108756 https://doi.org/10.4242/BalisageVol7.Broeder01

XML has been designed for creating structured documents, but the information that is encoded in these structures are, by definition, out of scope for XML. Additional sources, normally not easily interpretable by computers, such as documentation are needed to determine the intention of specific tags in a tag-set. The Component Metadata Infrastructure (CMDI) takes a rather pragmatic approach to foster interoperability between XML instances in the domain of metadata descriptions for language resources. This paper gives an overview of this approach.

Export in Literaturverwaltung

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Konferenzveröffentlichung
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	XML; Metadaten; Repository; Datenmanagement; Computerlinguistik
Lizenz:	rightsstatements.org/page/InC/1.0/ ; info:eu-repo/semantics/openAccess

Word-level alignment of paper documents with their electronic full-text counterparts

Autor*in: Müller, Mark-Christoph ; Ghosh, Sucheta ; Wittig, Ulrike ; Rey, Maja

Erschienen: 2022

Verlag: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score... mehr

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11083 https://ids-pub.bsz-bw.de/files/11083/Mueller_Ghosh_Word_level_alignment_2021.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-110839 https://doi.org/10.18653/v1/2021.bionlp-1.19

We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score of 85.01 in the basic setup and up to 86.63 when using pre- and post-processing. Potential areas of application are manual database curation (incl. document triage) and biomedical expression OCR.

Export in Literaturverwaltung

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Konferenzveröffentlichung
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	Computerlinguistik; Volltext; Optische Zeichenerkennung; XML; Ausrichten
Lizenz:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

pyMMAX2: Deep access to MMAX2 projects from Python

Autor*in: Müller, Mark-Christoph

Erschienen: 2022

Verlag: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

pyMMAX2 is an API for processing MMAX2 stand-off annotation data in Python. It provides a lightweight basis for the development of code which opens up the Java- and XML-based ecosystem of MMAX2 for more recent, Python-based NLP and data science... mehr

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11084 https://ids-pub.bsz-bw.de/files/11084/Mueller_pyMMAX2_2020.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-110848

pyMMAX2 is an API for processing MMAX2 stand-off annotation data in Python. It provides a lightweight basis for the development of code which opens up the Java- and XML-based ecosystem of MMAX2 for more recent, Python-based NLP and data science methods. While pyMMAX2 is pure Python, and most functionality is implemented from scratch, the API re-uses the complex implementation of the essential business logic for MMAX2 annotation schemes by interfacing with the original MMAX2 Java libraries. pyMMAX2 is available for download at github.com/nlpAThits/pyMMAX2.

Export in Literaturverwaltung

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Konferenzveröffentlichung
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	Computerlinguistik; Python; API; XML; Neurolinguistisches Programmieren; Data Science
Lizenz:	creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess

A flexible stand-off data model with query language for multi-level annotation

Autor*in: Müller, Mark-Christoph

Erschienen: 2022

Verlag: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

We present an implemented XML data model and a new, simplified query language for multi-level annotated corpora. The new query language involves automatic conversion of queries into the underlying, more complicated MMAXQL query language. It supports... mehr

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11153 https://ids-pub.bsz-bw.de/files/11153/Mueller_A_flexible_stand_off_data_model_2005.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111537 https://doi.org/10.3115/1225753.1225781

We present an implemented XML data model and a new, simplified query language for multi-level annotated corpora. The new query language involves automatic conversion of queries into the underlying, more complicated MMAXQL query language. It supports queries for sequential and hierarchical, but also associative (e.g. coreferential) relations. The simplified query language has been designed with non-expert users in mind.

Export in Literaturverwaltung

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Konferenzveröffentlichung
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	Datenmodell; Abfragesprache; XML; Korpus; Computerlinguistik
Lizenz:	creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

An API for discourse-level access to XML-encoded corpora

Autor*in: Müller, Mark-Christoph ; Strube, Michael

Erschienen: 2022

Verlag: Paris : European Language Resources Association (ELRA) ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

We describe a simple and efficient Java object model and application programming interface (API) for (possibly multi-modal) annotated natural language corpora. Corpora are represented as elements like Sentences, Turns, Utterances, Words, Gestures and... mehr

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11160 https://ids-pub.bsz-bw.de/files/11160/Mueller_Strube_An_API_for_discourse_level_access_2002.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111602

We describe a simple and efficient Java object model and application programming interface (API) for (possibly multi-modal) annotated natural language corpora. Corpora are represented as elements like Sentences, Turns, Utterances, Words, Gestures and Markables. The API allows linguists to access corpora in terms of these discourse-level elements, i.e. at a conceptual level they are familiar with, with the flexibility offered by a general purpose programming language. It is also a contribution to corpus standardization efforts because it is based on a straightforward and easily extensible data model which can serve as a target for conversion of different corpus formats.

Export in Literaturverwaltung

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Konferenzveröffentlichung
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	API; XML; Korpus; Natürliche Sprache; Vereinheitlichung; Datenmodell; Softwarewiederverwendung
Lizenz:	creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

Annotating anaphoric and bridging relations with MMAX

Autor*in: Müller, Mark-Christoph ; Strube, Michael

Erschienen: 2022

Verlag: Stroudsburg, Pennsylvania : Association for Computational Linguistics ; Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

We present a tool for the annotation of anaphoric and bridging relations in a corpus of written texts. Based on differences as well as similarities between these phenomena, we define an annotation scheme. We then implement the scheme within an... mehr

Volltext:	https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/11165 https://ids-pub.bsz-bw.de/files/11165/Mueller_Annotating_anaphoric_and_bridging_relations_2001.pdf
Zitierfähiger Link:	https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111657 https://doi.org/10.3115/1118078.1118090

We present a tool for the annotation of anaphoric and bridging relations in a corpus of written texts. Based on differences as well as similarities between these phenomena, we define an annotation scheme. We then implement the scheme within an annotation tool and demonstrate its use.

Export in Literaturverwaltung

Quelle:	BASE Fachausschnitt Germanistik
Sprache:	Englisch
Medientyp:	Konferenzveröffentlichung
Format:	Online
DDC Klassifikation:	Sprache (400)
Schlagworte:	Annotation; Anapher <Syntax>; Korpus; Computerlinguistik; Schriftsprache; Datenmodell; XML
Lizenz:	creativecommons.org/licenses/by-nc-sa/3.0/ ; info:eu-repo/semantics/openAccess

Filtern nach

Aktive Filter

Kategorien:

Bereich

Quelle

Format

Beteiligt

Medientyp

Sprache

Jahr

Letzte Suchanfragen

Ergebnisse für *

A pragmatic approach to XML interoperability – the Component Metadata Infrastructure (CMDI)

Word-level alignment of paper documents with their electronic full-text counterparts

pyMMAX2: Deep access to MMAX2 projects from Python

A flexible stand-off data model with query language for multi-level annotation

An API for discourse-level access to XML-encoded corpora

Annotating anaphoric and bridging relations with MMAX

Kontakt

Partner