Search Results for author: Dirk Goldhahn

Found 11 papers, 1 paper with code

Crawling Under-Resourced Languages - a Portal for Community-Contributed Corpus Collection

no code implementations DCLRL (LREC) 2022 Erik Körner, Felix Helfer, Christopher Schröder, Thomas Eckart, Dirk Goldhahn

The “Web as corpus” paradigm opens opportunities for enhancing the current state of language resources for endangered and under-resourced languages.

Typical Sentences as a Resource for Valence

no code implementations LREC 2020 Uwe Quasthoff, Lars Hellan, Erik Körner, Thomas Eckart, Dirk Goldhahn, Dorothee Beermann

Verb valence information can be derived from corpora by using subcorpora of typical sentences that are constructed in a language independent manner based on frequent POS structures.
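The snippet above only summarizes the approach. As a rough illustration, not the authors' actual pipeline, the Python sketch below groups POS-tagged sentences by their tag sequence, keeps the most frequent structures as a "typical sentences" subcorpus, and then counts the structures a given verb occurs in. The Universal-style tag names, the `top_k` cutoff, the toy corpus, and all function names are hypothetical assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: each sentence is a list of (token, POS tag) pairs.
corpus = [
    [("Anna", "PROPN"), ("sleeps", "VERB")],
    [("Tom", "PROPN"), ("sleeps", "VERB")],
    [("Anna", "PROPN"), ("reads", "VERB"), ("books", "NOUN")],
    [("Tom", "PROPN"), ("reads", "VERB"), ("letters", "NOUN")],
    [("Tom", "PROPN"), ("reads", "VERB"), ("books", "NOUN")],
    [("Yesterday", "ADV"), ("Anna", "PROPN"), ("read", "VERB"), ("quickly", "ADV")],
]


def pos_structure(sentence):
    """Return the sentence's POS tag sequence as a hashable tuple."""
    return tuple(tag for _, tag in sentence)


def typical_sentences(sentences, top_k=2):
    """Keep only sentences whose POS structure is among the top_k most frequent.

    Only POS tags are inspected, so the filter itself does not depend on
    any particular language's vocabulary (assumed cutoff: top_k).
    """
    counts = Counter(pos_structure(s) for s in sentences)
    frequent = {structure for structure, _ in counts.most_common(top_k)}
    return [s for s in sentences if pos_structure(s) in frequent]


def verb_frames(subcorpus, verb):
    """Count the POS structures in which `verb` occurs as a VERB token.

    This is a crude stand-in for a valence signal, e.g. whether the verb
    tends to appear with or without a following noun.
    """
    return Counter(pos_structure(s) for s in subcorpus if (verb, "VERB") in s)


typical = typical_sentences(corpus, top_k=2)
print(len(typical))
print(verb_frames(typical, "reads"))
print(verb_frames(typical, "sleeps"))
```

In this toy run the adverbial sentence is filtered out as atypical, "reads" appears only in a PROPN VERB NOUN frame, and "sleeps" only in a PROPN VERB frame, which hints at a transitive/intransitive contrast. A real pipeline would need a proper tagger, much larger corpora, and a more careful notion of "frequent structure".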

POS Sentence

Usability and Accessibility of Bantu Language Dictionaries in the Digital Age: Mobile Access in an Open Environment

no code implementations LREC 2020 Thomas Eckart, Sonja Bosch, Uwe Quasthoff, Erik Körner, Dirk Goldhahn, Simon Kaleschke

A dictionary profile defines available presentation options of the dictionary data in the app and can be specified according to the needs of the respective user group.

LEMMA Word Embeddings

A 500 Million Word POS-Tagged Icelandic Corpus

no code implementations LREC 2014 Thomas Eckart, Erla Hallsteinsdóttir, Sigrún Helgadóttir, Uwe Quasthoff, Dirk Goldhahn

The new POS-tagged Icelandic corpus of the Leipzig Corpora Collection is an extensive resource for the analysis of the Icelandic language.

Part-Of-Speech Tagging POS

High Quality Word Lists as a Resource for Multiple Purposes

no code implementations LREC 2014 Uwe Quasthoff, Dirk Goldhahn, Thomas Eckart, Erla Hallsteinsdóttir, Sabine Fiedler

Since 2011 the comprehensive, electronically available sources of the Leipzig Corpora Collection have been used consistently for the compilation of high quality word lists.

Vocal Bursts Intensity Prediction

Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages

no code implementations LREC 2012 Dirk Goldhahn, Thomas Eckart, Uwe Quasthoff

In this paper we describe current advances of the project in collecting and processing text data automatically for a large number of languages.

Lemmatization

The Influence of Corpus Quality on Statistical Measurements on Language Resources

no code implementations LREC 2012 Thomas Eckart, Uwe Quasthoff, Dirk Goldhahn

The quality of statistical measurements on corpora is strongly related to a strict definition of the measuring process and to corpus quality.

Sentence
