no code implementations • DCLRL (LREC) 2022 • Erik Körner, Felix Helfer, Christopher Schröder, Thomas Eckart, Dirk Goldhahn
The “Web as corpus” paradigm opens opportunities for enhancing the current state of language resources for endangered and under-resourced languages.
no code implementations • LREC 2020 • Uwe Quasthoff, Lars Hellan, Erik Körner, Thomas Eckart, Dirk Goldhahn, Dorothee Beermann
Verb valence information can be derived from corpora by using subcorpora of typical sentences that are constructed in a language-independent manner based on frequent POS structures.
no code implementations • LREC 2020 • Thomas Eckart, Sonja Bosch, Uwe Quasthoff, Erik Körner, Dirk Goldhahn, Simon Kaleschke
A dictionary profile defines available presentation options of the dictionary data in the app and can be specified according to the needs of the respective user group.
no code implementations • WS 2019 • Imad Zeroual, Dirk Goldhahn, Thomas Eckart, Abdelhak Lakhouaja
The corpus data was collected from international Arabic news websites, all of which are freely available on the Web.
no code implementations • LREC 2014 • Thomas Eckart, Erla Hallsteinsdóttir, Sigrún Helgadóttir, Uwe Quasthoff, Dirk Goldhahn
The new POS-tagged Icelandic corpus of the Leipzig Corpora Collection is an extensive resource for the analysis of the Icelandic language.
no code implementations • LREC 2014 • Uwe Quasthoff, Dirk Goldhahn, Thomas Eckart, Erla Hallsteinsdóttir, Sabine Fiedler
Since 2011, the comprehensive, electronically available sources of the Leipzig Corpora Collection have been used consistently for the compilation of high-quality word lists.
no code implementations • LREC 2014 • Dirk Goldhahn, Uwe Quasthoff
This paper will focus on the evaluation of automatic methods for quantifying language similarity.
no code implementations • LREC 2012 • Dirk Goldhahn, Thomas Eckart, Uwe Quasthoff
In this paper we describe current advances of the project in collecting and processing text data automatically for a large number of languages.
no code implementations • LREC 2012 • Thomas Eckart, Uwe Quasthoff, Dirk Goldhahn
The quality of statistical measurements on corpora is strongly related to a strict definition of the measuring process and to corpus quality.