COLING (LAW) 2020 • Sebastian Nordhoff
This paper reports on the harvesting, analysis, and enrichment of 20k documents in 300 different low-resource languages from four different endangered language archives.
LDL (ACL) 2022 • Sebastian Nordhoff, Thomas Krämer
Many NLP resources and programs focus on a handful of major languages.
LREC 2020 • Sebastian Nordhoff
This paper describes a collection of 20k ELAN annotation files harvested from five different endangered language archives.
LREC 2020 • Kilu von Prince, Sebastian Nordhoff
For most of the world's languages, no primary data are available, even as many languages are disappearing.
LREC 2016 • Sebastian Nordhoff, Siri Tuttle, Olga Lovick
This paper describes a repository of example sentences in three endangered Athabascan languages: Koyukon, Upper Tanana, and Lower Tanana.
LREC 2016 • Mathias Schenner, Sebastian Nordhoff
We present texigt, a command-line tool for the extraction of structured linguistic data from LaTeX source documents, and a language resource that has been generated using this tool: a corpus of interlinear glossed text (IGT) extracted from open access books published by Language Science Press.
LREC 2012 • Sebastian Nordhoff, Harald Hammarström
Language resources can be divided into structural resources treating phonology, morphosyntax, semantics etc.
LREC 2012 • Christian Chiarcos, Sebastian Hellmann, Sebastian Nordhoff, Steven Moran, Richard Littauer, Judith Eckle-Kohler, Iryna Gurevych, Silvana Hartmann, Michael Matuschek, Christian M. Meyer
This paper describes the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation (OKFN).