no code implementations • ACL 2020 • Steinþór Steingrímsson, Hrafn Loftsson, Andy Way
When rich morphology exacerbates the data sparsity problem, it is imperative to have accurate alignment and filtering methods that can help make the most of what is available by maximising the number of correctly translated segments in a corpus and minimising noise by removing incorrect translations and segments containing extraneous data.
no code implementations • LREC 2020 • Steinþór Steingrímsson, Ágústa Þorbergsdóttir, Hjalti Danielsson, Gunnar Thor Ornolfsson
Automatic term extraction (ATE) from texts is critical for effective terminology work in small speech communities.
no code implementations • LREC 2020 • Steinþór Steingrímsson, Starkaður Barkarson, Gunnar Thor Örnólfsson
We describe the acquisition, annotation and encoding of the corpus of the Althingi parliamentary proceedings.
no code implementations • LREC 2020 • David Erik Mollberg, Ólafur Helgi Jónsson, Sunneva Þorsteinsdóttir, Steinþór Steingrímsson, Eydís Huld Magnúsdóttir, Jon Gudnason
Upon completion, Samrómur will be the largest open speech corpus for Icelandic collected from the public domain.
no code implementations • LREC 2020 • Elham Akhlaghi, Branislav Bédi, Fatih Bektaş, Harald Berthelsen, Matthias Butterweck, Cathy Chua, Catia Cucchiarin, Gülşen Eryiğit, Johanna Gerlach, Hanieh Habibi, Neasa Ní Chiaráin, Manny Rayner, Steinþór Steingrímsson, Helmer Strik
LARA (Learning and Reading Assistant) is an open source platform whose purpose is to support easy conversion of plain texts into multimodal online versions suitable for use by language learners.
no code implementations • LREC 2020 • Steinþór Steingrímsson, Starkaður Barkarson, Gunnar Thor Örnólfsson
We introduce an array of open and accessible tools to facilitate the use of the Icelandic Gigaword Corpus, both in the field of Natural Language Processing and by students, linguists, sociologists and others who benefit from using large corpora.