Search Results for author: Steinþór Steingrímsson

Found 9 papers, 0 papers with code

Effectively Aligning and Filtering Parallel Corpora under Sparse Data Conditions

no code implementations • ACL 2020 • Steinþór Steingrímsson, Hrafn Loftsson, Andy Way

When rich morphology exacerbates the data sparsity problem, it is imperative to have accurate alignment and filtering methods that can help make the most of what is available by maximising the number of correctly translated segments in a corpus and minimising noise by removing incorrect translations and segments containing extraneous data.

Machine Translation • Translation

TermPortal: A Workbench for Automatic Term Extraction from Icelandic Texts

no code implementations • LREC 2020 • Steinþór Steingrímsson, Ágústa Þorbergsdóttir, Hjalti Danielsson, Gunnar Thor Ornolfsson

Automatic term extraction (ATE) from texts is critical for effective terminology work in small speech communities.

Term Extraction

IGC-Parl: Icelandic Corpus of Parliamentary Proceedings

no code implementations • LREC 2020 • Steinþór Steingrímsson, Starkaður Barkarson, Gunnar Thor Örnólfsson

We describe the acquisition, annotation and encoding of the corpus of the Althingi parliamentary proceedings.

Samrómur: Crowd-sourcing Data Collection for Icelandic Speech Recognition

no code implementations • LREC 2020 • David Erik Mollberg, Ólafur Helgi Jónsson, Sunneva Þorsteinsdóttir, Steinþór Steingrímsson, Eydís Huld Magnúsdóttir, Jon Gudnason

Upon completion, Samrómur will be the largest open speech corpus for Icelandic collected from the public domain.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2

Constructing Multimodal Language Learner Texts Using LARA: Experiences with Nine Languages

no code implementations • LREC 2020 • Elham Akhlaghi, Branislav Bédi, Fatih Bektaş, Harald Berthelsen, Matthias Butterweck, Cathy Chua, Catia Cucchiarin, Gülşen Eryiğit, Johanna Gerlach, Hanieh Habibi, Neasa Ní Chiaráin, Manny Rayner, Steinþór Steingrímsson, Helmer Strik

LARA (Learning and Reading Assistant) is an open source platform whose purpose is to support easy conversion of plain texts into multimodal online versions suitable for use by language learners.

Facilitating Corpus Usage: Making Icelandic Corpora More Accessible for Researchers and Language Users

no code implementations • LREC 2020 • Steinþór Steingrímsson, Starkaður Barkarson, Gunnar Thor Örnólfsson

We introduce an array of open and accessible tools to facilitate the use of the Icelandic Gigaword Corpus in the field of Natural Language Processing, as well as by students, linguists, sociologists and others who benefit from using large corpora.

Word Embeddings