Search Results for author: Steinþór Steingrímsson

Found 12 papers, 4 papers with code

Evolving Large Text Corpora: Four Versions of the Icelandic Gigaword Corpus

no code implementations LREC 2022 Starkaður Barkarson, Steinþór Steingrímsson, Hildur Hafsteinsdóttir

We show how the corpus has grown almost 50% in size from the first version to the fourth and how it was restructured in order to better accommodate different meta-data for different subcorpora.

Word Embeddings

Compiling and Filtering ParIce: An English-Icelandic Parallel Corpus

no code implementations WS (NoDaLiDa) 2019 Starkaður Barkarson, Steinþór Steingrímsson

We estimate that approximately 5% of the corpus data is noise or faulty alignments while more than 50% of the segments we deleted were faulty.

DIM: The Database of Icelandic Morphology

no code implementations WS (NoDaLiDa) 2019 Kristín Bjarnadóttir, Kristín Ingibjörg Hlynsdóttir, Steinþór Steingrímsson

The topic of this paper is The Database of Icelandic Morphology (DIM), a multipurpose linguistic resource, created for use in language technology, as a reference for the general public in Iceland, and for use in research on the Icelandic language.

Compiling a Highly Accurate Bilingual Lexicon by Combining Different Approaches

no code implementations gwll (LREC) 2022 Steinþór Steingrímsson, Luke O’Brien, Finnur Ingimundarson, Hrafn Loftsson, Andy Way

By combining the most promising approaches and data sets, using confidence scores calculated from the data and the results of manually evaluated samples as indicators, we are able to induce lists of translations with a very high acceptance rate.

Cross-Lingual Word Embeddings, Machine Translation (+1 more)

SentAlign: Accurate and Scalable Sentence Alignment

1 code implementation 15 Nov 2023 Steinþór Steingrímsson, Hrafn Loftsson, Andy Way

We present SentAlign, an accurate sentence alignment tool designed to handle very large parallel document pairs.

Machine Translation, Sentence (+1 more)
