no code implementations • EAMT 2022 • Artūrs Vasiļevskis, Jānis Ziediņš, Marko Tadić, Željka Motika, Mark Fishel, Hrafn Loftsson, Jón Guðnason, Claudia Borg, Keith Cortis, Judie Attard, Donatienne Spiteri
The work in progress on the CEF Action National Language Technology Platform (NLTP) is presented.
no code implementations • LREC 2022 • Jón Guðnason, Hrafn Loftsson
We pre-train four types of monolingual ELECTRA and ConvBERT models and compare our results to a previously trained monolingual RoBERTa model and the multilingual mBERT model.
no code implementations • GWLL (LREC) 2022 • Steinþór Steingrímsson, Luke O’Brien, Finnur Ingimundarson, Hrafn Loftsson, Andy Way
By combining the most promising approaches and data sets, and using confidence scores calculated from the data and the results of manually evaluating samples as indicators, we are able to induce lists of translations with a very high acceptance rate.
no code implementations • TDLE (LREC) 2022 • Marko Tadić, Daša Farkaš, Matea Filko, Artūrs Vasiļevskis, Andrejs Vasiļjevs, Jānis Ziediņš, Željka Motika, Mark Fishel, Hrafn Loftsson, Jón Guðnason, Claudia Borg, Keith Cortis, Judie Attard, Donatienne Spiteri
This article presents the work in progress on a collaborative project of several European countries to develop the National Language Technology Platform (NLTP).
no code implementations • WS (NoDaLiDa) 2019 • Svanhvít Lilja Ingólfsdóttir, Sigurjón Þorsteinsson, Hrafn Loftsson
We report on work in progress which consists of annotating an Icelandic corpus for named entities (NEs) and using it for training a named entity recognizer based on a Bidirectional Long Short-Term Memory model.
no code implementations • RANLP (BUCC) 2021 • Steinþór Steingrímsson, Pintu Lohar, Hrafn Loftsson, Andy Way
Parallel sentences extracted from comparable corpora can be useful to supplement parallel corpora when training machine translation (MT) systems.
1 code implementation • NoDaLiDa 2021 • Steinþór Steingrímsson, Hrafn Loftsson, Andy Way
Being able to generate accurate word alignments is useful for a variety of tasks.
1 code implementation • 15 Nov 2023 • Steinþór Steingrímsson, Hrafn Loftsson, Andy Way
We present SentAlign, an accurate sentence alignment tool designed to handle very large parallel document pairs.
no code implementations • DCLRL (LREC) 2022 • Steinunn Rut Friðriksdóttir, Valdimar Ágúst Eggertsson, Benedikt Geir Jóhannesson, Hjalti Daníelsson, Hrafn Loftsson, Hafsteinn Einarsson
We describe our approach of using a multilingual entity linking model (mGENRE) in combination with Wikipedia API Search (WAPIS) to label our data and compare it to an approach using WAPIS only.
no code implementations • 20 May 2022 • Hlynur D. Hlynsson, Steindór Ellertsson, Jón F. Daðason, Emil L. Sigurdsson, Hrafn Loftsson
Clinical Text Notes (CTNs) contain physicians' reasoning process, written in an unstructured free text format, as they examine and interview patients.
no code implementations • NAACL 2021 • Jón Daðason, Hrafn Loftsson, Salome Sigurðardóttir, Þorsteinn Björnsson
Automatic Text Summarization (ATS) is the task of generating concise and fluent summaries from one or more documents.
no code implementations • ACL 2020 • Steinþór Steingrímsson, Hrafn Loftsson, Andy Way
When rich morphology exacerbates the data sparsity problem, it is imperative to have accurate alignment and filtering methods that can help make the most of what is available by maximising the number of correctly translated segments in a corpus and minimising noise by removing incorrect translations and segments containing extraneous data.
1 code implementation • LREC 2020 • Jón Friðrik Daðason, David Erik Mollberg, Hrafn Loftsson, Kristín Bjarnadóttir
In this paper, we present a character-based BiLSTM model for splitting Icelandic compound words, and show how varying amounts of training data affect the performance of the model.
1 code implementation • LREC 2020 • Anna Björk Nikulásdóttir, Jón Guðnason, Anton Karl Ingason, Hrafn Loftsson, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson, Steinþór Steingrímsson
In this paper, we describe a new national language technology programme for Icelandic.
no code implementations • RANLP 2019 • Vilhjálmur Þorsteinsson, Hulda Óladóttir, Hrafn Loftsson
Our parsing system is able to parse about 90% of all sentences in articles published on the main Icelandic news websites.
no code implementations • WS (NoDaLiDa) 2019 • Svanhvít Lilja Ingólfsdóttir, Hrafn Loftsson, Jón Friðrik Daðason, Kristín Bjarnadóttir
Lemmatization, finding the basic morphological form of a word in a corpus, is an important step in many natural language processing tasks when working with morphologically rich languages.
1 code implementation • RANLP 2019 • Steinþór Steingrímsson, Örvar Kárason, Hrafn Loftsson
Previous work on using BiLSTM models for PoS tagging has primarily focused on small tagsets.
no code implementations • LREC 2014 • Sigrún Helgadóttir, Hrafn Loftsson, Eiríkur Rögnvaldsson
We describe manual correction and a method for semi-automatic error detection and correction.
no code implementations • LREC 2014 • Anton Karl Ingason, Hrafn Loftsson, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson, Joel C. Wallenberg
This paper presents ongoing work that aims to improve machine parsing of Faroese using a combination of Faroese and Icelandic training data.