IPAC (Icelandic Parallel Abstracts Corpus)

Introduced by Símonarson et al. in Icelandic Parallel Abstracts Corpus

IPAC (Icelandic Parallel Abstracts Corpus) is a new Icelandic-English parallel corpus composed of abstracts from student theses and dissertations. The texts were collected from the Skemman repository, which keeps records of all theses, dissertations, and final projects from students at Icelandic universities. The abstracts were aligned at the sentence level with Bleualign, using sentence-level BLEU scores computed on NMT output in both translation directions. The result is a corpus of 64k sentence pairs from more than 6,000 parallel abstracts.
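The idea behind BLEU-based sentence alignment can be illustrated with a minimal sketch. It is not Bleualign's implementation and not the authors' pipeline: it assumes the Icelandic sentences have already been machine-translated into English, uses a simplified add-one smoothed BLEU over unigrams and bigrams, runs in one translation direction only, and allows only monotonic 1-to-1 links. The names `sentence_bleu`, `align`, and the `threshold` value are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=2):
    """Simplified, add-one smoothed sentence-level BLEU (an assumption,
    not the exact scoring used by Bleualign)."""
    hyp_tok, ref_tok = hyp.lower().split(), ref.lower().split()
    if not hyp_tok or not ref_tok:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp_tok, n), ngrams(ref_tok, n)
        overlap = sum((h & r).values())  # clipped n-gram matches
        total = sum(h.values())
        if n == 1 and overlap == 0:
            return 0.0  # no shared words at all
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref_tok) / len(hyp_tok)))
    return bp * math.exp(log_prec)


def align(translated_src, tgt, threshold=0.3):
    """Monotonic 1-to-1 alignment that maximizes the summed score.
    Pairs scoring below `threshold` are never linked."""
    n, m = len(translated_src), len(tgt)
    score = [[sentence_bleu(s, t) for t in tgt] for s in translated_src]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = score[i - 1][j - 1]
            match = best[i - 1][j - 1] + (s if s >= threshold else 0.0)
            best[i][j] = max(best[i - 1][j], best[i][j - 1], match)
    # Backtrack through the table to recover the linked pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        s = score[i - 1][j - 1]
        if s >= threshold and best[i][j] == best[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1, s))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


# Toy input: machine-translated source sentences and the English abstract.
translated = ["the corpus contains student abstracts",
              "models were trained on the data"]
english = ["The corpus contains student abstracts",
           "Unrelated extra sentence here",
           "Models were trained on the data"]
for src_idx, tgt_idx, bleu in align(translated, english):
    print(src_idx, tgt_idx, round(bleu, 2))
```

In this toy input the unrelated middle sentence is left unaligned, while the first and last English sentences are linked to the two translated source sentences. A fuller approach in the spirit of the description above would repeat the procedure in the other translation direction and combine the two sets of links.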

Source: Icelandic Parallel Abstracts Corpus




License


  • Unknown
