Tilde MODEL Corpus (Tilde Multilingual Open Data for European Languages)

Introduced by Rozis et al. in Tilde MODEL - Multilingual Open Data for EU Languages

Tilde MODEL Corpus is a multilingual corpus for European languages, with a particular focus on the smaller languages. The collected resources have been cleaned, aligned, and formatted into the standard corpus format TMX, making them usable for developing new language technology products and services.
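
As a rough illustration of how TMX-formatted data might be consumed, the sketch below reads aligned segment pairs using only the Python standard library. The function name `iter_segment_pairs` and the sample document are hypothetical and made up for this example; they are not taken from the corpus, and real files may differ in header fields, language codes, or inline markup.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_segment_pairs(source, src_lang, tgt_lang):
    """Yield (source_text, target_text) pairs from a TMX file or file object.

    Hypothetical helper: assumes each <tu> holds one <tuv> per language.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        texts = {}
        for tuv in elem.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                texts[lang.lower()] = "".join(seg.itertext()).strip()
        if src_lang in texts and tgt_lang in texts:
            yield texts[src_lang], texts[tgt_lang]
        elem.clear()  # release the finished translation unit to limit memory use


# Invented sample data, for illustration only (not from the corpus).
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          datatype="plaintext" segtype="sentence"
          adminlang="en" srclang="en" o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Good morning.</seg></tuv>
      <tuv xml:lang="lv"><seg>Labrīt.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Thank you.</seg></tuv>
      <tuv xml:lang="lv"><seg>Paldies.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

pairs = list(iter_segment_pairs(io.BytesIO(SAMPLE_TMX.encode("utf-8")), "en", "lv"))
for src, tgt in pairs:
    print(f"{src}\t{tgt}")
```

Streaming with `iterparse` rather than loading the whole tree is a choice made here because translation memories can be large; for small files, `ET.parse` would work just as well.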

It contains over 10M segments of multilingual open data.

The data has been collected from sites that allow free use and reuse of their content, as well as from public sector websites.

Source: Tilde MODEL - Multilingual Open Data for EU Languages
