NorDial is the first step to creating a corpus of dialectal variation of written Norwegian. It consists of small corpus of tweets manually annotated as Bokmål, Nynorsk, any dialect, or a mix.
1 PAPER • NO BENCHMARKS YET
Automatic language identification is a challenging problem. Discriminating between closely related languages is especially difficult. This paper presents a machine learning approach for automatic language identification for the Nordic languages, which often suffer miscategorisation by existing state-of-the-art tools. Concretely we will focus on discrimination between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese and Icelandic.
1 PAPER • 1 BENCHMARK