Discriminating Between Similar Nordic Languages

EACL (VarDial) 2021 · René Haas, Leon Derczynski

Automatic language identification is a challenging problem, and discriminating between closely related languages is especially difficult. This paper presents a machine learning approach to automatic language identification for the Nordic languages, which existing state-of-the-art tools often misclassify. Concretely, we focus on discriminating between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese and Icelandic.
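To illustrate how closely related languages can be told apart from surface character patterns, here is a minimal sketch of a character n-gram multinomial Naive Bayes classifier in Python. This is a generic textbook technique chosen for illustration and is not the authors' method. The names `char_ngrams` and `NgramNaiveBayes` are hypothetical, and so are the toy Danish (da), Swedish (sv) and Norwegian Bokmål (nb) sentences, which do not come from the paper or its dataset.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Count character n-grams of lengths n_min..n_max.

    The text is lower-cased and padded with one space at each end so the
    first and last words also produce word-boundary grams.
    """
    padded = f" {text.lower()} "
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                  # additive (Laplace) smoothing
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.totals = Counter()             # label -> total n-gram tokens
        self.docs = Counter()               # label -> number of training texts
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.counts[label].update(grams)
            self.totals[label] += sum(grams.values())
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text)
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, label_counts in self.counts.items():
            # Log prior plus the smoothed log-likelihood of every n-gram.
            score = math.log(self.docs[label] / n_docs)
            denom = self.totals[label] + self.alpha * v
            for gram, c in grams.items():
                # A Counter returns 0 for unseen grams without inserting them.
                score += c * math.log((label_counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy sentences for illustration only (not the paper's data).
train = [
    ("jeg kan ikke lide kaffe", "da"),
    ("hvad hedder du", "da"),
    ("det er en god bog", "da"),
    ("jag tycker inte om kaffe", "sv"),
    ("vad heter du", "sv"),
    ("det är en bra bok", "sv"),
    ("jeg liker ikke kaffe", "nb"),
    ("hva heter du", "nb"),
    ("det er en god bok", "nb"),
]
texts, labels = zip(*train)
model = NgramNaiveBayes().fit(texts, labels)
print(model.predict("jag är här"))  # prints: sv
```

Grams such as "jag", "ä" or "dd" carry much of the signal that separates these languages, which is one reason character n-grams are a common feature choice for language identification. A realistic system would need far more training text per language and an evaluation on held-out data.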


Datasets


Introduced in the Paper:

Nordic Language Identification
Task | Dataset | Model | Metric | Value | Global rank
Language Identification | Nordic Language Identification | FastText | Accuracy | 0.9711 | #1
