Nordic Language Identification

Introduced by Haas et al. in Discriminating Between Similar Nordic Languages

Automatic language identification is a challenging problem, and discriminating between closely related languages is especially difficult. This paper presents a machine-learning approach to automatic language identification for the Nordic languages, which existing state-of-the-art tools often miscategorize. Concretely, we focus on discriminating between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese, and Icelandic. This dataset provides the data for that task. Two variants are provided, 10K and 50K, holding 10,000 and 50,000 examples for each language, respectively.

This dataset is in six similar Nordic languages:

1. Danish, da
2. Faroese, fo
3. Icelandic, is
4. Norwegian Bokmål, nb
5. Norwegian Nynorsk, nn
6. Swedish, sv
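The language codes and variant sizes above can be captured in a small lookup table, which is handy when turning labels into class indices for a classifier. The following is a minimal sketch: the names `LANGUAGES`, `PER_LANGUAGE`, `total_examples`, and `label_to_id` are hypothetical helpers, not part of the dataset, and the totals assume the per-language figures count every example in a variant.

```python
# Language codes and names, exactly as listed in the dataset description.
LANGUAGES = {
    "da": "Danish",
    "fo": "Faroese",
    "is": "Icelandic",
    "nb": "Norwegian Bokmål",
    "nn": "Norwegian Nynorsk",
    "sv": "Swedish",
}

# Examples per language in each variant, per the description.
PER_LANGUAGE = {"10K": 10_000, "50K": 50_000}


def total_examples(variant: str) -> int:
    """Total examples in a variant, assuming every language is equally represented."""
    return PER_LANGUAGE[variant] * len(LANGUAGES)


def label_to_id(code: str) -> int:
    """Map a language code to a stable class index (alphabetical order of codes)."""
    return sorted(LANGUAGES).index(code)


if __name__ == "__main__":
    for variant in PER_LANGUAGE:
        print(variant, total_examples(variant))
    print({code: label_to_id(code) for code in LANGUAGES})
```

Sorting the codes gives a class ordering that does not depend on dictionary insertion order, so the indices stay the same across runs and scripts.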

Benchmarks

Trend	Task	Dataset Variant	Best Model	Paper	Code
	Language Identification	Nordic Language Identification	FastText

Dataset Loaders

huggingface/datasets
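
Loading the data through the Hugging Face `datasets` library might look like the sketch below. The Hub identifier `strombergnlp/nordic_langid` and the config names `10k` and `50k` are assumptions that should be checked against the loader's own listing, and `load_variant` is a hypothetical wrapper. Depending on the installed `datasets` version, the loader may need extra arguments or may not be supported.

```python
from typing import Any

# Assumed Hub identifier and config names; verify before relying on them.
DATASET_ID = "strombergnlp/nordic_langid"
VARIANTS = ("10k", "50k")


def load_variant(variant: str = "10k") -> Any:
    """Load one dataset variant via `datasets.load_dataset` (needs network access)."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    # Imported lazily so the module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, variant)


if __name__ == "__main__":
    data = load_variant("10k")
    print(data)
```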
