About

Language identification is the task of determining the language of a text.
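As a rough illustration of the task for written text, the sketch below compares character trigram counts of an input against per-language profiles using cosine similarity. It is a toy example and not the method of any paper listed here; the names `char_ngrams`, `cosine`, `identify`, `SAMPLES` and `PROFILES` are hypothetical, and the training sentences are made up and far too small for real use.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count character n-grams, padding the text with one space on each side."""
    padded = " " + text.lower() + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy training text, one short sample per language.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the language of the text",
    "de": "der schnelle braune fuchs springt über den faulen hund und das ist die sprache des textes",
    "fr": "le renard brun rapide saute par-dessus le chien paresseux et ceci est la langue du texte",
}
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}


def identify(text):
    """Return the language code whose profile is most similar to the input."""
    query = char_ngrams(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))


if __name__ == "__main__":
    print(identify("das ist der hund"))
```

Real systems train on much larger corpora and often use learned models, but the same idea of scoring an input against per-language statistics applies.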

Greatest papers with code

Common Voice: A Massively-Multilingual Speech Corpus

LREC 2020 facebookresearch/covost

To our knowledge this is the largest audio corpus in the public domain for speech recognition, both in terms of number of hours and number of languages.

LANGUAGE IDENTIFICATION, SPEECH RECOGNITION, TRANSFER LEARNING

Language Identification Using Deep Convolutional Recurrent Neural Networks

16 Aug 2017 HPI-DeepLearning/crnn-lid

Language Identification (LID) systems are used to classify the spoken language from a given audio sample and are typically the first step for many spoken language processing tasks, such as Automatic Speech Recognition (ASR) systems.

LANGUAGE IDENTIFICATION, SPEECH RECOGNITION, SPOKEN LANGUAGE IDENTIFICATION

Speech-VGG: A deep feature extractor for speech processing

22 Oct 2019 bepierre/SpeechVGG

While applications of transfer learning are common in the fields of computer vision and natural language processing, audio and speech processing are surprisingly lacking in readily available and transferable models.

LANGUAGE IDENTIFICATION, MUSIC CLASSIFICATION, REPRESENTATION LEARNING, SPEAKER IDENTIFICATION, SPEECH ENHANCEMENT, TRANSFER LEARNING

A Semisupervised Approach for Language Identification based on Ladder Networks

1 Apr 2016 udibr/LRE

In this study we address the problem of training a neural network for language identification using both labeled and unlabeled speech samples in the form of i-vectors.

DENOISING, LANGUAGE IDENTIFICATION

Automatic Dialect Detection in Arabic Broadcast Speech

23 Sep 2015 Qatar-Computing-Research-Institute/dialectID

We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.

DIALECT IDENTIFICATION, SPEECH RECOGNITION, SPOKEN LANGUAGE IDENTIFICATION

OpusFilter: A Configurable Parallel Corpus Filtering Toolbox

ACL 2020 Helsinki-NLP/OpusFilter

We demonstrate the effectiveness of OpusFilter on the example of a Finnish-English news translation task based on noisy web-crawled training data.

DOMAIN ADAPTATION, LANGUAGE IDENTIFICATION, WORD ALIGNMENT

LanideNN: Multilingual Language Identification on Character Window

EACL 2017 tomkocmi/LanideNN

In language identification, a common first step in natural language processing, we want to automatically determine the language of some input text.

LANGUAGE IDENTIFICATION

On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition

1 Nov 2018 zengzp0912/SEAME-dev-set

Code-switching (CS) refers to a linguistic phenomenon where a speaker uses different languages in an utterance or between alternating utterances.

DATA AUGMENTATION, LANGUAGE IDENTIFICATION, LANGUAGE MODELLING, SPEECH RECOGNITION

What's in a Domain? Learning Domain-Robust Text Representations using Adversarial Training

NAACL 2018 lrank/Domain_Robust_Text_Representation

Most real-world language problems require learning from heterogeneous corpora, raising the problem of learning robust models which generalise well to both similar (in domain) and dissimilar (out of domain) instances to those seen in training.

DOMAIN ADAPTATION, LANGUAGE IDENTIFICATION, SENTIMENT ANALYSIS

Hierarchical Character-Word Models for Language Identification

WS 2016 ajaech/twitter_langid

Social media messages' brevity and unconventional spelling pose a challenge to language identification.

LANGUAGE IDENTIFICATION