Language Identification
108 papers with code • 4 benchmarks • 15 datasets
Language identification is the task of determining the language of a text.
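As a rough illustration of the task for written text, the sketch below scores a string against per-language character trigram profiles using cosine similarity. This is one simple baseline, not the method of any paper listed on this page. The function names, the `SAMPLES` sentences, and the two-language setup are all hypothetical and chosen only to keep the example small.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Return the character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Build one n-gram count profile per language from example sentences."""
    return {
        lang: Counter(g for sentence in sentences for g in char_ngrams(sentence, n))
        for lang, sentences in samples.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def identify(text, profiles, n=3):
    """Return the language whose profile is most similar to the text."""
    query = Counter(char_ngrams(text, n))
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


# Hypothetical toy training data; a real system would use a large corpus.
SAMPLES = {
    "en": [
        "the quick brown fox jumps over the lazy dog",
        "this is a short sentence in the english language",
    ],
    "de": [
        "der schnelle braune fuchs springt über den faulen hund",
        "dies ist ein kurzer satz in der deutschen sprache",
    ],
}
PROFILES = train(SAMPLES)

print(identify("the dog is in the house", PROFILES))
print(identify("der hund ist in der sprache", PROFILES))
```

With so little training data the profiles are only indicative. Short inputs and closely related languages are much harder to separate than this toy setup suggests.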
Most implemented papers
The WiLI benchmark dataset for written language identification
This paper describes the WiLI-2018 benchmark dataset for monolingual written natural language identification.
SpeechBrain: A General-Purpose Speech Toolkit
SpeechBrain is an open-source and all-in-one speech toolkit.
Universal Dependency Parsing for Hindi-English Code-switching
We present a treebank of Hindi-English code-switching tweets under Universal Dependencies scheme and propose a neural stacking model for parsing that efficiently leverages part-of-speech tag and syntactic tree annotations in the code-switching treebank and the preexisting Hindi and English treebanks.
Predicting the Type and Target of Offensive Posts in Social Media
In particular, we model the task hierarchically, identifying the type and the target of offensive messages in social media.
Word-level Embeddings for Cross-Task Transfer Learning in Speech Processing
Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer.
Common Voice: A Massively-Multilingual Speech Corpus
To our knowledge this is the largest audio corpus in the public domain for speech recognition, both in terms of number of hours and number of languages.
VoxLingua107: a Dataset for Spoken Language Recognition
Speech activity detection and speaker diarization are used to extract segments from the videos that contain speech.
Discriminating Between Similar Nordic Languages
Automatic language identification is a challenging problem.
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English.
Scaling Speech Technology to 1,000+ Languages
Expanding the language coverage of speech technology has the potential to improve access to information for many more people.