Language Identification

108 papers with code • 4 benchmarks • 15 datasets

Language identification is the task of determining the language of a text.
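As a rough illustration of the task, the sketch below identifies the language of a short text by comparing its character trigram counts against per-language profiles using cosine similarity. It is a toy baseline under stated assumptions, not the method of any paper listed here: the three sample sentences, the language codes, and the names char_ngrams, cosine, SAMPLES, PROFILES, and identify are all hypothetical, and a usable system would need far more training text per language.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams of the lowercased text, padded with spaces."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical, deliberately tiny training samples (assumption: one short
# string per language is enough for a demonstration, not for real use).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is what they have with that",
    "de": "der schnelle braune fuchs springt hinter den faulen hund und das ist nicht ein ich sie",
    "es": "el zorro salta sobre el perro perezoso y esto es lo que los una para",
}

# One trigram profile per language, built once from the samples above.
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}


def identify(text):
    """Return the language code whose profile is most similar to the text."""
    query = char_ngrams(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))
```

Calling identify("das ist der hund") compares the query's trigrams against each profile and returns the best-matching language code. Character n-grams are chosen here only because they need no tokenizer or external library.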


Most implemented papers

The WiLI benchmark dataset for written language identification

birolkuyumcu/language_identification 23 Jan 2018

This paper describes the WiLI-2018 benchmark dataset for monolingual written natural language identification.

SpeechBrain: A General-Purpose Speech Toolkit

speechbrain/speechbrain 8 Jun 2021

SpeechBrain is an open-source and all-in-one speech toolkit.

Universal Dependency Parsing for Hindi-English Code-switching

irshadbhat/nsdp-cs NAACL 2018

We present a treebank of Hindi-English code-switching tweets under the Universal Dependencies scheme and propose a neural stacking model for parsing that efficiently leverages part-of-speech tag and syntactic tree annotations in the code-switching treebank and the preexisting Hindi and English treebanks.

Predicting the Type and Target of Offensive Posts in Social Media

idontflow/olid NAACL 2019

In particular, we model the task hierarchically, identifying the type and the target of offensive messages in social media.

Word-level Embeddings for Cross-Task Transfer Learning in Speech Processing

bepierre/SpeechVGG 22 Oct 2019

Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer.

Common Voice: A Massively-Multilingual Speech Corpus

facebookresearch/covost LREC 2020

To our knowledge this is the largest audio corpus in the public domain for speech recognition, both in terms of number of hours and number of languages.

VoxLingua107: a Dataset for Spoken Language Recognition

alumae/torch-xvectors-wav 25 Nov 2020

Speech activity detection and speaker diarization are used to extract segments from the videos that contain speech.

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

pytorch/fairseq 17 Nov 2021

On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English.

Scaling Speech Technology to 1,000+ Languages

facebookresearch/fairseq arXiv 2023

Expanding the language coverage of speech technology has the potential to improve access to information for many more people.