Language Identification

123 papers with code • 6 benchmarks • 19 datasets

Language identification is the task of determining the language of a text.
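
As a rough illustration of the task, the sketch below assigns a text to the language whose character-trigram profile it most resembles. It is a toy baseline and not the method of any model listed on this page: the three sample sentences, the `SAMPLES` and `PROFILES` tables, and the `identify` helper are all assumptions made for this example, and real systems are trained on far more data.

```python
import math
from collections import Counter

# Tiny illustrative "training" texts (an assumption for this sketch;
# a real identifier would be trained on a large corpus per language).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the language of the text",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y este es el idioma del texto",
    "de": "der schnelle braune fuchs springt über den faulen hund und das ist die sprache des textes",
}


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, padding with spaces at both ends."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# One trigram profile per language, built once from the sample texts.
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}


def identify(text):
    """Return the code of the language whose profile is most similar to the text."""
    query = char_ngrams(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))


print(identify("the dog is lazy"))
print(identify("el perro es perezoso"))
print(identify("der hund ist faul"))
```

Character n-grams are used here only because they need no tokenizer and work on short inputs; the profile size and similarity measure are arbitrary choices for the sketch.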

Benchmarks

These leaderboards are used to track progress in Language Identification.

Dataset	Best Model
VoxLingua107	XLS-R
OpenSubtitles	Apple bi-LSTM
Universal Dependencies	Apple bi-LSTM
Nordic Language Identification	FastText
GlotLID-C	GlotLID
VoxForge	ConformerG-P

Libraries

Use these libraries to find Language Identification models and implementations.

facebookresearch/fairseq

2 papers

29,193

pytorch/fairseq

2 papers

29,192

Latest papers with no code

A Federated Learning Approach to Privacy Preserving Offensive Language Identification

no code yet • 17 Apr 2024

Since most social media data originates from end users, we propose a privacy preserving decentralized architecture for identifying offensive language online by introducing Federated Learning (FL) in the context of offensive language identification.

FastSpell: the LangId Magic Spell

no code yet • 12 Apr 2024

Language identification is a crucial component in the automated production of language resources, particularly in multilingual and big data contexts.

More than words: Advancements and challenges in speech recognition for singing

no code yet • 14 Mar 2024

This paper addresses the challenges and advancements in speech recognition for singing, a domain distinctly different from standard speech recognition.

Validating and Exploring Large Geographic Corpora

no code yet • 13 Mar 2024

The goal is to understand the impact of upstream data cleaning decisions on downstream corpora with a specific focus on under-represented languages and populations.

Aligning Speech to Languages to Enhance Code-switching Speech Recognition

no code yet • 9 Mar 2024

Performance evaluation using large language models reveals the advantage of the linguistic hint by achieving 14.1% and 5.5% relative improvement on test sets of the ASRU and SEAME datasets, respectively.

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

no code yet • 20 Feb 2024

Inspired by the Open Whisper-style Speech Model (OWSM) project, we propose OWSM-CTC, a novel encoder-only speech foundation model based on Connectionist Temporal Classification (CTC).

Detecting Structured Language Alternations in Historical Documents by Combining Language Identification with Fourier Analysis

no code yet • 25 Jan 2024

In this study, we present a generalizable workflow to identify documents in a historic language with a nonstandard language and script combination, Armeno-Turkish.

Acoustic characterization of speech rhythm: going beyond metrics with recurrent neural networks

no code yet • 22 Jan 2024

We argue that deep learning offers a powerful pattern-recognition approach to advance the characterization of the acoustic bases of speech rhythm.

Language Detection for Transliterated Content

no code yet • 9 Jan 2024

The comprehensive exploration of transliteration dynamics, supported by innovative approaches and cutting-edge technologies like BERT, positions our research at the forefront of addressing unique challenges in the linguistic landscape of digital communication.

Generative linguistic representation for spoken language identification

no code yet • 18 Dec 2023

Effective extraction and application of linguistic features are central to the enhancement of spoken Language IDentification (LID) performance.
