Language Identification

123 papers with code • 6 benchmarks • 19 datasets

Language identification is the task of determining the language of a text.
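One classic way to do this is a multinomial naive Bayes classifier over character trigrams. The minimal sketch below uses add-one smoothing and equal class priors. The names `NgramLanguageIdentifier` and `char_ngrams` and the toy training sentences are hypothetical. They do not come from any benchmark, model, or library listed on this page, and a practical identifier would need far more training text and many more languages.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return overlapping character n-grams, padding the lowercased text with one space on each side."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageIdentifier:
    """Hypothetical multinomial naive Bayes over character n-grams (add-one smoothing, equal priors)."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.counts: dict[str, Counter] = {}  # language -> n-gram frequencies
        self.totals: dict[str, int] = {}      # language -> total n-gram count
        self.vocab: set[str] = set()          # all n-grams seen in training

    def fit(self, samples: dict[str, list[str]]) -> "NgramLanguageIdentifier":
        for lang, texts in samples.items():
            counter: Counter = Counter()
            for text in texts:
                counter.update(char_ngrams(text, self.n))
            self.counts[lang] = counter
            self.totals[lang] = sum(counter.values())
            self.vocab.update(counter)
        return self

    def scores(self, text: str) -> dict[str, float]:
        """Log-likelihood of the text's n-grams under each language model."""
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen n-grams
        grams = char_ngrams(text, self.n)
        return {
            lang: sum(
                math.log((self.counts[lang][g] + 1) / (self.totals[lang] + v))
                for g in grams
            )
            for lang in self.counts
        }

    def predict(self, text: str) -> str:
        s = self.scores(text)
        return max(s, key=s.get)


# Toy training data (hypothetical, for illustration only).
training = {
    "en": ["the quick brown fox jumps over the lazy dog",
           "this is a small english sample sentence"],
    "de": ["der schnelle braune fuchs springt über den faulen hund",
           "dies ist ein kleiner deutscher beispielsatz"],
    "fr": ["le renard brun rapide saute par dessus le chien paresseux",
           "ceci est une petite phrase d'exemple en français"],
}

clf = NgramLanguageIdentifier(n=3).fit(training)
print(clf.predict("the dog is over there"))
print(clf.predict("der hund ist schnell"))
```

Character n-grams are a common choice because they capture spelling patterns such as "sch" or " th" without tokenization, which also helps on short or noisy inputs.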

Benchmarks

These leaderboards track progress in language identification.

Dataset	Best Model
VoxLingua107	XLS-R
OpenSubtitles	Apple bi-LSTM
Universal Dependencies	Apple bi-LSTM
Nordic Language Identification	FastText
GlotLID-C	GlotLID
VoxForge	ConformerG-P

Libraries

Use these libraries to find language identification models and implementations.

facebookresearch/fairseq (2 papers, 29,288 stars)

pytorch/fairseq (2 papers, 29,286 stars)

Latest papers with no code

Cross-Linguistic Offensive Language Detection: BERT-Based Analysis of Bengali, Assamese, & Bodo Conversational Hateful Content from Social Media

no code yet • 16 Dec 2023

The work focuses on detecting hate speech in Bengali, Assamese, and Bodo, which forms the framework for Task 4: Annihilate Hates.

Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition

no code yet • 15 Dec 2023

In recent years, end-to-end speech recognition has emerged as a technology that integrates the acoustic, pronunciation dictionary, and language model components of the traditional Automatic Speech Recognition model.

Attention-Guided Adaptation for Code-Switching Speech Recognition

no code yet • 14 Dec 2023

The prevalence of powerful multilingual models such as Whisper has significantly advanced research on speech recognition.

Native Language Identification with Large Language Models

no code yet • 13 Dec 2023

We present the first experiments on Native Language Identification (NLI) using LLMs such as GPT-4.

Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification

no code yet • 12 Dec 2023

To address this challenge, we propose self-supervised adaptive pre-training (SAPT) to adapt the pre-trained model to the target domain and languages of the downstream task.

A Text-to-Text Model for Multilingual Offensive Language Identification

no code yet • 6 Dec 2023

Following a similar approach, we also train the first multilingual pre-trained model for offensive language identification using mT5 and evaluate its performance on a set of six different languages (German, Hindi, Korean, Marathi, Sinhala, and Spanish).
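A text-to-text model treats classification as string generation: it reads an input string and emits the label as text. The sketch below shows only the data-preparation step under that framing. `Example`, `TASK_PREFIX`, `POSITIVE`, `NEGATIVE`, `to_text_to_text`, and `parse_prediction` are hypothetical names chosen for illustration, and the prompt and label strings are assumptions, not the paper's actual format. Training mT5 itself is out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """A labelled post; `language` is an ISO 639-1 code such as "de" or "hi"."""
    text: str
    language: str
    offensive: bool


# Hypothetical prompt prefix and label strings (assumed, not taken from the paper).
TASK_PREFIX = "offensive language identification"
POSITIVE, NEGATIVE = "offensive", "not offensive"


def to_text_to_text(example: Example) -> tuple[str, str]:
    """Cast one classification example as a (source, target) string pair."""
    source = f"{TASK_PREFIX} [{example.language}]: {example.text}"
    target = POSITIVE if example.offensive else NEGATIVE
    return source, target


def parse_prediction(generated: str) -> bool:
    """Map generated text back to a label; anything other than POSITIVE counts as not offensive."""
    return generated.strip().lower() == POSITIVE


examples = [
    Example("Que tengas un buen día", "es", False),
    Example("Du bist großartig", "de", False),
    Example("[an offensive post]", "hi", True),  # placeholder text
]
pairs = [to_text_to_text(ex) for ex in examples]
for source, target in pairs:
    print(source, "->", target)
```

Exact-match parsing keeps the label mapping strict. Here, a generated string that matches neither label is treated as not offensive, and a real system might handle that case differently.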

Offensive Language Identification in Transliterated and Code-Mixed Bangla

no code yet • 25 Nov 2023

Identifying offensive content in social media is vital for creating safe online communities.

The Obscure Limitation of Modular Multilingual Language Models

no code yet • 21 Nov 2023

We expose the limitation of modular multilingual language models (MLMs) in multilingual inference scenarios with unknown languages.

Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability

no code yet • 16 Nov 2023

However, the range of languages ChatGPT can handle remains largely a mystery.

Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition

no code yet • 17 Oct 2023

In this study, embeddings from advanced pre-trained language identification (LID) and speaker identification (SID) models are leveraged to improve the accuracy of accent classification and non-native accentedness assessment.
