Language Identification

123 papers with code • 6 benchmarks • 19 datasets

Language identification is the task of determining the language of a text.
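
As a rough illustration of the task, the sketch below identifies the language of a short text by comparing its character trigram counts against per-language profiles using cosine similarity. It is a toy example, not a method taken from any paper listed here: the three training sentences, the names `SAMPLES`, `PROFILES`, `trigrams`, `cosine`, and `identify`, and the choice of trigrams are all assumptions made for illustration, and a practical system would be trained on far larger corpora.

```python
from collections import Counter
import math

# Tiny illustrative training samples (hypothetical; real systems use large corpora).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the language of the text",
    "de": "der schnelle braune fuchs springt über den faulen hund und das ist die sprache des textes",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y este es el idioma del texto",
}


def trigrams(text: str) -> Counter:
    """Count overlapping character trigrams in lowercased, space-padded text."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# One trigram profile per language, built once from the samples above.
PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def identify(text: str) -> str:
    """Return the language code whose profile is most similar to the text."""
    query = trigrams(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))


if __name__ == "__main__":
    for sentence in ["the dog is lazy", "der hund ist faul", "el perro es perezoso"]:
        print(sentence, "->", identify(sentence))
```

With profiles this small the sketch only separates inputs that share vocabulary with the training sentences; it is meant to show the shape of the task (text in, language label out), not a usable classifier.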

Latest papers with no code

Cross-Linguistic Offensive Language Detection: BERT-Based Analysis of Bengali, Assamese, & Bodo Conversational Hateful Content from Social Media

no code yet • 16 Dec 2023

The primary emphasis is placed on the meticulous detection of hate speech within the linguistic domains of Bengali, Assamese, and Bodo, forming the framework for Task 4: Annihilate Hates.

Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition

no code yet • 15 Dec 2023

In recent years, end-to-end speech recognition has emerged as a technology that integrates the acoustic, pronunciation dictionary, and language model components of the traditional Automatic Speech Recognition model.

Attention-Guided Adaptation for Code-Switching Speech Recognition

no code yet • 14 Dec 2023

The prevalence of powerful multilingual models, such as Whisper, has significantly advanced research on speech recognition.

Native Language Identification with Large Language Models

no code yet • 13 Dec 2023

We present the first experiments on Native Language Identification (NLI) using LLMs such as GPT-4.

Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification

no code yet • 12 Dec 2023

To address this challenge, we propose self-supervised adaptive pre-training (SAPT) to adapt the pre-trained model to the target domain and languages of the downstream task.

A Text-to-Text Model for Multilingual Offensive Language Identification

no code yet • 6 Dec 2023

Following a similar approach, we also train the first multilingual pre-trained model for offensive language identification using mT5 and evaluate its performance on a set of six different languages (German, Hindi, Korean, Marathi, Sinhala, and Spanish).

Offensive Language Identification in Transliterated and Code-Mixed Bangla

no code yet • 25 Nov 2023

Identifying offensive content in social media is vital for creating safe online communities.

The Obscure Limitation of Modular Multilingual Language Models

no code yet • 21 Nov 2023

We expose the limitation of modular multilingual language models (MLMs) in multilingual inference scenarios with unknown languages.

Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability

no code yet • 16 Nov 2023

However, the range of languages ChatGPT can handle remains largely a mystery.

Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition

no code yet • 17 Oct 2023

In this study, embeddings from advanced pre-trained language identification (LID) and speaker identification (SID) models are leveraged to improve the accuracy of accent classification and non-native accentedness assessment.