Language Identification
123 papers with code • 6 benchmarks • 19 datasets
Language identification is the task of determining the language of a text.
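As an illustration of the task definition above, here is a minimal sketch of one classic approach: comparing the character trigrams of the input against per-language trigram profiles. The `SAMPLES` text, the `trigrams` and `identify` helpers, and the overlap score are all assumptions made for this example. They are not taken from any paper listed on this page, and real systems train on far larger corpora.

```python
# Toy character-trigram language identifier (illustrative sketch only).

# Hypothetical training text: one short sample per language.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is a "
          "simple sentence in the english language",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y esta "
          "es una frase sencilla en el idioma español",
    "de": "der schnelle braune fuchs springt über den faulen hund und "
          "dies ist ein einfacher satz in der deutschen sprache",
}


def trigrams(text):
    """Return the character trigrams of text, lowercased and space-padded."""
    padded = f" {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


# One profile per language: the set of trigrams seen in its sample.
PROFILES = {lang: set(trigrams(text)) for lang, text in SAMPLES.items()}


def identify(text):
    """Return the language whose profile covers the most input trigrams."""
    grams = trigrams(text)
    if not grams:
        raise ValueError("text must contain at least one character")
    scores = {
        lang: sum(g in profile for g in grams) / len(grams)
        for lang, profile in PROFILES.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for sentence in ("this is the language",
                     "el perro es una frase",
                     "der hund ist ein satz"):
        print(sentence, "->", identify(sentence))
```

With profiles this small, the sketch only separates inputs that closely resemble the sample text. Short, noisy, or code-switched input needs much larger training data and a stronger model.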
Latest papers with no code
Cross-Linguistic Offensive Language Detection: BERT-Based Analysis of Bengali, Assamese, & Bodo Conversational Hateful Content from Social Media
The primary emphasis is placed on the meticulous detection of hate speech within the linguistic domains of Bengali, Assamese, and Bodo, forming the framework for Task 4: Annihilate Hates.
Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition
In recent years, end-to-end speech recognition has emerged as a technology that integrates the acoustic, pronunciation dictionary, and language model components of the traditional Automatic Speech Recognition model.
Attention-Guided Adaptation for Code-Switching Speech Recognition
The prevalence of powerful multilingual models, such as Whisper, has significantly advanced research on speech recognition.
Native Language Identification with Large Language Models
We present the first experiments on Native Language Identification (NLI) using LLMs such as GPT-4.
Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification
To address this challenge, we propose self-supervised adaptive pre-training (SAPT) to adapt the pre-trained model to the target domain and languages of the downstream task.
A Text-to-Text Model for Multilingual Offensive Language Identification
Following a similar approach, we also train the first multilingual pre-trained model for offensive language identification using mT5 and evaluate its performance on a set of six different languages (German, Hindi, Korean, Marathi, Sinhala, and Spanish).
Offensive Language Identification in Transliterated and Code-Mixed Bangla
Identifying offensive content in social media is vital for creating safe online communities.
The Obscure Limitation of Modular Multilingual Language Models
We expose the limitation of modular multilingual language models (MLMs) in multilingual inference scenarios with unknown languages.
Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability
However, the range of languages ChatGPT can handle remains largely a mystery.
Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition
In this study, embeddings from advanced pre-trained language identification (LID) and speaker identification (SID) models are leveraged to improve the accuracy of accent classification and non-native accentedness assessment.