Spoken language identification
9 papers with code • 12 benchmarks • 3 datasets
Identify the language being spoken from an audio input only.
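As a rough illustration of the last step of this task, the sketch below assumes that some acoustic model (not shown) has already produced one raw score per candidate language for an audio clip. It turns those scores into probabilities and picks the most likely label. The function names, language codes, and score values are hypothetical and are not taken from any of the papers listed here.

```python
import math
from typing import Dict, Tuple


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Convert raw per-language scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {lang: math.exp(s - peak) for lang, s in scores.items()}
    total = sum(exps.values())
    return {lang: e / total for lang, e in exps.items()}


def identify_language(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the most probable language label and its probability."""
    probs = softmax(scores)
    best = max(probs, key=probs.get)
    return best, probs[best]


# Hypothetical scores that a model might emit for a single audio clip.
example_scores = {"en": 2.1, "zh": 0.3, "ar": -1.0}
label, confidence = identify_language(example_scores)
```

In practice the scores would come from a trained model operating on audio features; only the final decision rule is shown here.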
Most implemented papers
VoxLingua107: a Dataset for Spoken Language Recognition
Speech activity detection and speaker diarization are used to extract the segments of the videos that contain speech.
Automatic Dialect Detection in Arabic Broadcast Speech
We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.
Language Identification Using Deep Convolutional Recurrent Neural Networks
Language Identification (LID) systems are used to classify the spoken language from a given audio sample and are typically the first step for many spoken language processing tasks, such as Automatic Speech Recognition (ASR) systems.
Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages
State-of-the-art spoken language identification (LID) systems, which are based on end-to-end deep neural networks, have shown remarkable success not only in discriminating between distant languages but also between closely-related languages or even different spoken varieties of the same language.
Triplet Entropy Loss: Improving The Generalisation of Short Speech Language Identification Systems
Even though the models trained using Triplet Entropy Loss showed a better understanding of the languages and higher accuracies, it appears as though the models still memorise word patterns present in the spectrograms rather than learning the finer nuances of a language.
BERT-LID: Leveraging BERT to Improve Spoken Language Identification
Language identification has a profound impact on the multilingual interoperability of an intelligent speech system.
Improving Spoken Language Identification with Map-Mix
The pre-trained multi-lingual XLSR model generalizes well for language identification after fine-tuning on unseen languages.
Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech
This work focuses on improving the Spoken Language Identification (LangId) system for a challenge on developing robust language identification systems that are reliable for non-standard, accented (Singaporean accent), spontaneous code-switched, and child-directed speech collected via Zoom.
Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer
Code-Switching (CS) multilingual Automatic Speech Recognition (ASR) models can transcribe speech containing two or more alternating languages during a conversation.