Search Results for author: Yerbolat Khassanov

Found 17 papers, 10 papers with code

Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration

1 code implementation · 25 May 2023 · Rustem Yeshpanov, Saida Mussakhojayeva, Yerbolat Khassanov

This work aims to build a multilingual text-to-speech (TTS) synthesis system for ten lower-resourced Turkic languages: Azerbaijani, Bashkir, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Turkmen, Uyghur, and Uzbek.

Speech Synthesis, Text-To-Speech Synthesis, +2 more

Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition

no code implementations · 28 Oct 2022 · Yist Y. Lin, Tao Han, HaiHua Xu, Van Tung Pham, Yerbolat Khassanov, Tze Yuang Chong, Yi He, Lu Lu, Zejun Ma

One of the limitations of the end-to-end automatic speech recognition (ASR) framework is that its performance is compromised when the utterance lengths in training and test data are mismatched.

Action Detection, Activity Detection, +4 more
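The train-test length mismatch described above is what the random utterance concatenation named in the title targets. As a rough illustration only (not the authors' implementation), the sketch below joins randomly chosen short training utterances, audio and transcript together, into longer synthetic ones. The `Utterance` class and `concat_random_utterances` function are hypothetical names; plain lists of floats stand in for waveforms, pieces are picked uniformly at random, and no silence or crossfade is inserted between them.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Utterance:
    """One training example: waveform samples plus its transcript (hypothetical type)."""
    samples: List[float]
    transcript: str


def concat_random_utterances(
    pool: Sequence[Utterance],
    num_new: int,
    max_parts: int = 3,
    seed: int = 0,
) -> List[Utterance]:
    """Build num_new synthetic utterances, each formed by joining between 2 and
    max_parts distinct, randomly chosen utterances from pool end to end."""
    if len(pool) < 2:
        raise ValueError("need at least two utterances to concatenate")
    if max_parts < 2:
        raise ValueError("max_parts must be at least 2")
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    upper = min(max_parts, len(pool))
    out: List[Utterance] = []
    for _ in range(num_new):
        k = rng.randint(2, upper)          # number of pieces to join (inclusive range)
        parts = rng.sample(list(pool), k)  # k distinct utterances in random order
        samples: List[float] = []
        for part in parts:
            samples.extend(part.samples)   # audio placed back to back, no silence added
        transcript = " ".join(part.transcript for part in parts)
        out.append(Utterance(samples, transcript))
    return out
```

In practice the synthetic utterances would be mixed into the original training set; how many to generate and how long they should be are tuning choices this sketch leaves open.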

A Study of Multimodal Person Verification Using Audio-Visual-Thermal Data

1 code implementation · 23 Oct 2021 · Madina Abdrakhmanova, Saniya Abushakimova, Yerbolat Khassanov, Huseyin Atakan Varol

In this paper, we study an approach to multimodal person verification using audio, visual, and thermal modalities.

USC: An Open-Source Uzbek Speech Corpus and Initial Speech Recognition Experiments

1 code implementation · 30 Jul 2021 · Muhammadjon Musaev, Saida Mussakhojayeva, Ilyos Khujayorov, Yerbolat Khassanov, Mannon Ochilov, Huseyin Atakan Varol

We present a freely available speech corpus for the Uzbek language and report preliminary automatic speech recognition (ASR) results using both the deep neural network hidden Markov model (DNN-HMM) and end-to-end (E2E) architectures.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1 more

KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset

1 code implementation · 17 Apr 2021 · Saida Mussakhojayeva, Aigerim Janaliyeva, Almas Mirzakhmetov, Yerbolat Khassanov, Huseyin Atakan Varol

This paper introduces a high-quality open-source speech synthesis dataset for Kazakh, a low-resource language spoken by over 13 million people worldwide.

Speech Synthesis, Text-To-Speech Synthesis

SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams

1 code implementation · 5 Dec 2020 · Madina Abdrakhmanova, Askat Kuzdeuov, Sheikh Jarju, Yerbolat Khassanov, Michael Lewis, Huseyin Atakan Varol

We present SpeakingFaces as a publicly available large-scale multimodal dataset developed to support machine learning research in contexts that utilize a combination of thermal, visual, and audio data streams; examples include human-computer interaction, biometric authentication, recognition systems, domain transfer, and speech recognition.

speech-recognition, Speech Recognition, +1 more

Approaches to Improving Recognition of Underrepresented Named Entities in Hybrid ASR Systems

no code implementations · 18 May 2020 · Tingzhi Mao, Yerbolat Khassanov, Van Tung Pham, Hai-Hua Xu, Hao Huang, Eng Siong Chng

In this paper, we present a series of complementary approaches to improve the recognition of underrepresented named entities (NE) in hybrid ASR systems without compromising overall word error rate performance.

Language Modelling

Enriching Rare Word Representations in Neural Language Models by Embedding Matrix Augmentation

1 code implementation · 8 Apr 2019 · Yerbolat Khassanov, Zhiping Zeng, Van Tung Pham, Hai-Hua Xu, Eng Siong Chng

Learning representations of rare words is a challenging problem that causes the neural language model (NLM) to produce unreliable probability estimates.

speech-recognition, Speech Recognition

On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition

1 code implementation · 1 Nov 2018 · Zhiping Zeng, Yerbolat Khassanov, Van Tung Pham, Hai-Hua Xu, Eng Siong Chng, Haizhou Li

Code-switching (CS) refers to a linguistic phenomenon where a speaker uses different languages in an utterance or between alternating utterances.

Data Augmentation, Language Identification, +3 more
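To make the code-switching definition above concrete, the sketch below tags each token of a pre-segmented Mandarin-English utterance by script and reports the positions where the language switches within the utterance. Tagging by Unicode script (CJK Unified Ideographs vs. ASCII letters) is a simplifying assumption of this example, not language identification in general and not a method taken from the paper; `token_language` and `switch_points` are hypothetical names.

```python
from typing import List, Optional


def token_language(token: str) -> str:
    """Tag a token 'zh' if every character is a CJK Unified Ideograph
    (U+4E00..U+9FFF), 'en' if it is ASCII letters (apostrophes allowed),
    and 'other' otherwise (digits, punctuation, mixed-script tokens)."""
    if token and all("\u4e00" <= ch <= "\u9fff" for ch in token):
        return "zh"
    if token.isascii() and token.replace("'", "").isalpha():
        return "en"
    return "other"


def switch_points(tokens: List[str]) -> List[int]:
    """Return the indices where the language differs from the previous
    language-tagged token; 'other' tokens are skipped, not counted."""
    points: List[int] = []
    prev: Optional[str] = None
    for i, tok in enumerate(tokens):
        lang = token_language(tok)
        if lang == "other":
            continue
        if prev is not None and lang != prev:
            points.append(i)
        prev = lang
    return points


tokens = "我 今天 去 shopping mall 买 东西".split()
print(switch_points(tokens))  # [3, 5]
```

A non-empty result marks an intra-sentential switch; code-switching between alternating utterances would instead compare the dominant language of consecutive utterances.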

Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR

no code implementations · 27 Jun 2018 · Yerbolat Khassanov, Eng Siong Chng

We also propose generating the list of out-of-shortlist (OOS) words for vocabulary expansion in an unsupervised manner, by automatically extracting them from the ASR output.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1 more
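The unsupervised OOS-list generation in the last abstract can be pictured as a vocabulary difference over first-pass ASR output. This minimal sketch assumes the decoder's vocabulary is larger than the RNN language model's shortlist (so hypotheses can contain OOS words), that hypotheses are whitespace-tokenized and normalized the same way as the shortlist, and that a frequency threshold filters noise; `extract_oos_words` and its `min_count` parameter are hypothetical, and how the extracted words are then added to the RNNLM is not shown.

```python
from collections import Counter
from typing import Iterable, List, Set


def extract_oos_words(
    hypotheses: Iterable[str],
    shortlist: Set[str],
    min_count: int = 1,
) -> List[str]:
    """Return words that appear in ASR hypotheses but not in the RNNLM
    shortlist, most frequent first (ties broken alphabetically).
    Assumes hypotheses and shortlist share the same text normalization."""
    counts = Counter(
        word
        for hyp in hypotheses
        for word in hyp.split()
        if word not in shortlist
    )
    # Drop rare candidates, which are more likely to be recognition errors.
    kept = [(word, n) for word, n in counts.items() if n >= min_count]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in kept]


shortlist = {"the", "language", "speak", "now", "end"}
hyps = ["the kazakh language", "speak kazakh now", "the almaty end"]
print(extract_oos_words(hyps, shortlist))               # ['kazakh', 'almaty']
print(extract_oos_words(hyps, shortlist, min_count=2))  # ['kazakh']
```

Raising `min_count` trades recall for robustness against one-off misrecognitions in the first-pass output.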
