Acoustic Modelling
10 papers with code • 0 benchmarks • 0 datasets
Latest papers with no code
An overview of text-to-speech systems and media applications
Producing synthetic voice, similar to human-like sound, is an emerging novelty of modern interactive media systems.
Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech
Neural text-to-speech systems are often optimized on L1/L2 losses, which make strong assumptions about the distributions of the target data space.
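The assumption behind those losses can be made concrete. Minimising an L2 loss is equivalent to maximum likelihood under a Gaussian with fixed variance, and minimising an L1 loss is equivalent to maximum likelihood under a Laplace distribution. Both are unimodal, so the best constant prediction collapses to the mean or the median. The sketch below uses made-up bimodal targets (hypothetical pitch values, not data from the paper) to show the L2-optimal prediction landing between the two modes.

```python
import statistics

def l2_optimal(targets):
    # The constant that minimises mean squared error is the arithmetic mean.
    return statistics.fmean(targets)

def l1_optimal(targets):
    # The constant that minimises mean absolute error is the median.
    return statistics.median(targets)

def mean_squared_error(pred, targets):
    return sum((t - pred) ** 2 for t in targets) / len(targets)

def mean_absolute_error(pred, targets):
    return sum(abs(t - pred) for t in targets) / len(targets)

# Hypothetical bimodal targets: two plausible pitch values (Hz) for the same input.
targets = [100.0, 100.0, 100.0, 100.0, 200.0, 200.0, 200.0]

l2_pred = l2_optimal(targets)  # lies between the modes, matches no observed target
l1_pred = l1_optimal(targets)  # snaps to the majority mode and ignores the other

print(f"L2-optimal prediction: {l2_pred:.2f}")
print(f"L1-optimal prediction: {l1_pred:.2f}")
```

Neither prediction represents both modes. Generative approaches such as the normalizing flows and diffusion models named in the title are aimed at that limitation.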
Speaker- and Age-Invariant Training for Child Acoustic Modeling Using Adversarial Multi-Task Learning
These high acoustic variations along with the scarcity of child speech corpora have impeded the development of a reliable speech recognition system for children.
Impact of Dataset on Acoustic Models for Automatic Speech Recognition
The GMM models are widely used to create the alignments of the training data for the hybrid deep neural network model, thus making it an important task to create accurate alignments.
Investigation of Deep Neural Network Acoustic Modelling Approaches for Low Resource Accented Mandarin Speech Recognition
Meanwhile, approaches of multi-accent modelling including multi-style training, multi-accent decision tree state tying, DNN tandem and multi-level adaptive network (MLAN) tandem hidden Markov model (HMM) modelling are combined and compared in this paper.
Common Phone: A Multilingual Dataset for Robust Acoustic Modelling
A Wav2Vec 2.0 acoustic model was trained with the Common Phone to perform phonetic symbol recognition and validate the quality of the generated phonetic annotation.
Enhancing audio quality for expressive Neural Text-to-Speech
Artificial speech synthesis has made a great leap in terms of naturalness as recent Text-to-Speech (TTS) systems are capable of producing speech with similar quality to human recordings.
Low Resource German ASR with Untranscribed Data Spoken by Non-native Children -- INTERSPEECH 2021 Shared Task SPAPL System
~5 hours of transcribed data and ~60 hours of untranscribed data are provided to develop a German ASR system for children.
End-to-end acoustic modelling for phone recognition of young readers
Through transfer learning, a Transformer model complemented with a Connectionist Temporal Classification (CTC) objective function reaches a phone error rate of 28.1%, outperforming a state-of-the-art DNN-HMM model by 6.6% relative, as well as other end-to-end architectures by more than 8.5% relative.
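The snippet reports relative gains without the baseline error rates. As a rough aid to reading those figures, the sketch below back-computes the implied baselines. It assumes "relative" means (baseline - new) / baseline. The function names are illustrative and the resulting baselines are derived estimates, not numbers quoted from the paper.

```python
def implied_baseline(new_error_pct, relative_reduction_pct):
    """Baseline error rate (percent) implied by a relative reduction (percent)."""
    return new_error_pct / (1.0 - relative_reduction_pct / 100.0)

def relative_reduction(baseline_error_pct, new_error_pct):
    """Relative error reduction in percent, assuming (baseline - new) / baseline."""
    return 100.0 * (baseline_error_pct - new_error_pct) / baseline_error_pct

transformer_per = 28.1  # phone error rate reported in the snippet

# Implied DNN-HMM baseline for a 6.6% relative improvement.
dnn_hmm_per = implied_baseline(transformer_per, 6.6)

# "More than 8.5% relative" gives only a lower bound for the other end-to-end systems.
other_e2e_per_lower_bound = implied_baseline(transformer_per, 8.5)

print(f"Implied DNN-HMM PER: about {dnn_hmm_per:.1f}%")
print(f"Implied other end-to-end PER: at least about {other_e2e_per_lower_bound:.1f}%")
```

Under that assumption the absolute gap to the DNN-HMM system is about two percentage points, which is easy to miss when only the relative figure is quoted.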
A comparative study of two-dimensional vocal tract acoustic modeling based on Finite-Difference Time-Domain methods
The two-dimensional (2D) numerical approaches for vocal tract (VT) modelling can afford a better balance between the low computational cost and accurate rendering of acoustic wave propagation.
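For readers unfamiliar with the technique, here is a minimal sketch of a 2D Finite-Difference Time-Domain update for the scalar wave equation. It is a generic textbook leapfrog scheme on a square grid, not any of the schemes compared in the paper. The grid size, sound speed, spacing, and the pressure-release (zero pressure) boundary are all illustrative assumptions, and no vocal tract geometry or wall model is included.

```python
import math

def fdtd_2d(nx=21, ny=21, steps=30, c=350.0, dx=1e-3, courant=0.5):
    """Propagate a pressure impulse on a 2D grid with a leapfrog FDTD scheme.

    All parameter values are illustrative assumptions. Stability in 2D
    requires the Courant number c*dt/dx to be at most 1/sqrt(2).
    """
    if courant > 1.0 / math.sqrt(2.0):
        raise ValueError("Courant number violates the 2D stability limit")
    dt = courant * dx / c        # time step derived from the Courant number
    lam2 = (c * dt / dx) ** 2    # squared Courant number used in the update

    prev = [[0.0] * ny for _ in range(nx)]  # pressure at time step n-1
    curr = [[0.0] * ny for _ in range(nx)]  # pressure at time step n
    curr[nx // 2][ny // 2] = 1.0            # impulse excitation at the centre

    for _ in range(steps):
        nxt = [[0.0] * ny for _ in range(nx)]  # edge cells stay at zero pressure
        for i in range(1, nx - 1):
            for j in range(1, ny - 1):
                # Five-point discrete Laplacian of the current pressure field.
                lap = (curr[i + 1][j] + curr[i - 1][j]
                       + curr[i][j + 1] + curr[i][j - 1]
                       - 4.0 * curr[i][j])
                # Second-order accurate update in time.
                nxt[i][j] = 2.0 * curr[i][j] - prev[i][j] + lam2 * lap
        prev, curr = curr, nxt
    return curr

field = fdtd_2d()
peak = max(abs(p) for row in field for p in row)
print(f"Peak absolute pressure after 30 steps: {peak:.4f}")
```

Each time step costs a number of operations proportional to the number of grid cells, which is why a 2D grid is far cheaper than a 3D one at the same resolution.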