Phoneme Recognition
27 papers with code • 1 benchmark • 1 dataset
Benchmarks
These leaderboards are used to track progress in Phoneme Recognition
Libraries
Use these libraries to find Phoneme Recognition models and implementations.
Most implemented papers
WaveNet: A Generative Model for Raw Audio
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
Attention-Based Models for Speech Recognition
Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis and image caption generation.
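For orientation, the core idea is that the decoder attends over the encoder's acoustic states at every output step. The sketch below is a simplified content-based (dot-product) variant, not the paper's exact scoring function; all shapes and names are illustrative assumptions.

```python
# Minimal content-based attention over encoder states (simplified sketch).
import torch
import torch.nn.functional as F

encoder_states = torch.randn(8, 100, 256)   # (batch, time, dim) acoustic encoder outputs
decoder_state = torch.randn(8, 256)         # current decoder hidden state

# Dot-product score between the decoder state and every encoder frame.
scores = torch.bmm(encoder_states, decoder_state.unsqueeze(-1)).squeeze(-1)   # (batch, time)
weights = F.softmax(scores, dim=-1)                                           # attention weights
context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)          # (batch, dim) context vector
```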
Sequence Transduction with Recurrent Neural Networks
One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating.
Speech Recognition with Deep Recurrent Neural Networks
Recurrent neural networks (RNNs) are a powerful model for sequential data.
Do Deep Nets Really Need to be Deep?
Currently, deep neural networks are the state of the art on problems such as speech recognition and computer vision.
Simple and Effective Zero-shot Cross-lingual Phoneme Recognition
Recent progress in self-training, self-supervised pretraining and unsupervised learning enabled well performing speech recognition systems without any labeled data.
Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment
We show that fine-tuning with pseudo labels achieves a 5.35% phoneme error rate reduction and 2.48% MDD F1 score improvement over a labeled-samples-only fine-tuning baseline.
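For reference, phoneme error rate (PER), the metric quoted above and throughout this page, is conventionally the Levenshtein edit distance between hypothesis and reference phoneme sequences, normalized by reference length. A minimal, library-free sketch:

```python
# Phoneme error rate: token-level edit distance divided by reference length.
def phoneme_error_rate(ref, hyp):
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(phoneme_error_rate(["s", "ih", "k", "s"], ["s", "eh", "k", "s"]))  # 0.25 (one substitution)
```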
End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks
Most state-of-the-art phoneme recognition systems rely on classical neural network classifiers fed with highly tuned features, such as MFCC or PLP features.
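To make the contrast concrete, the sketch below computes classical MFCC features with librosa and, as an end-to-end alternative, applies a 1-D convolution directly to the raw waveform. The example clip, filter sizes, and the tiny convolutional front end are illustrative assumptions, not the paper's architecture.

```python
# Classical MFCC front end vs. a learned front end on the raw signal (sketch).
import librosa
import torch
import torch.nn as nn

# Classical pipeline: hand-tuned MFCC features computed before any classifier.
waveform, sr = librosa.load(librosa.ex("trumpet"), sr=16000)   # bundled librosa example clip
mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=13)      # (13, num_frames)

# End-to-end alternative: a 1-D convolution applied directly to raw samples.
raw = torch.from_numpy(waveform).float().unsqueeze(0).unsqueeze(0)   # (batch, 1, samples)
conv_frontend = nn.Conv1d(1, 64, kernel_size=400, stride=160)        # ~25 ms window, 10 ms hop at 16 kHz
features = conv_frontend(raw)                                        # learned features, no MFCCs
```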
Regularizing RNNs by Stabilizing Activations
We stabilize the activations of Recurrent Neural Networks (RNNs) by penalizing the squared distance between successive hidden states' norms.
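The penalty itself is simple to write down; the following PyTorch sketch penalizes the squared distance between successive hidden-state norms. The model, dummy input, and the weight `beta` are assumptions for illustration, not the paper's exact setup.

```python
# Norm-stabilizer penalty on successive RNN hidden-state norms (sketch).
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=40, hidden_size=128, batch_first=True)
x = torch.randn(8, 100, 40)            # (batch, time, features), dummy input

outputs, _ = rnn(x)                    # (batch, time, hidden)
norms = outputs.norm(dim=-1)           # per-step hidden-state norms: (batch, time)

# Squared distance between successive norms, averaged over steps and batch.
beta = 1.0                             # assumed penalty weight
stability_penalty = beta * (norms[:, 1:] - norms[:, :-1]).pow(2).mean()

# total_loss = task_loss + stability_penalty
```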
Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
Meanwhile, Connectionist Temporal Classification (CTC) with Recurrent Neural Networks (RNNs), which was proposed for labeling unsegmented sequences, makes it feasible to train an end-to-end speech recognition system instead of hybrid settings.
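CTC training is available off the shelf; the sketch below uses PyTorch's nn.CTCLoss on dummy acoustic-model outputs. The sequence lengths and the 40-phoneme label set (with blank at index 0) are assumptions for illustration.

```python
# End-to-end CTC training on unsegmented phoneme labels (sketch).
import torch
import torch.nn as nn

num_phonemes = 40                                   # assumed label set; index 0 reserved for blank
log_probs = torch.randn(100, 8, num_phonemes + 1, requires_grad=True).log_softmax(dim=-1)  # (time, batch, classes)
targets = torch.randint(1, num_phonemes + 1, (8, 30))          # unsegmented phoneme label sequences
input_lengths = torch.full((8,), 100, dtype=torch.long)
target_lengths = torch.full((8,), 30, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                     # gradients flow to any acoustic model producing log_probs
```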