Speech Recognition

119 papers with code · Speech

Speech recognition is the task of recognising speech within audio and converting it into text.

Greatest papers with code

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

8 Dec 2015 · tensorflow/models

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages. Key to our approach is our application of HPC techniques, resulting in a 7x speedup over our previous system.

ACCENTED SPEECH RECOGNITION · END-TO-END SPEECH RECOGNITION · NOISY SPEECH RECOGNITION

Deep Speech: Scaling up end-to-end speech recognition

17 Dec 2014 · mozilla/DeepSpeech

We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments.

ACCENTED SPEECH RECOGNITION · END-TO-END SPEECH RECOGNITION

End-to-end speech recognition using lattice-free MMI

Interspeech 2018 · kaldi-asr/kaldi

We present our work on end-to-end training of acoustic models using the lattice-free maximum mutual information (LF-MMI) objective function in the context of hidden Markov models. By end-to-end training, we mean flat-start training of a single DNN in one stage without using any previously trained models, forced alignments, or building state-tying decision trees.

END-TO-END SPEECH RECOGNITION · SPEECH RECOGNITION

Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline

27 Mar 2018 · kaldi-asr/kaldi

This paper describes a new baseline system for automatic speech recognition (ASR) in the CHiME-4 challenge to promote the development of noisy ASR in speech processing communities by providing 1) a state-of-the-art system with a simplified single system comparable to the complicated top systems in the challenge, and 2) a publicly available and reproducible recipe through the main repository in the Kaldi speech recognition toolkit. In addition, the proposed baseline recipe includes four different speech enhancement measures for the simulation test set: short-time objective intelligibility measure (STOI), extended STOI (eSTOI), perceptual evaluation of speech quality (PESQ), and speech distortion ratio (SDR).

DISTANT SPEECH RECOGNITION · LANGUAGE MODELLING · NOISY SPEECH RECOGNITION · SPEECH ENHANCEMENT

First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs

12 Aug 2014 · baidu-research/warp-ctc

Recent work demonstrated the feasibility of discarding the HMM sequence modeling framework by directly predicting transcript text from audio. This approach to decoding enables first-pass speech recognition with a language model, completely unaided by the cumbersome infrastructure of HMM-based systems.

LANGUAGE MODELLING · LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION · SPEECH RECOGNITION
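
As a rough illustration of what "directly predicting transcript text from audio" can look like, the sketch below shows greedy decoding of connectionist temporal classification (CTC) output, the kind of output the linked warp-ctc repository is built around. It is a simplification and not the decoder from the paper above, which additionally uses a language model during the first pass. The function name `ctc_greedy_decode`, the toy alphabet, and the per-frame probabilities are all assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol in the alphabet


def ctc_greedy_decode(frame_probs, alphabet):
    """Greedy (best-path) CTC decoding.

    frame_probs: one probability list per audio frame, indexed by symbol.
    alphabet:    symbol for each index; alphabet[BLANK] is the blank.
    """
    # 1. Pick the most likely symbol index for every frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Merge runs of the same index into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Drop blanks; a blank between two equal labels keeps both.
    return "".join(alphabet[label] for label in collapsed if label != BLANK)


# Toy example: four symbols, six frames (made-up numbers).
alphabet = ["_", "a", "c", "t"]
frame_probs = [
    [0.10, 0.10, 0.70, 0.10],  # c
    [0.10, 0.10, 0.70, 0.10],  # c (repeat, merged)
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.70, 0.10, 0.10],  # a
    [0.10, 0.10, 0.10, 0.70],  # t
    [0.10, 0.10, 0.10, 0.70],  # t (repeat, merged)
]
print(ctc_greedy_decode(frame_probs, alphabet))  # cat
```

Greedy decoding ignores any language model, so it is best read as the baseline that a first-pass decoder with a language model improves upon.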

wav2letter++: The Fastest Open-source Speech Recognition System

18 Dec 2018 · facebookresearch/wav2letter

This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency.

SPEECH RECOGNITION

Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces

25 May 2018 · snipsco/snips-nlu

This paper presents the machine learning architecture of the Snips Voice Platform, a software solution to perform Spoken Language Understanding on microprocessors typical of IoT devices. The embedded inference is fast and accurate while enforcing privacy by design, as no personal user data is ever collected.

SPOKEN LANGUAGE UNDERSTANDING

3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition

18 Jun 2017 · astorfi/lip-reading-deeplearning

Audio-visual recognition (AVR) has been considered as a solution for speech recognition tasks when the audio is corrupted, as well as a visual recognition method used for speaker verification in multi-speaker scenarios. We propose the use of a coupled 3D Convolutional Neural Network (3D-CNN) architecture that can map both modalities into a representation space to evaluate the correspondence of audio-visual streams using the learned multimodal features.

SPEAKER VERIFICATION · SPEECH RECOGNITION

Interpretable Convolutional Filters with SincNet

23 Nov 2018 · mravanelli/pytorch-kaldi

Deep learning is currently playing a crucial role toward higher levels of artificial intelligence. This paradigm allows neural networks to learn complex and abstract representations that are progressively obtained by combining simpler ones.

SPEECH RECOGNITION

The PyTorch-Kaldi Speech Recognition Toolkit

19 Nov 2018 · mravanelli/pytorch-kaldi

The PyTorch-Kaldi project aims to bridge the gap between the Kaldi and PyTorch toolkits, trying to inherit the efficiency of Kaldi and the flexibility of PyTorch. Experiments conducted on several datasets and tasks show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.

NOISY SPEECH RECOGNITION