Speech-to-Text
128 papers with code • 1 benchmark • 1 dataset
Benchmarks
These leaderboards are used to track progress in Speech-to-Text.
Libraries
Use these libraries to find Speech-to-Text models and implementations.
Most implemented papers
Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning
Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments.
Clotho: An Audio Captioning Dataset
Audio captioning is the novel task of general audio content description using free text.
fairseq S2T: Fast Speech-to-Text Modeling with fairseq
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation.
Audio Adversarial Examples: Targeted Attacks on Speech-to-Text
We construct targeted audio adversarial examples on automatic speech recognition.
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speech-to-text, scene understanding, ranking user preferences, ad placement, etc.
Tools and resources for Romanian text-to-speech and speech-to-text applications
In this paper we introduce a set of resources and tools aimed at providing support for natural language processing, text-to-speech synthesis and speech recognition for Romanian.
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages?
Deep Reinforcement Learning For Sequence to Sequence Models
In this survey, we consider seq2seq problems from the RL point of view and provide a formulation combining the power of RL methods in decision-making with sequence-to-sequence models that enable remembering long-term memories.
One TTS Alignment To Rule Them All
However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words.
Scribosermo: Fast Speech-to-Text models for German and other Languages
Recent Speech-to-Text models often require a large amount of hardware resources and are mostly trained in English.