Speech-to-Text

128 papers with code • 1 benchmark • 1 dataset

Most implemented papers

Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning

PaddlePaddle/PaddleSpeech 21 Sep 2016

Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments.

Clotho: An Audio Captioning Dataset

labbeti/aac-datasets 21 Oct 2019

Audio captioning is the novel task of general audio content description using free text.

fairseq S2T: Fast Speech-to-Text Modeling with fairseq

pytorch/fairseq Asian Chapter of the Association for Computational Linguistics 2020

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation.

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

carlini/audio_adversarial_examples 5 Jan 2018

We construct targeted audio adversarial examples on automatic speech recognition.

Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions

facebookresearch/TensorComprehensions 13 Feb 2018

Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speech-to-text, scene understanding, ranking user preferences, ad placement, etc.

Tools and resources for Romanian text-to-speech and speech-to-text applications

racai-ai/TEPROLIN 15 Feb 2018

In this paper we introduce a set of resources and tools aimed at providing support for natural language processing, text-to-speech synthesis and speech recognition for Romanian.

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

facebookresearch/seamless_communication 22 Aug 2023

What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages?

Deep Reinforcement Learning For Sequence to Sequence Models

yaserkl/RLSeq2Seq 24 May 2018

In this survey, we consider seq2seq problems from the RL point of view and provide a formulation combining the power of RL methods in decision-making with sequence-to-sequence models that enable remembering long-term memories.

One TTS Alignment To Rule Them All

coqui-ai/TTS 23 Aug 2021

The attention-based alignments learned by autoregressive TTS models tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words.

Scribosermo: Fast Speech-to-Text models for German and other Languages

jaco-assistant/scribosermo 15 Oct 2021

Recent Speech-to-Text models often require a large amount of hardware resources and are mostly trained in English.