Search Results for author: Michiel Bacchiani

Found 11 papers, 2 papers with code

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

4 code implementations • 5 Dec 2017 • Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani

Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS) subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural network.

Automatic Speech Recognition (ASR) +2

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

1 code implementation • 3 Mar 2023 • Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani

Experiments show that Miipher (i) is robust against various audio degradations and (ii) enables us to train a high-quality text-to-speech (TTS) model from restored speech samples collected from the Web.

Speech Denoising, Speech Enhancement

Toward domain-invariant speech recognition via large scale training

no code implementations • 16 Aug 2018 • Arun Narayanan, Ananya Misra, Khe Chai Sim, Golan Pundak, Anshuman Tripathi, Mohamed Elfeky, Parisa Haghani, Trevor Strohman, Michiel Bacchiani

More importantly, such models generalize better to unseen conditions and allow for rapid adaptation: we show that by using as little as 10 hours of data from a new domain, an adapted domain-invariant model can match the performance of a domain-specific model trained from scratch using 70 times as much data.

Automatic Speech Recognition (ASR) +1

From Audio to Semantics: Approaches to end-to-end spoken language understanding

no code implementations • 24 Sep 2018 • Parisa Haghani, Arun Narayanan, Michiel Bacchiani, Galen Chuang, Neeraj Gaur, Pedro Moreno, Rohit Prabhavalkar, Zhongdi Qu, Austin Waters

Conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to a transcript, and a natural language understanding module that transforms the resulting text (or top N hypotheses) into a set of domains, intents, and arguments.

Automatic Speech Recognition (ASR) +3

Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model

no code implementations • 5 Dec 2017 • Bo Li, Tara N. Sainath, Khe Chai Sim, Michiel Bacchiani, Eugene Weinstein, Patrick Nguyen, Zhifeng Chen, Yonghui Wu, Kanishka Rao

Sequence-to-sequence models provide a simple and elegant solution for building speech recognition systems by folding the separate components of a typical system, namely the acoustic (AM), pronunciation (PM), and language (LM) models, into a single neural network.

Speech Recognition

DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement

no code implementations • 30 Jun 2021 • Yuma Koizumi, Shigeki Karita, Scott Wisdom, Hakan Erdogan, John R. Hershey, Llion Jones, Michiel Bacchiani

To make the model computationally feasible, we extend the Conformer using linear complexity attention and stacked 1-D dilated depthwise convolution layers.

Computational Efficiency, Denoising +1

SNRi Target Training for Joint Speech Enhancement and Recognition

no code implementations • 1 Nov 2021 • Yuma Koizumi, Shigeki Karita, Arun Narayanan, Sankaran Panchapagesan, Michiel Bacchiani

Furthermore, by analyzing the predicted target SNRi, we observed that the jointly trained network automatically controls the target SNRi according to noise characteristics.

Automatic Speech Recognition (ASR) +2

Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers

no code implementations • 16 Feb 2022 • Yotaro Kubo, Shigeki Karita, Michiel Bacchiani

Since embedding vectors can be regarded as implicit representations of linguistic information such as part-of-speech, intent, and so on, they are also expected to be useful modeling cues for ASR decoders.

Automatic Speech Recognition (ASR) +3

SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping

no code implementations • 31 Mar 2022 • Yuma Koizumi, Heiga Zen, Kohei Yatabe, Nanxin Chen, Michiel Bacchiani

Neural vocoders using denoising diffusion probabilistic models (DDPMs) have been improved by adapting the diffusion noise distribution to given acoustic features.

Denoising, Speech Enhancement

WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration

no code implementations • 3 Oct 2022 • Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani

DDPMs and GANs can be characterized by the iterative denoising framework and adversarial training, respectively.

Denoising

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus

no code implementations • 30 May 2023 • Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna

The constituent samples of LibriTTS-R are identical to those of LibriTTS, with only the sound quality improved.
