Search Results for author: Shigeki Karita

Found 11 papers, 3 papers with code

Lenient Evaluation of Japanese Speech Recognition: Modeling Naturally Occurring Spelling Inconsistency

no code implementations • 7 Jun 2023 • Shigeki Karita, Richard Sproat, Haruko Ishikawa

Word error rate (WER) and character error rate (CER) are standard metrics in Speech Recognition (ASR), but one problem has always been alternative spellings: If one's system transcribes adviser whereas the ground truth has advisor, this will count as an error even though the two spellings really represent the same word.

Machine Translation speech-recognition +2

Paper
Add Code

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus

no code implementations • 30 May 2023 • Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna

The constituent samples of LibriTTS-R are identical to those of LibriTTS, with only the sound quality improved.

Paper
Add Code

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

1 code implementation • 3 Mar 2023 • Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani

Experiments show that Miipher (i) is robust against various audio degradation and (ii) enable us to train a high-quality text-to-speech (TTS) model from restored speech samples collected from the Web.

Speech Denoising Speech Enhancement

Paper
Code

Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers

no code implementations • 16 Feb 2022 • Yotaro Kubo, Shigeki Karita, Michiel Bacchiani

Since embedding vectors can be assumed as implicit representations of linguistic information such as part-of-speech, intent, and so on, those are also expected to be useful modeling cues for ASR decoders.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

SNRi Target Training for Joint Speech Enhancement and Recognition

no code implementations • 1 Nov 2021 • Yuma Koizumi, Shigeki Karita, Arun Narayanan, Sankaran Panchapagesan, Michiel Bacchiani

Furthermore, by analyzing the predicted target SNRi, we observed the jointly trained network automatically controls the target SNRi according to noise characteristics.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement

no code implementations • 30 Jun 2021 • Yuma Koizumi, Shigeki Karita, Scott Wisdom, Hakan Erdogan, John R. Hershey, Llion Jones, Michiel Bacchiani

To make the model computationally feasible, we extend the Conformer using linear complexity attention and stacked 1-D dilated depthwise convolution layers.

Computational Efficiency Denoising +1

Paper
Add Code

A Comparative Study on Neural Architectures and Training Methods for Japanese Speech Recognition

no code implementations • 9 Jun 2021 • Shigeki Karita, Yotaro Kubo, Michiel Adriaan Unico Bacchiani, Llion Jones

End-to-end (E2E) modeling is advantageous for automatic speech recognition (ASR) especially for Japanese since word-based tokenization of Japanese is not trivial, and E2E modeling is able to model character sequences directly.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Unsupervised Learning of Disentangled Speech Content and Style Representation

no code implementations • 24 Oct 2020 • Andros Tjandra, Ruoming Pang, Yu Zhang, Shigeki Karita

We present an approach for unsupervised learning of speech representation disentangling contents and styles.

Speaker Recognition

Paper
Add Code

ESPnet-ST: All-in-One Speech Translation Toolkit

1 code implementation • ACL 2020 • Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson Enrique Yalta Soplin, Tomoki Hayashi, Shinji Watanabe

We present ESPnet-ST, which is designed for the quick development of speech-to-speech translation systems in a single framework.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

7,845

Paper
Code

A Comparative Study on Transformer vs RNN in Speech Applications

1 code implementation • 13 Sep 2019 • Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang

Sequence-to-sequence models have been widely used in end-to-end speech processing, for example, automatic speech recognition (ASR), speech translation (ST), and text-to-speech (TTS).

Ranked #12 on Speech Recognition on AISHELL-1

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

7,845

Paper
Code

ESPnet: End-to-End Speech Processing Toolkit

no code implementations • 30 Mar 2018 • Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, Tsubasa Ochiai

This paper introduces a new open source platform for end-to-end speech processing named ESPnet.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.