Search Results for author: Yosuke Higuchi

Found 9 papers, 1 paper with code

An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR

no code implementations • 20 Oct 2021 • Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

In the present paper, an attempt is made to combine Mask-CTC and the triggered attention mechanism to construct a streaming end-to-end automatic speech recognition (ASR) system that provides high performance with low latency.

automatic-speech-recognition Speech Recognition

Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy

no code implementations • 11 Oct 2021 • Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori

Pseudo-labeling (PL), a semi-supervised learning (SSL) method where a seed model performs self-training using pseudo-labels generated from untranscribed speech, has been shown to enhance the performance of end-to-end automatic speech recognition (ASR).

automatic-speech-recognition Language Modelling +1

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

no code implementations • 11 Oct 2021 • Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe

Non-autoregressive (NAR) models generate multiple outputs in a sequence simultaneously, which significantly speeds up inference at the cost of an accuracy drop compared to autoregressive baselines.

automatic-speech-recognition Speech Recognition +2

Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units

1 code implementation • 8 Oct 2021 • Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, Tetsunori Kobayashi

In this work, to promote word-level representation learning in end-to-end ASR, we propose a hierarchical conditional model that is based on connectionist temporal classification (CTC).

automatic-speech-recognition End-To-End Speech Recognition +2

Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring

no code implementations • 9 Sep 2021 • Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

We propose a unified NAR E2E-ST framework called Orthros, which has an NAR decoder and an auxiliary shallow AR decoder on top of the shared encoder.

Language Modelling Translation

Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition

no code implementations • 16 Jun 2021 • Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori

MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.

automatic-speech-recognition End-To-End Speech Recognition +1

Improved Mask-CTC for Non-Autoregressive End-to-End ASR

no code implementations • 26 Oct 2020 • Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

While Mask-CTC achieves remarkably fast inference speed, its recognition performance falls behind that of conventional autoregressive (AR) systems.

automatic-speech-recognition End-To-End Speech Recognition +2

Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder

no code implementations • 25 Oct 2020 • Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

Fast inference speed is an important goal towards real-world deployment of speech translation (ST) systems.

Translation

Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict

no code implementations • 18 May 2020 • Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi

In this work, the Mask CTC model is trained using a Transformer encoder-decoder with joint training of mask prediction and CTC.

Audio and Speech Processing Sound
