Search Results for author: Yosuke Higuchi

Found 18 papers, 3 papers with code

Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference

no code implementations26 Sep 2023 Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe

Attention-based encoder-decoder models with autoregressive (AR) decoding have proven to be the dominant approach for automatic speech recognition (ASR) due to their superior accuracy.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss

1 code implementation2 Nov 2022 Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

This paper presents InterMPL, a semi-supervised learning method of end-to-end automatic speech recognition (ASR) that performs pseudo-labeling (PL) with intermediate supervision.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder

no code implementations2 Nov 2022 Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

One crucial factor that makes this integration challenging lies in the vocabulary mismatch; the vocabulary constructed for a pre-trained LM is generally too large for E2E-ASR training and is likely to have a mismatch against a target ASR domain.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

CTC Alignments Improve Autoregressive Translation

no code implementations11 Oct 2022 Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, Shinji Watanabe

Connectionist Temporal Classification (CTC) is a widely used approach for automatic speech recognition (ASR) that performs conditionally independent monotonic alignment.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

ESPnet-ONNX: Bridging a Gap Between Research and Production

1 code implementation20 Sep 2022 Masao Someki, Yosuke Higuchi, Tomoki Hayashi, Shinji Watanabe

In the field of deep learning, researchers often focus on inventing novel neural network models and improving benchmarks.

Spoken Language Understanding

An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR

no code implementations20 Oct 2021 Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

In the present paper, an attempt is made to combine Mask-CTC and the triggered attention mechanism to construct a streaming end-to-end automatic speech recognition (ASR) system that provides high performance with low latency.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

no code implementations11 Oct 2021 Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe

Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence, which significantly reduces the inference speed at the cost of accuracy drop compared to autoregressive baselines.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy

no code implementations11 Oct 2021 Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori

Pseudo-labeling (PL), a semi-supervised learning (SSL) method where a seed model performs self-training using pseudo-labels generated from untranscribed speech, has been shown to enhance the performance of end-to-end automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units

1 code implementation8 Oct 2021 Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, Tetsunori Kobayashi

In this work, to promote the word-level representation learning in end-to-end ASR, we propose a hierarchical conditional model that is based on connectionist temporal classification (CTC).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring

no code implementations9 Sep 2021 Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

We propose a unified NAR E2E-ST framework called Orthros, which has an NAR decoder and an auxiliary shallow AR decoder on top of the shared encoder.

Language Modelling Translation

Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition

no code implementations16 Jun 2021 Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori

MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Improved Mask-CTC for Non-Autoregressive End-to-End ASR

no code implementations26 Oct 2020 Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

While Mask-CTC achieves remarkably fast inference speed, its recognition performance falls behind that of conventional autoregressive (AR) systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder

no code implementations25 Oct 2020 Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

Fast inference speed is an important goal towards real-world deployment of speech translation (ST) systems.

Translation

Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict

no code implementations18 May 2020 Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi

In this work, Mask CTC model is trained using a Transformer encoder-decoder with joint training of mask prediction and CTC.

Audio and Speech Processing Sound

Cannot find the paper you are looking for? You can Submit a new open access paper.