no code implementations • 10 Nov 2022 • Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive and time-consuming.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+6
no code implementations • 2 Nov 2022 • Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe
One crucial factor that makes this integration challenging lies in the vocabulary mismatch; the vocabulary constructed for a pre-trained LM is generally too large for E2E-ASR training and is likely to have a mismatch against a target ASR domain.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
1 code implementation • 2 Nov 2022 • Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe
This paper presents InterMPL, a semi-supervised learning method of end-to-end automatic speech recognition (ASR) that performs pseudo-labeling (PL) with intermediate supervision.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1
no code implementations • 29 Oct 2022 • Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe
This paper presents BERT-CTC, a novel formulation of end-to-end speech recognition that adapts BERT for connectionist temporal classification (CTC).
no code implementations • 11 Oct 2022 • Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, Shinji Watanabe
Connectionist Temporal Classification (CTC) is a widely used approach for automatic speech recognition (ASR) that performs conditionally independent monotonic alignment.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+3
1 code implementation • 20 Sep 2022 • Masao Someki, Yosuke Higuchi, Tomoki Hayashi, Shinji Watanabe
In the field of deep learning, researchers often focus on inventing novel neural network models and improving benchmarks.
no code implementations • 25 Jan 2022 • Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang
The proposed NAR model significantly surpasses previous NAR systems on the AISHELL-1 benchmark and shows a potential for English tasks.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1
no code implementations • 20 Oct 2021 • Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi
In the present paper, an attempt is made to combine Mask-CTC and the triggered attention mechanism to construct a streaming end-to-end automatic speech recognition (ASR) system that provides high performance with low latency.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1
no code implementations • 11 Oct 2021 • Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori
Pseudo-labeling (PL), a semi-supervised learning (SSL) method where a seed model performs self-training using pseudo-labels generated from untranscribed speech, has been shown to enhance the performance of end-to-end automatic speech recognition (ASR).
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 11 Oct 2021 • Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe
Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence, which significantly reduces the inference speed at the cost of accuracy drop compared to autoregressive baselines.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+3
1 code implementation • 8 Oct 2021 • Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, Tetsunori Kobayashi
In this work, to promote the word-level representation learning in end-to-end ASR, we propose a hierarchical conditional model that is based on connectionist temporal classification (CTC).
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 9 Sep 2021 • Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe
We propose a unified NAR E2E-ST framework called Orthros, which has an NAR decoder and an auxiliary shallow AR decoder on top of the shared encoder.
no code implementations • 16 Jun 2021 • Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori
MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1
no code implementations • 26 Oct 2020 • Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi
While Mask-CTC achieves remarkably fast inference speed, its recognition performance falls behind that of conventional autoregressive (AR) systems.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 25 Oct 2020 • Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe
Fast inference speed is an important goal towards real-world deployment of speech translation (ST) systems.
no code implementations • 18 May 2020 • Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi
In this work, Mask CTC model is trained using a Transformer encoder-decoder with joint training of mask prediction and CTC.
Audio and Speech Processing Sound