1 code implementation • 13 Mar 2023 • Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa
Experiments on the two-speaker CALLHOME dataset show that intermediate labels with the proposed non-autoregressive intermediate attractors improve diarization performance.
1 code implementation • 8 Oct 2021 • Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, Tetsunori Kobayashi
In this work, to promote word-level representation learning in end-to-end ASR, we propose a hierarchical conditional model based on connectionist temporal classification (CTC).
Automatic Speech Recognition (ASR)
1 code implementation • 2 Nov 2022 • Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe
This paper presents InterMPL, a semi-supervised learning method for end-to-end automatic speech recognition (ASR) that performs pseudo-labeling (PL) with intermediate supervision.
Automatic Speech Recognition (ASR)
no code implementations • 21 Jan 2020 • Koki Madono, Masayuki Tanaka, Masaki Onishi, Tetsuji Ogawa
In this study, a perceptually hidden object-recognition method is investigated to generate secure images recognizable by humans but not machines.
no code implementations • 18 May 2020 • Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi
In this work, the Mask CTC model is trained using a Transformer encoder-decoder with joint training of mask prediction and CTC.
Audio and Speech Processing Sound
no code implementations • 26 Oct 2020 • Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi
While Mask-CTC achieves remarkably fast inference speed, its recognition performance falls behind that of conventional autoregressive (AR) systems.
Automatic Speech Recognition (ASR)
no code implementations • COLING 2020 • Hikari Tanabe, Tetsuji Ogawa, Tetsunori Kobayashi, Yoshihiko Hayashi
Recognition of the mental state of a human character in text is a major challenge in natural language processing.
no code implementations • 20 Oct 2021 • Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi
This paper attempts to combine Mask-CTC and the triggered attention mechanism to construct a streaming end-to-end automatic speech recognition (ASR) system that achieves high performance with low latency.
Automatic Speech Recognition (ASR)
no code implementations • 26 Mar 2022 • Kohei Saijo, Tetsuji Ogawa
A new learning algorithm for speech separation networks is designed to explicitly reduce residual noise and artifacts in the separated signal in an unsupervised manner.
no code implementations • 29 Oct 2022 • Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe
This paper presents BERT-CTC, a novel formulation of end-to-end speech recognition that adapts BERT for connectionist temporal classification (CTC).
no code implementations • 2 Nov 2022 • Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe
One crucial factor that makes this integration challenging lies in the vocabulary mismatch; the vocabulary constructed for a pre-trained LM is generally too large for E2E-ASR training and is likely to have a mismatch against a target ASR domain.
Automatic Speech Recognition (ASR)
no code implementations • 18 Nov 2022 • Kohei Saijo, Tetsuji Ogawa
Specifically, the shuffler first separates observed mixtures and makes pseudo-mixtures by shuffling and remixing the separated signals.
no code implementations • 10 Jan 2023 • Ryosuke Hyodo, Susumu Saito, Teppei Nakano, Makoto Akabane, Ryoichi Kasuga, Tetsuji Ogawa
In this study, we examine the framework of a video surveillance AI system that presents the reasoning behind predictions by incorporating experts' decision-making processes with rich domain knowledge of the notification target.
no code implementations • 10 Jan 2023 • Ryosuke Hyodo, Teppei Nakano, Tetsuji Ogawa
We have designed a deep multi-stream network for automatically detecting calving signs from video.
no code implementations • 1 Sep 2023 • Kohei Saijo, Tetsuji Ogawa
A student model is then trained to separate the pseudo-mixtures using either the teacher's outputs or the initial mixtures as supervision.
no code implementations • 19 Sep 2023 • Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi
We present a novel integration of an instruction-tuned large language model (LLM) and end-to-end automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
no code implementations • 12 Oct 2023 • Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa
We propose a multi-task universal speech enhancement (MUSE) model that can perform five speech enhancement (SE) tasks: dereverberation, denoising, speech separation (SS), target speaker extraction (TSE), and speaker counting.