Search Results for author: Tetsuji Ogawa

Found 17 papers, 3 papers with code

Neural Diarization with Non-autoregressive Intermediate Attractors

1 code implementation • 13 Mar 2023 • Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa

The experiments with the two-speaker CALLHOME dataset show that the intermediate labels with the proposed non-autoregressive intermediate attractors boost the diarization performance.

Speaker Diarization

Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units

1 code implementation • 8 Oct 2021 • Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, Tetsunori Kobayashi

In this work, to promote word-level representation learning in end-to-end ASR, we propose a hierarchical conditional model that is based on connectionist temporal classification (CTC).

Automatic Speech Recognition (ASR) +2

InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss

1 code implementation • 2 Nov 2022 • Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

This paper presents InterMPL, a semi-supervised learning method of end-to-end automatic speech recognition (ASR) that performs pseudo-labeling (PL) with intermediate supervision.

Automatic Speech Recognition (ASR) +1

Block-wise Scrambled Image Recognition Using Adaptation Network

no code implementations • 21 Jan 2020 • Koki Madono, Masayuki Tanaka, Masaki Onishi, Tetsuji Ogawa

In this study, a perceptually hidden object-recognition method is investigated to generate secure images recognizable by humans but not machines.

Image Classification, Object +1

Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict

no code implementations • 18 May 2020 • Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi

In this work, the Mask CTC model is trained using a Transformer encoder-decoder with joint training of mask prediction and CTC.

Audio and Speech Processing, Sound

Improved Mask-CTC for Non-Autoregressive End-to-End ASR

no code implementations • 26 Oct 2020 • Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

While Mask-CTC achieves remarkably fast inference speed, its recognition performance falls behind that of conventional autoregressive (AR) systems.

Automatic Speech Recognition (ASR) +2

Exploiting Narrative Context and A Priori Knowledge of Categories in Textual Emotion Classification

no code implementations • COLING 2020 • Hikari Tanabe, Tetsuji Ogawa, Tetsunori Kobayashi, Yoshihiko Hayashi

Recognition of the mental state of a human character in text is a major challenge in natural language processing.

Emotion Classification, Multi-Task Learning +1

An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR

no code implementations • 20 Oct 2021 • Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

In the present paper, an attempt is made to combine Mask-CTC and the triggered attention mechanism to construct a streaming end-to-end automatic speech recognition (ASR) system that provides high performance with low latency.

Automatic Speech Recognition (ASR) +1

Remix-cycle-consistent Learning on Adversarially Learned Separator for Accurate and Stable Unsupervised Speech Separation

no code implementations • 26 Mar 2022 • Kohei Saijo, Tetsuji Ogawa

A new learning algorithm for speech separation networks is designed to explicitly reduce residual noise and artifacts in the separated signal in an unsupervised manner.

Speech Separation

BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model

no code implementations • 29 Oct 2022 • Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

This paper presents BERT-CTC, a novel formulation of end-to-end speech recognition that adapts BERT for connectionist temporal classification (CTC).

Language Modelling, Speech Recognition +2

BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder

no code implementations • 2 Nov 2022 • Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

One crucial factor that makes this integration challenging lies in the vocabulary mismatch; the vocabulary constructed for a pre-trained LM is generally too large for E2E-ASR training and is likely to have a mismatch against a target ASR domain.

Automatic Speech Recognition (ASR) +2

Self-Remixing: Unsupervised Speech Separation via Separation and Remixing

no code implementations • 18 Nov 2022 • Kohei Saijo, Tetsuji Ogawa

Specifically, the shuffler first separates observed mixtures and makes pseudo-mixtures by shuffling and remixing the separated signals.

Domain Adaptation, Semi-supervised Domain Adaptation +1
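
The Self-Remixing snippet describes making pseudo-mixtures by shuffling and remixing separated signals. The Python sketch below illustrates only that remixing step. Its assumptions are ours, not the paper's. The separated sources of a mini-batch are taken to be available as plain lists. Each source slot gets an independent random permutation across the batch, and the shuffled sources are summed sample by sample. The function name `remix` and the data layout are hypothetical. The paper's exact shuffling scheme and its training losses are not reproduced here.

```python
import random
from typing import List, Tuple

Signal = List[float]  # one waveform as a list of samples


def remix(
    separated: List[List[Signal]], rng: random.Random
) -> Tuple[List[Signal], List[List[int]]]:
    """Hypothetical helper: shuffle separated sources across the batch and sum them.

    separated[b][n] is the n-th estimated source of the b-th observed mixture.
    perms[n][b] is the batch index whose n-th source is placed in pseudo-mixture b.
    Assumption: one independent batch permutation per source slot.
    """
    batch = len(separated)
    n_src = len(separated[0])
    length = len(separated[0][0])

    # Draw an independent permutation of the batch for every source slot.
    perms: List[List[int]] = []
    for _ in range(n_src):
        perm = list(range(batch))
        rng.shuffle(perm)
        perms.append(perm)

    # Each pseudo-mixture sums one shuffled source from every slot.
    pseudo: List[Signal] = []
    for b in range(batch):
        mix = [0.0] * length
        for n in range(n_src):
            src = separated[perms[n][b]][n]
            for t in range(length):
                mix[t] += src[t]
        pseudo.append(mix)
    return pseudo, perms


if __name__ == "__main__":
    separated = [
        [[1.0, 0.0], [0.0, 2.0]],   # observed mixture 0: two estimated sources
        [[3.0, 3.0], [-1.0, 1.0]],  # observed mixture 1: two estimated sources
    ]
    pseudo, perms = remix(separated, random.Random(0))
    print(perms, pseudo)
    # Every source is used exactly once, so the total signal is preserved.
    print(sum(map(sum, pseudo)))  # 9.0
```

The function returns the permutations as well as the pseudo-mixtures. They record which observed mixture each remixed source came from. A training loop would need that mapping to relate a model's outputs on the pseudo-mixtures back to the original signals.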

Video Surveillance System Incorporating Expert Decision-making Process: A Case Study on Detecting Calving Signs in Cattle

no code implementations • 10 Jan 2023 • Ryosuke Hyodo, Susumu Saito, Teppei Nakano, Makoto Akabane, Ryoichi Kasuga, Tetsuji Ogawa

In this study, we examine the framework of a video surveillance AI system that presents the reasoning behind predictions by incorporating experts' decision-making processes with rich domain knowledge of the notification target.

Decision Making, Explainable Artificial Intelligence (XAI)

Deep Multi-stream Network for Video-based Calving Sign Detection

no code implementations • 10 Jan 2023 • Ryosuke Hyodo, Teppei Nakano, Tetsuji Ogawa

We have designed a deep multi-stream network for automatically detecting calving signs from video.

Attribute Management

Remixing-based Unsupervised Source Separation from Scratch

no code implementations • 1 Sep 2023 • Kohei Saijo, Tetsuji Ogawa

A student model is then trained to separate the pseudo-mixtures using either the teacher's outputs or the initial mixtures as supervision.

Self-Supervised Learning

Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition

no code implementations • 19 Sep 2023 • Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

We present a novel integration of an instruction-tuned large language model (LLM) and end-to-end automatic speech recognition (ASR).

Automatic Speech Recognition (ASR) +5

A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction

no code implementations • 12 Oct 2023 • Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa

We propose a multi-task universal speech enhancement (MUSE) model that can perform five speech enhancement (SE) tasks: dereverberation, denoising, speech separation (SS), target speaker extraction (TSE), and speaker counting.

Denoising, Speech Enhancement +2
