Search Results for author: Yusuke Fujita

Found 27 papers, 8 papers with code

Keep Decoding Parallel with Effective Knowledge Distillation from Language Models to End-to-end Speech Recognisers

no code implementations • 22 Jan 2024 • Michael Hentschel, Yuta Nishikawa, Tatsuya Komatsu, Yusuke Fujita

This study presents a novel approach for knowledge distillation (KD) from a BERT teacher model to an automatic speech recognition (ASR) model using intermediate layers.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Audio Difference Learning for Audio Captioning

no code implementations • 15 Sep 2023 • Tatsuya Komatsu, Yusuke Fujita, Kazuya Takeda, Tomoki Toda

Furthermore, a unique technique is proposed that involves mixing the input audio with additional audio, and using the additional audio as a reference.

Audio captioning

Neural Diarization with Non-autoregressive Intermediate Attractors

1 code implementation • 13 Mar 2023 • Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa

The experiments with the two-speaker CALLHOME dataset show that the intermediate labels with the proposed non-autoregressive intermediate attractors boost the diarization performance.

speaker-diarization Speaker Diarization

InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR

no code implementations • 1 Apr 2022 • Yu Nakagome, Tatsuya Komatsu, Yusuke Fujita, Shuta Ichimura, Yusuke Kida

The proposed method exploits the conditioning framework of self-conditioned CTC to train robust models by conditioning with "noisy" intermediate predictions.

speech-recognition Speech Recognition

Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR

no code implementations • 1 Apr 2022 • Yusuke Fujita, Tatsuya Komatsu, Yusuke Kida

End-to-end automatic speech recognition directly maps input speech to characters.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Better Intermediates Improve CTC Inference

no code implementations • 1 Apr 2022 • Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida

This paper proposes a method for improved CTC inference with searched intermediates and multi-pass conditioning.

Encoder-Decoder Based Attractors for End-to-End Neural Diarization

no code implementations • 20 Jun 2021 • Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Paola Garcia

Diarization results are then estimated as dot products of the attractors and embeddings.

speaker-diarization Speaker Diarization
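
The dot-product step in the snippet above can be illustrated with a minimal sketch. It assumes one embedding per frame and one attractor per speaker, and it assumes a sigmoid followed by a 0.5 threshold turns each dot product into a speech/non-speech decision; the function name `speaker_activity` and the toy values are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def speaker_activity(embeddings, attractors):
    """Return a frames x speakers matrix of activity posteriors.

    Each entry is sigmoid(<frame embedding, speaker attractor>).
    The sigmoid is an assumption of this sketch.
    """
    return [
        [sigmoid(sum(e * a for e, a in zip(frame, attractor)))
         for attractor in attractors]
        for frame in embeddings
    ]


# Toy data (hypothetical): 3 frames, 2 speakers, 2-dimensional embeddings.
embeddings = [[4.0, -4.0], [-4.0, 4.0], [4.0, 4.0]]
attractors = [[1.0, 0.0], [0.0, 1.0]]

posteriors = speaker_activity(embeddings, attractors)
# Threshold each posterior independently, so a frame may contain both speakers.
decisions = [[int(p > 0.5) for p in row] for row in posteriors]
```

Because every speaker is thresholded independently, the third toy frame is marked active for both speakers, which is how this formulation can represent overlapping speech.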

Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization

no code implementations • 9 Jun 2021 • Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe, Paola García, Kenji Nagamatsu

To evaluate our proposed method, we conduct model adaptation experiments using labeled and unlabeled data.

Clustering Pseudo Label

End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection

no code implementations • 8 Jun 2021 • Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola García, Kenji Nagamatsu

In this paper, we present a conditional multitask learning method for end-to-end neural speaker diarization (EEND).

Clustering speaker-diarization +1

The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap

no code implementations • 2 Feb 2021 • Shota Horiguchi, Nelson Yalta, Paola Garcia, Yuki Takashima, Yawen Xue, Desh Raj, Zili Huang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur

This paper provides a detailed description of the Hitachi-JHU system that was submitted to the Third DIHARD Speech Diarization Challenge.

Clustering

Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers

no code implementations • 21 Jan 2021 • Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Paola Garcia, Kenji Nagamatsu

We propose a streaming diarization method based on an end-to-end neural diarization (EEND) model, which handles flexible numbers of speakers and overlapping speech.

Speaker Diarization Sound Audio and Speech Processing

End-to-End Speaker Diarization as Post-Processing

no code implementations • 18 Dec 2020 • Shota Horiguchi, Paola Garcia, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu

Clustering-based diarization methods partition frames into clusters of the number of speakers; thus, they typically cannot handle overlapping speech because each frame is assigned to one speaker.

Clustering Multi-Label Classification +2

Block-Online Guided Source Separation

no code implementations • 16 Nov 2020 • Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu

Another problem is that offline GSS is an utterance-wise algorithm, so its latency grows with the length of the utterance.

Speech Separation

Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones

no code implementations • 31 Jul 2020 • Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu

We also showed that our framework achieved a CER of 21.8%, which is only 2.1 percentage points higher than the CER in headset microphone-based transcription.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals

no code implementations • NeurIPS 2020 • Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie

This model additionally has a simple and efficient stop criterion for the end of the transduction, making it able to infer the variable number of output sequences.

Ranked #3 on Speech Separation on WSJ0-4mix

speech-recognition Speech Recognition +1

Speaker-Conditional Chain Model for Speech Separation and Extraction

no code implementations • 25 Jun 2020 • Jing Shi, Jiaming Xu, Yusuke Fujita, Shinji Watanabe, Bo Xu

With the predicted speaker information from whole observation, our model is helpful to solve the problem of conventional speech separation and speaker extraction for multi-round long recordings.

Audio and Speech Processing Sound

Online End-to-End Neural Diarization with Speaker-Tracing Buffer

no code implementations • 4 Jun 2020 • Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu

This paper proposes a novel online speaker diarization algorithm based on a fully supervised self-attention mechanism (SA-EEND).

speaker-diarization Speaker Diarization

Neural Speaker Diarization with Speaker-Wise Chain Rule

1 code implementation • 2 Jun 2020 • Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Jing Shi, Kenji Nagamatsu

Speaker diarization is an essential step for processing multi-speaker audio.

speaker-diarization Speaker Diarization

End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors

3 code implementations • 20 May 2020 • Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu

End-to-end speaker diarization for an unknown number of speakers is addressed in this paper.

Clustering speaker-diarization +1

CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings

no code implementations • 20 Apr 2020 • Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent, Ashish Arora, Xuankai Chang, Sanjeev Khudanpur, Vimal Manohar, Daniel Povey, Desh Raj, David Snyder, Aswin Shanmugam Subramanian, Jan Trmal, Bar Ben Yair, Christoph Boeddeker, Zhaoheng Ni, Yusuke Fujita, Shota Horiguchi, Naoyuki Kanda, Takuya Yoshioka, Neville Ryant

Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges, we organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6).

speaker-diarization Speaker Diarization +4

End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification

1 code implementation • 24 Feb 2020 • Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu

However, the clustering-based approach has a number of problems; i.e., (i) it is not optimized to minimize diarization errors directly, (ii) it cannot handle speaker overlaps correctly, and (iii) it has trouble adapting its speaker embedding models to real audio recordings with speaker overlaps.

Clustering General Classification +3

Speaker Diarization with Region Proposal Network

1 code implementation • 14 Feb 2020 • Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola Garcia, Yiwen Shao, Daniel Povey, Sanjeev Khudanpur

Speaker diarization is an important pre-processing step for many speech applications, and it aims to solve the "who spoke when" problem.

Region Proposal speaker-diarization +1

Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models

no code implementations • 17 Sep 2019 • Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe

Our proposed method combined with i-vector speaker embeddings ultimately achieved a WER that differed by only 2.1% from that of TS-ASR given oracle speaker embeddings.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

End-to-End Neural Speaker Diarization with Self-attention

2 code implementations • 13 Sep 2019 • Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe

Our method even outperformed the state-of-the-art x-vector clustering-based method.

Ranked #2 on Speaker Diarization on CALLHOME

Clustering speaker-diarization +1

End-to-End Neural Speaker Diarization with Permutation-Free Objectives

1 code implementation • 12 Sep 2019 • Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe

To realize such a model, we formulate the speaker diarization problem as a multi-label classification problem and introduce a permutation-free objective function that directly minimizes diarization errors without suffering from the speaker-label permutation problem.

Ranked #6 on Speaker Diarization on CALLHOME

Clustering Domain Adaptation +3
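
As a rough illustration of the permutation-free objective described in the snippet above, the sketch below scores every assignment of output channels to reference speakers and keeps the cheapest one. It assumes a frame-wise binary cross-entropy as the per-label cost and a brute-force search over permutations, which is only practical for a handful of speakers; the function names and toy values are hypothetical and are not taken from the paper or its code.

```python
import math
from itertools import permutations


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one posterior p and one 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def permutation_free_loss(posteriors, labels):
    """Return (minimum mean BCE, best permutation).

    posteriors, labels: frames x speakers lists. Output channel s is
    compared with reference speaker perm[s] for every permutation.
    """
    num_frames = len(labels)
    num_speakers = len(labels[0])
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(num_speakers)):
        total = sum(
            bce(posteriors[t][s], labels[t][perm[s]])
            for t in range(num_frames)
            for s in range(num_speakers)
        )
        loss = total / (num_frames * num_speakers)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy data (hypothetical): the model found both speakers, but its output
# channels are in the opposite order from the reference labels.
posteriors = [[0.9, 0.1], [0.2, 0.8], [0.9, 0.9]]
labels = [[0, 1], [1, 0], [1, 1]]

loss, perm = permutation_free_loss(posteriors, labels)
```

In the toy data the swapped assignment wins, so the model is not penalized for emitting the right speakers in a different channel order.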