Search Results for author: Yusuke Fujita

Found 26 papers, 8 papers with code

Audio Difference Learning for Audio Captioning

no code implementations • 15 Sep 2023 • Tatsuya Komatsu, Yusuke Fujita, Kazuya Takeda, Tomoki Toda

Furthermore, a unique technique is proposed that involves mixing the input audio with additional audio, and using the additional audio as a reference.

Audio captioning

Neural Diarization with Non-autoregressive Intermediate Attractors

1 code implementation • 13 Mar 2023 • Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa

The experiments with the two-speaker CALLHOME dataset show that the intermediate labels with the proposed non-autoregressive intermediate attractors boost the diarization performance.

Speaker Diarization

InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR

no code implementations • 1 Apr 2022 • Yu Nakagome, Tatsuya Komatsu, Yusuke Fujita, Shuta Ichimura, Yusuke Kida

The proposed method exploits the conditioning framework of self-conditioned CTC to train robust models by conditioning with "noisy" intermediate predictions.

Speech Recognition

Better Intermediates Improve CTC Inference

no code implementations • 1 Apr 2022 • Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida

This paper proposes a method for improved CTC inference with searched intermediates and multi-pass conditioning.

Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers

no code implementations • 21 Jan 2021 • Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Paola Garcia, Kenji Nagamatsu

We propose a streaming diarization method based on an end-to-end neural diarization (EEND) model, which handles flexible numbers of speakers and overlapping speech.

Speaker Diarization, Sound, Audio and Speech Processing

End-to-End Speaker Diarization as Post-Processing

no code implementations • 18 Dec 2020 • Shota Horiguchi, Paola Garcia, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu

Clustering-based diarization methods partition frames into as many clusters as there are speakers; thus, they typically cannot handle overlapping speech because each frame is assigned to exactly one speaker.

Clustering, Multi-Label Classification, +2

Block-Online Guided Source Separation

no code implementations • 16 Nov 2020 • Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu

Another problem is that offline GSS is an utterance-wise algorithm, so its latency grows with the length of the utterance.

Speech Separation

Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones

no code implementations • 31 Jul 2020 • Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu

We also showed that our framework achieved a CER of 21.8%, which is only 2.1 percentage points higher than the CER in headset microphone-based transcription.

Automatic Speech Recognition (ASR), +4

Speaker-Conditional Chain Model for Speech Separation and Extraction

no code implementations • 25 Jun 2020 • Jing Shi, Jiaming Xu, Yusuke Fujita, Shinji Watanabe, Bo Xu

Using speaker information predicted from the whole observation, our model helps solve the problems of conventional speech separation and speaker extraction for multi-round long recordings.

Audio and Speech Processing, Sound

Online End-to-End Neural Diarization with Speaker-Tracing Buffer

no code implementations • 4 Jun 2020 • Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu

This paper proposes a novel online speaker diarization algorithm based on a fully supervised self-attention mechanism (SA-EEND).

Speaker Diarization

End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification

1 code implementation • 24 Feb 2020 • Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu

However, the clustering-based approach has a number of problems; i.e., (i) it is not optimized to minimize diarization errors directly, (ii) it cannot handle speaker overlaps correctly, and (iii) it has trouble adapting its speaker embedding models to real audio recordings with speaker overlaps.

Clustering, General Classification, +3

Speaker Diarization with Region Proposal Network

1 code implementation • 14 Feb 2020 • Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola Garcia, Yiwen Shao, Daniel Povey, Sanjeev Khudanpur

Speaker diarization is an important pre-processing step for many speech applications, and it aims to solve the "who spoke when" problem.

Region Proposal, Speaker Diarization, +1

End-to-End Neural Speaker Diarization with Permutation-Free Objectives

1 code implementation • 12 Sep 2019 • Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe

To realize such a model, we formulate the speaker diarization problem as a multi-label classification problem, and introduce a permutation-free objective function to directly minimize diarization errors without suffering from the speaker-label permutation problem.

Clustering, Domain Adaptation, +3
