no code implementations • 6 Dec 2021 • Atsuki Yamaguchi, Gaku Morio, Hiroaki Ozaki, Ken-ichi Yokote, Kenji Nagamatsu
This paper introduces the Hitachi team's automatic minuting system for the First Shared Task on Automatic Minuting (AutoMin-2021).
no code implementations • 9 Jun 2021 • Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe, Paola García, Kenji Nagamatsu
To evaluate our proposed method, we conduct model adaptation experiments using labeled and unlabeled data.
no code implementations • 8 Jun 2021 • Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola García, Kenji Nagamatsu
In this paper, we present a conditional multitask learning method for end-to-end neural speaker diarization (EEND).
no code implementations • 21 Jan 2021 • Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Paola Garcia, Kenji Nagamatsu
We propose a streaming diarization method based on an end-to-end neural diarization (EEND) model, which handles flexible numbers of speakers and overlapping speech.
Speaker Diarization • Sound • Audio and Speech Processing
no code implementations • 18 Dec 2020 • Shota Horiguchi, Paola Garcia, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu
Clustering-based diarization methods partition frames into as many clusters as there are speakers; thus, they typically cannot handle overlapping speech because each frame is assigned to exactly one speaker.
no code implementations • 16 Nov 2020 • Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu
Another problem is that the offline GSS is an utterance-wise algorithm, so its latency grows with the length of the utterance.
no code implementations • 31 Jul 2020 • Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu
We also showed that our framework achieved CER of 21.8%, which is only 2.1 percentage points higher than the CER in headset microphone-based transcription.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
no code implementations • 4 Jun 2020 • Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu
This paper proposes a novel online speaker diarization algorithm based on a fully supervised self-attention mechanism (SA-EEND).
1 code implementation • 2 Jun 2020 • Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Jing Shi, Kenji Nagamatsu
Speaker diarization is an essential step for processing multi-speaker audio.
3 code implementations • 20 May 2020 • Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu
End-to-end speaker diarization for an unknown number of speakers is addressed in this paper.
1 code implementation • 24 Feb 2020 • Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu
However, the clustering-based approach has a number of problems; i.e., (i) it is not optimized to minimize diarization errors directly, (ii) it cannot handle speaker overlaps correctly, and (iii) it has trouble adapting its speaker embedding models to real audio recordings with speaker overlaps.
no code implementations • 6 Nov 2019 • Takuya Fujioka, Dario Bertero, Takeshi Homma, Kenji Nagamatsu
We therefore propose a dynamic label correction and sample contribution weight estimation model.
no code implementations • 17 Sep 2019 • Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe
Our proposed method combined with i-vector speaker embeddings ultimately achieved a WER that differed by only 2.1% from that of TS-ASR given oracle speaker embeddings.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
2 code implementations • 13 Sep 2019 • Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe
Our method even outperformed the state-of-the-art x-vector clustering-based method.
Ranked #2 on Speaker Diarization on CALLHOME
1 code implementation • 12 Sep 2019 • Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe
To realize such a model, we formulate the speaker diarization problem as a multi-label classification problem, and introduce a permutation-free objective function to directly minimize diarization errors without suffering from the speaker-label permutation problem.
Ranked #6 on Speaker Diarization on CALLHOME
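The permutation-free objective mentioned in the entry above can be illustrated with a small sketch. This is not the authors' implementation: it assumes per-frame, per-speaker activity probabilities, binary reference labels, and binary cross-entropy as the per-element loss, and it searches all speaker orderings by brute force, which is only practical for a handful of speakers. The names `bce` and `permutation_free_loss` are hypothetical.

```python
import math
from itertools import permutations


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability p and one 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def permutation_free_loss(probs, labels):
    """Return (loss, perm) minimised over all orderings of the label columns.

    probs  : list of frames, each a list of per-speaker probabilities
    labels : list of frames, each a list of per-speaker 0/1 activities
    perm[s] is the label column matched to output column s.
    Illustrative assumption: mean BCE over frames and speakers.
    """
    n_frames = len(probs)
    n_speakers = len(probs[0])
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n_speakers)):
        total = 0.0
        for t in range(n_frames):
            for s in range(n_speakers):
                total += bce(probs[t][s], labels[t][perm[s]])
        loss = total / (n_frames * n_speakers)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: the reference lists the two speakers in the opposite order
# from the model output; the middle frame contains overlapping speech.
probs = [[0.9, 0.1], [0.8, 0.7], [0.1, 0.9]]
labels = [[0, 1], [1, 1], [1, 0]]
loss, perm = permutation_free_loss(probs, labels)
print(loss, perm)
```

Because the minimum is taken over every assignment of label columns to output columns, reordering the reference speakers leaves the loss unchanged, and a frame may carry more than one active speaker since each output is scored independently.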
no code implementations • 26 Jun 2019 • Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe
In this paper, we propose a novel auxiliary loss function for target-speaker automatic speech recognition (ASR).
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
1 code implementation • 29 May 2019 • Naoyuki Kanda, Christoph Boeddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach
In this paper, we present Hitachi and Paderborn University's joint effort for automatic speech recognition (ASR) in a dinner party scenario.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3