Search Results for author: Naohiro Tawara

Found 12 papers, 3 papers with code

BLSTM-Based Confidence Estimation for End-to-End Speech Recognition

no code implementations • 22 Dec 2023 • Atsunori Ogawa, Naohiro Tawara, Takatomo Kano, Marc Delcroix

Confidence estimation, in which we estimate the reliability of each recognized token (e.g., word, sub-word, and character) in automatic speech recognition (ASR) hypotheses and detect incorrectly recognized tokens, is an important function for developing ASR applications.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models

no code implementations • 20 Dec 2023 • Atsunori Ogawa, Naohiro Tawara, Marc Delcroix, Shoko Araki

We investigate the effectiveness of using a large ensemble of advanced neural language models (NLMs) for lattice rescoring on automatic speech recognition (ASR) hypotheses.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Iterative Shallow Fusion of Backward Language Model for End-to-End Speech Recognition

no code implementations • 17 Oct 2023 • Atsunori Ogawa, Takafumi Moriya, Naoyuki Kamo, Naohiro Tawara, Marc Delcroix

In experiments using an attention-based encoder-decoder ASR system, we confirmed that iterative shallow fusion (ISF) using the PBLM performs comparably to shallow fusion (SF) using the forward language model (FLM).

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +4

Discriminative Training of VBx Diarization

1 code implementation • 4 Oct 2023 • Dominik Klement, Mireia Diez, Federico Landini, Lukáš Burget, Anna Silnova, Marc Delcroix, Naohiro Tawara

Bayesian HMM clustering of x-vector sequences (VBx) has become a widely adopted diarization baseline model in publications and challenges.

Bayesian Inference

Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech

1 code implementation • 19 May 2021 • Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara

This paper (1) reports recent advances we made to this framework, including newly introduced robust constrained clustering algorithms, and (2) experimentally shows that the method can now significantly outperform competitive diarization methods, such as Encoder-Decoder Attractor (EDA)-EEND, on CALLHOME data, which comprises real conversational speech including overlapped speech and an arbitrary number of speakers.

Constrained Clustering, Decoder, +2

Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds

no code implementations • 26 Oct 2020 • Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara

In this paper, we propose a simple but effective hybrid diarization framework that works with overlapped speech and for long recordings containing an arbitrary number of speakers.

Clustering

Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam

1 code implementation • 23 Jan 2020 • Marc Delcroix, Tsubasa Ochiai, Katerina Zmolikova, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki

First, we propose a time-domain implementation of SpeakerBeam similar to that proposed for a time-domain audio separation network (TasNet), which has achieved state-of-the-art performance for speech separation.

Speaker Identification, Speech Extraction
