Search Results for author: Naohiro Tawara

Found 9 papers, 3 papers with code

BLSTM-Based Confidence Estimation for End-to-End Speech Recognition

no code implementations • 22 Dec 2023 • Atsunori Ogawa, Naohiro Tawara, Takatomo Kano, Marc Delcroix

Confidence estimation, in which we estimate the reliability of each recognized token (e.g., word, sub-word, and character) in automatic speech recognition (ASR) hypotheses and detect incorrectly recognized tokens, is an important function for developing ASR applications.

Automatic Speech Recognition (ASR)

Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models

no code implementations • 20 Dec 2023 • Atsunori Ogawa, Naohiro Tawara, Marc Delcroix, Shoko Araki

We investigate the effectiveness of using a large ensemble of advanced neural language models (NLMs) for lattice rescoring on automatic speech recognition (ASR) hypotheses.

Automatic Speech Recognition (ASR)

Iterative Shallow Fusion of Backward Language Model for End-to-End Speech Recognition

no code implementations • 17 Oct 2023 • Atsunori Ogawa, Takafumi Moriya, Naoyuki Kamo, Naohiro Tawara, Marc Delcroix

In experiments using an attention-based encoder-decoder ASR system, we confirmed that iterative shallow fusion (ISF) using the PBLM performs comparably to shallow fusion (SF) using the forward language model (FLM).

Automatic Speech Recognition (ASR)

Discriminative Training of VBx Diarization

1 code implementation • 4 Oct 2023 • Dominik Klement, Mireia Diez, Federico Landini, Lukáš Burget, Anna Silnova, Marc Delcroix, Naohiro Tawara

Bayesian HMM clustering of x-vector sequences (VBx) has become a widely adopted diarization baseline model in publications and challenges.

Bayesian Inference

NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization

no code implementations • 22 Sep 2023 • Naohiro Tawara, Marc Delcroix, Atsushi Ando, Atsunori Ogawa

This paper details our speaker diarization system designed for multi-domain, multi-microphone casual conversations.

Automatic Speech Recognition, Speaker Diarization

Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization

no code implementations • 23 May 2023 • Marc Delcroix, Naohiro Tawara, Mireia Diez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukáš Burget, Shoko Araki

Combining end-to-end neural speaker diarization (EEND) with vector clustering (VC), known as EEND-VC, has gained interest for leveraging the strengths of both methods.

Clustering, Speaker Diarization

Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech

1 code implementation • 19 May 2021 • Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara

This paper (1) reports recent advances we made to this framework, including newly introduced robust constrained clustering algorithms, and (2) shows experimentally that the method can now significantly outperform competitive diarization methods such as Encoder-Decoder Attractor (EDA)-EEND on CALLHOME data, which comprises real conversational speech including overlapped speech and an arbitrary number of speakers.

Constrained Clustering, Speaker Diarization

Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds

no code implementations • 26 Oct 2020 • Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara

In this paper, we propose a simple but effective hybrid diarization framework that works with overlapped speech and for long recordings containing an arbitrary number of speakers.

Clustering

Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam

1 code implementation • 23 Jan 2020 • Marc Delcroix, Tsubasa Ochiai, Katerina Zmolikova, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki

First, we propose a time-domain implementation of SpeakerBeam similar to that proposed for a time-domain audio separation network (TasNet), which has achieved state-of-the-art performance for speech separation.

Speaker Identification, Speech Extraction
