Search Results for author: Shota Horiguchi

Found 30 papers, 8 papers with code

Spoofing Attacker Also Benefits from Self-Supervised Pretrained Model

no code implementations • 24 May 2023 • Aoi Ito, Shota Horiguchi

Large-scale pretrained models using self-supervised learning have reportedly improved the performance of speech anti-spoofing.

Self-Supervised Learning

Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization

no code implementations • 7 Oct 2022 • Shota Horiguchi, Yuki Takashima, Shinji Watanabe, Paola Garcia

This paper focuses on speaker diarization and proposes to conduct bi-directional knowledge transfer between single- and multi-channel models alternately.

Knowledge Distillation, Speaker Diarization

Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models

no code implementations • 1 Jul 2022 • Yuki Takashima, Shota Horiguchi, Shinji Watanabe, Paola García, Yohei Kawaguchi

In this paper, we present an incremental domain adaptation technique to prevent catastrophic forgetting for an end-to-end automatic speech recognition (ASR) model.

Automatic Speech Recognition (ASR)

Rethinking Fano's Inequality in Ensemble Learning

1 code implementation • 25 May 2022 • Terufumi Morishita, Gaku Morio, Shota Horiguchi, Hiroaki Ozaki, Nobuo Nukaga

We propose a fundamental theory on ensemble learning that answers the central question: what factors make an ensemble system good or bad?

Ensemble Learning

Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization

1 code implementation • 24 Apr 2022 • Natsuo Yamashita, Shota Horiguchi, Takeshi Homma

Due to the lack of any annotated real conversational dataset, EEND is usually pretrained on a large-scale simulated conversational dataset first and then adapted to the target real dataset.

Speaker Diarization

Environmental Sound Extraction Using Onomatopoeic Words

no code implementations • 1 Dec 2021 • Yuki Okamoto, Shota Horiguchi, Masaaki Yamamoto, Keisuke Imoto, Yohei Kawaguchi

An onomatopoeic word, which is a character sequence that phonetically imitates a sound, is effective in expressing characteristics of sound such as duration, pitch, and timbre.

Multi-Channel End-to-End Neural Diarization with Distributed Microphones

no code implementations • 10 Oct 2021 • Shota Horiguchi, Yuki Takashima, Paola Garcia, Shinji Watanabe, Yohei Kawaguchi

With simulated and real-recorded datasets, we demonstrated that the proposed method outperformed conventional EEND when a multi-channel input was given while maintaining comparable performance with a single-channel input.

Speaker Diarization

Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors

no code implementations • 4 Jul 2021 • Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yawen Xue, Yuki Takashima, Yohei Kawaguchi

This makes it possible to produce diarization results of a large number of speakers for the whole recording even if the number of output speakers for each subsequence is limited.


Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers

no code implementations • 21 Jan 2021 • Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Paola Garcia, Kenji Nagamatsu

We propose a streaming diarization method based on an end-to-end neural diarization (EEND) model, which handles flexible numbers of speakers and overlapping speech.

Speaker Diarization, Sound, Audio and Speech Processing

End-to-End Speaker Diarization as Post-Processing

no code implementations • 18 Dec 2020 • Shota Horiguchi, Paola Garcia, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu

Clustering-based diarization methods partition frames into as many clusters as there are speakers; thus, they typically cannot handle overlapping speech because each frame is assigned to one speaker.

Clustering, Multi-Label Classification

Block-Online Guided Source Separation

no code implementations • 16 Nov 2020 • Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu

Another problem is that offline GSS is an utterance-wise algorithm, so its latency grows with the length of the utterance.

Speech Separation

Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones

no code implementations • 31 Jul 2020 • Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu

We also showed that our framework achieved a CER of 21.8%, which is only 2.1 percentage points higher than the CER in headset microphone-based transcription.

Automatic Speech Recognition (ASR)

Online End-to-End Neural Diarization with Speaker-Tracing Buffer

no code implementations • 4 Jun 2020 • Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu

This paper proposes a novel online speaker diarization algorithm based on a fully supervised self-attention mechanism (SA-EEND).

Speaker Diarization

End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification

1 code implementation • 24 Feb 2020 • Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu

However, the clustering-based approach has a number of problems; i.e., (i) it is not optimized to minimize diarization errors directly, (ii) it cannot handle speaker overlaps correctly, and (iii) it has trouble adapting its speaker embedding models to real audio recordings with speaker overlaps.

Clustering, General Classification

End-to-End Neural Speaker Diarization with Permutation-Free Objectives

1 code implementation • 12 Sep 2019 • Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe

To realize such a model, we formulate the speaker diarization problem as a multi-label classification problem and introduce a permutation-free objective function that directly minimizes diarization errors without suffering from the speaker-label permutation problem.

Clustering, Domain Adaptation
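
The permutation-free objective mentioned in the entry above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `bce` and `permutation_free_loss`, the list-of-lists data layout, and the brute-force search over all speaker permutations are assumptions chosen for readability. The idea shown is that frame-wise binary cross-entropy is computed for every assignment of output channels to reference speakers, and the smallest value is kept.

```python
import math
from itertools import permutations


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and one 0/1 label y."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def permutation_free_loss(probs, labels):
    """Return (minimum mean BCE, best permutation).

    probs[t][s]  : predicted activity probability of output channel s at frame t
    labels[t][k] : 0/1 reference activity of speaker k at frame t
    best_perm[s] : reference speaker matched to output channel s
    """
    num_frames = len(probs)
    num_speakers = len(probs[0])
    best_loss, best_perm = float("inf"), None
    # Brute force over all speaker orderings (hypothetical choice; fine for few speakers).
    for perm in permutations(range(num_speakers)):
        total = sum(
            bce(probs[t][s], labels[t][perm[s]])
            for t in range(num_frames)
            for s in range(num_speakers)
        )
        loss = total / (num_frames * num_speakers)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Three frames, two speakers; both speakers overlap in the middle frame.
probs = [[0.9, 0.1], [0.8, 0.7], [0.1, 0.9]]
# Same speech activity, but the reference lists the speakers in the opposite order.
labels_swapped = [[0, 1], [1, 1], [1, 0]]
loss, perm = permutation_free_loss(probs, labels_swapped)
```

Because the loss is minimized over speaker orderings, the swapped reference yields the same value as a reference listed in the matching order, and the middle frame shows that several speakers may be active at once in the multi-label formulation.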

Personalized Classifier for Food Image Recognition

no code implementations • 8 Apr 2018 • Shota Horiguchi, Sosuke Amano, Makoto Ogawa, Kiyoharu Aizawa

In this paper, we address the personalization problem, which involves adapting to the user's domain incrementally using a very limited number of samples.

Significance of Softmax-based Features in Comparison to Distance Metric Learning-based Features

no code implementations • 29 Dec 2017 • Shota Horiguchi, Daiki Ikami, Kiyoharu Aizawa

However, in these DML studies, there were no equitable comparisons between features extracted from a DML-based network and those from a softmax-based network.

Metric Learning
