Search Results for author: Tsubasa Ochiai

Found 28 papers, 7 papers with code

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss

no code implementations24 May 2023 Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo

In this work, we propose a new SE training criterion that minimizes the distance between clean and enhanced signals in the feature representation of the SSL model to alleviate the mismatch.

Self-Supervised Learning Speech Enhancement

Neural Target Speech Extraction: An Overview

1 code implementation31 Jan 2023 Katerina Zmolikova, Marc Delcroix, Tsubasa Ochiai, Keisuke Kinoshita, Jan Černocký, Dong Yu

Humans can listen to a target speaker even in challenging acoustic conditions that have noise, reverberation, and interfering speakers.

Speech Extraction

Streaming Target-Speaker ASR with Neural Transducer

no code implementations9 Sep 2022 Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki

We confirm in experiments that our TS-ASR achieves comparable recognition performance with conventional cascade systems in the offline setting, while reducing computation costs and realizing streaming TS-ASR.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Analysis of impact of emotions on target speech extraction and speech separation

1 code implementation15 Aug 2022 Ján Švec, Kateřina Žmolíková, Martin Kocour, Marc Delcroix, Tsubasa Ochiai, Ladislav Mošner, Jan Černocký

One of the factors causing such degradation may be intrinsic speaker variability, such as emotions, occurring commonly in realistic speech.

Speaker Verification Speech Extraction

ConceptBeam: Concept Driven Target Speech Extraction

no code implementations25 Jul 2022 Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada, Kunio Kashino

We use it to bridge modality-dependent information, i. e., the speech segments in the mixture, and the specified, modality-independent concept.

Metric Learning Speech Extraction

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations

no code implementations16 Jun 2022 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura

Experimental validation reveals the effectiveness of both worst-enrollment target training and SI-loss training to improve robustness against enrollment variations, by increasing speaker discriminability.

Speaker Identification Speech Extraction

Mask-based Neural Beamforming for Moving Speakers with Self-Attention-based Tracking

no code implementations7 May 2022 Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki

We thus introduce a learning-based framework that computes optimal attention weights for beamforming.

Listen only to me! How well can target speech extraction handle false alarms?

no code implementations11 Apr 2022 Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolikova, Hiroshi Sato, Tomohiro Nakatani

Target speech extraction (TSE) extracts the speech of a target speaker in a mixture given auxiliary clues characterizing the speaker, such as an enrollment utterance.

Speaker Identification Speaker Verification +2

Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model

1 code implementation31 Oct 2021 Martin Kocour, Kateřina Žmolíková, Lucas Ondel, Ján Švec, Marc Delcroix, Tsubasa Ochiai, Lukáš Burget, Jan Černocký

We modify the acoustic model to predict joint state posteriors for all speakers, enabling the network to express uncertainty about the attribution of parts of the speech signal to the speakers.

speech-recognition Speech Recognition

Few-shot learning of new sound classes for target sound extraction

no code implementations14 Jun 2021 Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Shoko Araki

Target sound extraction consists of extracting the sound of a target acoustic event (AE) class from a mixture of AE sounds.

Few-Shot Learning Target Sound Extraction

PILOT: Introducing Transformers for Probabilistic Sound Event Localization

1 code implementation7 Jun 2021 Christopher Schymura, Benedikt Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa

Sound event localization aims at estimating the positions of sound sources in the environment with respect to an acoustic receiver (e. g. a microphone array).

Event Detection

Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition

no code implementations2 Jun 2021 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoyuki Kamo

', we analyze ASR performance on observed and enhanced speech at various noise and interference conditions, and show that speech enhancement degrades ASR under some conditions even for overlapping speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization

1 code implementation28 Feb 2021 Christopher Schymura, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa

Herein, attentions allow for capturing temporal dependencies in the audio signal by focusing on specific frames that are relevant for estimating the activity and direction-of-arrival of sound events at the current time-step.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Multimodal Attention Fusion for Target Speaker Extraction

no code implementations2 Feb 2021 Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki

Recently an audio-visual target speaker extraction has been proposed that extracts target speech by using complementary audio and visual clues.

Target Speaker Extraction

Speaker activity driven neural speech extraction

no code implementations14 Jan 2021 Marc Delcroix, Katerina Zmolikova, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani

Target speech extraction, which extracts the speech of a target speaker in a mixture given auxiliary speaker clues, has recently received increased interest.

Speech Extraction

Neural Network-based Virtual Microphone Estimator

no code implementations12 Jan 2021 Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki

Developing microphone array technologies for a small number of microphones is important due to the constraints of many devices.

Speech Enhancement

Listen to What You Want: Neural Network-based Universal Sound Selector

no code implementations10 Jun 2020 Tsubasa Ochiai, Marc Delcroix, Yuma Koizumi, Hiroaki Ito, Keisuke Kinoshita, Shoko Araki

In this paper, we propose instead a universal sound selection neural network that enables to directly select AE sounds from a mixture given user-specified target AE classes.

Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam

1 code implementation23 Jan 2020 Marc Delcroix, Tsubasa Ochiai, Katerina Zmolikova, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki

First, we propose a time-domain implementation of SpeakerBeam similar to that proposed for a time-domain audio separation network (TasNet), which has achieved state-of-the-art performance for speech separation.

Speaker Identification Speech Extraction

Multichannel End-to-end Speech Recognition

no code implementations ICML 2017 Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey

The field of speech recognition is in the midst of a paradigm shift: end-to-end neural networks are challenging the dominance of hidden Markov models as a core technology.

Language Modelling Speech Enhancement +2

Automatic Node Selection for Deep Neural Networks using Group Lasso Regularization

no code implementations17 Nov 2016 Tsubasa Ochiai, Shigeki Matsuda, Hideyuki Watanabe, Shigeru Katagiri

We examine the effect of the Group Lasso (gLasso) regularizer in selecting the salient nodes of Deep Neural Network (DNN) hidden layers by applying a DNN-HMM hybrid speech recognizer to TED Talks speech data.

General Classification Playing the Game of 2048

Cannot find the paper you are looking for? You can Submit a new open access paper.