Search Results for author: Shoko Araki

Found 26 papers, 4 papers with code

Probing Self-supervised Learning Models with Target Speech Extraction

no code implementations • 17 Feb 2024 • Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Takanori Ashihara, Shoko Araki, Jan Cernocky

TSE uniquely requires both speaker identification and speech separation, distinguishing it from other tasks in the Speech processing Universal PERformance Benchmark (SUPERB) evaluation.

Self-Supervised Learning • Speaker Identification • +2

Target Speech Extraction with Pre-trained Self-supervised Learning Models

no code implementations • 17 Feb 2024 • Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Shoko Araki, Jan Cernocky

We then extend a powerful TSE architecture by incorporating two SSL-based modules: an Adaptive Input Enhancer (AIE) and a speaker encoder.

Self-Supervised Learning • Speech Extraction
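
Both SSL-based TSE papers above hinge on exposing a pre-trained SSL model's internal representations to a downstream extraction network. As a rough illustration, here is a minimal PyTorch sketch of a learnable weighted sum over the SSL model's hidden layers, the layer-wise combination commonly used in SUPERB-style probing; the papers' AIE and speaker encoder are more elaborate, and all names here are illustrative.

```python
import torch
import torch.nn as nn

class LayerWeightedSSLFeatures(nn.Module):
    """Learnable weighted sum over the hidden layers of a frozen SSL model.

    A common way to feed a pre-trained model into a downstream module;
    treat this as an illustrative baseline, not the papers' AIE.
    """

    def __init__(self, num_layers: int):
        super().__init__()
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states):          # list of (B, T, D) tensors
        weights = torch.softmax(self.layer_logits, dim=0)
        stacked = torch.stack(hidden_states)   # (L, B, T, D)
        return torch.einsum("l,lbtd->btd", weights, stacked)
```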

Array Geometry-Robust Attention-Based Neural Beamformer for Moving Speakers

no code implementations • 5 Feb 2024 • Marvin Tammen, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki, Simon Doclo

Recently, a mask-based beamformer with an attention-based spatial covariance matrix aggregator (ASA) was proposed and demonstrated to track moving sources accurately.
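
A minimal NumPy sketch of the aggregation idea, assuming per-frame attention scores come from a separate neural module (not shown): mask-weighted instantaneous covariances are combined with softmax weights, so frames that match the speaker's current position dominate the spatial covariance matrix fed to the beamformer. Function and variable names are illustrative, not the paper's.

```python
import numpy as np

def aggregate_scm(X, mask, attn_logits):
    """Attention-weighted spatial covariance aggregation (illustrative).

    X           : (T, M) complex STFT frames of one frequency bin
    mask        : (T,)   time-frequency mask for the target source
    attn_logits : (T,)   scores from a (not shown) attention module
    """
    # Per-frame instantaneous covariance matrices, weighted by the mask.
    inst_cov = mask[:, None, None] * np.einsum("tm,tn->tmn", X, X.conj())
    # Softmax turns the scores into aggregation weights, so frames that
    # match the source's current position contribute most to the estimate.
    w = np.exp(attn_logits - attn_logits.max())
    w /= w.sum()
    return np.einsum("t,tmn->mn", w, inst_cov)  # (M, M) aggregated SCM
```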

Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models

no code implementations • 20 Dec 2023 • Atsunori Ogawa, Naohiro Tawara, Marc Delcroix, Shoko Araki

We investigate the effectiveness of using a large ensemble of advanced neural language models (NLMs) for lattice rescoring on automatic speech recognition (ASR) hypotheses.

Automatic Speech Recognition (ASR) • +1
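
A minimal sketch of the ensemble idea, simplified from lattice to n-best rescoring: the first-pass ASR score of each hypothesis is combined with a weighted sum of log probabilities from several complementary NLMs. The interface is hypothetical; actual lattice rescoring scores lattice arcs rather than whole hypotheses.

```python
def rescore(hyps, nlms, lm_weights, asr_weight=1.0):
    """Hypothetical n-best rescoring with an ensemble of neural LMs.

    hyps       : list of (tokens, asr_score) pairs from the first pass
    nlms       : list of callables mapping tokens -> log probability
    lm_weights : one interpolation weight per NLM (tuned on a dev set)
    """
    rescored = []
    for tokens, asr_score in hyps:
        # Log-domain linear interpolation of the complementary LM scores.
        lm_score = sum(w * lm(tokens) for w, lm in zip(lm_weights, nlms))
        rescored.append((tokens, asr_weight * asr_score + lm_score))
    # Return the hypothesis with the highest combined score.
    return max(rescored, key=lambda pair: pair[1])
```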

How does end-to-end speech recognition training impact speech enhancement artifacts?

no code implementations • 20 Nov 2023 • Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri

Jointly training a speech enhancement (SE) front-end and an automatic speech recognition (ASR) back-end has been investigated as a way to mitigate the influence of processing distortion generated by single-channel SE on ASR.

Automatic Speech Recognition (ASR) • +2

Modified Parametric Multichannel Wiener Filter for Low-latency Enhancement of Speech Mixtures with Unknown Number of Speakers

no code implementations • 29 Jun 2023 • Ning Guo, Tomohiro Nakatani, Shoko Araki, Takehiro Moriya

This paper introduces a novel low-latency online beamforming (BF) algorithm, named Modified Parametric Multichannel Wiener Filter (Mod-PMWF), for enhancing speech mixtures with an unknown and varying number of speakers.

Low-latency processing
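
For context, the conventional PMWF that Mod-PMWF builds on can be written as follows (this is the standard form, not the paper's modified filter). Here Φ_s(f) and Φ_v(f) are the speech and noise spatial covariance matrices, the trade-off parameter β ≥ 0 recovers the MVDR beamformer at β = 0 and the multichannel Wiener filter at β = 1, and u is a one-hot vector selecting the reference microphone:

```latex
% Conventional PMWF; Mod-PMWF adapts this form to an unknown,
% time-varying number of speakers.
\mathbf{w}(f) =
  \frac{\boldsymbol{\Phi}_{v}^{-1}(f)\,\boldsymbol{\Phi}_{s}(f)}
       {\beta + \operatorname{tr}\{\boldsymbol{\Phi}_{v}^{-1}(f)\,\boldsymbol{\Phi}_{s}(f)\}}
  \,\mathbf{u}
```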

ConceptBeam: Concept Driven Target Speech Extraction

no code implementations • 25 Jul 2022 • Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada, Kunio Kashino

We use it to bridge modality-dependent information, i.e., the speech segments in the mixture, and the specified, modality-independent concept.

Metric Learning • Speech Extraction

Mask-based Neural Beamforming for Moving Speakers with Self-Attention-based Tracking

no code implementations • 7 May 2022 • Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki

We thus introduce a learning-based framework that computes optimal attention weights for beamforming.

Few-shot learning of new sound classes for target sound extraction

no code implementations • 14 Jun 2021 • Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Shoko Araki

Target sound extraction consists of extracting the sound of a target acoustic event (AE) class from a mixture of AE sounds.

Few-Shot Learning • Target Sound Extraction

PILOT: Introducing Transformers for Probabilistic Sound Event Localization

1 code implementation • 7 Jun 2021 • Christopher Schymura, Benedikt Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa

Sound event localization aims at estimating the positions of sound sources in the environment with respect to an acoustic receiver (e.g., a microphone array).

Event Detection

Comparison of remote experiments using crowdsourcing and laboratory experiments on speech intelligibility

no code implementations • 17 Apr 2021 • Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani

Many subjective experiments have been performed to develop objective speech intelligibility measures, but the novel coronavirus outbreak has made it very difficult to conduct experiments in a laboratory.

Speech Enhancement

Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization

1 code implementation • 28 Feb 2021 • Christopher Schymura, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa

Here, attention allows the model to capture temporal dependencies in the audio signal by focusing on the frames most relevant for estimating the activity and direction-of-arrival of sound events at the current time step.

Automatic Speech Recognition (ASR) • +1

Multimodal Attention Fusion for Target Speaker Extraction

no code implementations • 2 Feb 2021 • Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki

Recently, an audio-visual target speaker extraction method has been proposed that extracts the target speech by using complementary audio and visual clues.

Target Speaker Extraction
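
A minimal PyTorch sketch of attention-based clue fusion, assuming fixed-dimensional audio and visual clue embeddings: a per-modality attention weight lets the network lean on whichever clue is more reliable at each time step (e.g., the visual clue when the audio is noisy). Class and layer names are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class CluesAttentionFusion(nn.Module):
    """Illustrative attention-based fusion of audio and visual clues."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, audio_clue, visual_clue):  # both (B, T, D)
        # Stack modalities and compute a softmax weight for each.
        clues = torch.stack([audio_clue, visual_clue], dim=2)  # (B, T, 2, D)
        attn = torch.softmax(self.score(clues), dim=2)         # (B, T, 2, 1)
        # Weighted sum over modalities yields the fused clue.
        return (attn * clues).sum(dim=2)                       # (B, T, D)
```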

Neural Network-based Virtual Microphone Estimator

no code implementations • 12 Jan 2021 • Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki

Developing microphone array technologies that work with a small number of microphones is important given the constraints of many devices.

Speech Enhancement

Block Coordinate Descent Algorithms for Auxiliary-Function-Based Independent Vector Extraction

no code implementations • 18 Oct 2020 • Rintaro Ikeshita, Tomohiro Nakatani, Shoko Araki

We also develop a new BCD for a semiblind IVE in which the transfer functions of several super-Gaussian sources are given a priori.

Listen to What You Want: Neural Network-based Universal Sound Selector

no code implementations • 10 Jun 2020 • Tsubasa Ochiai, Marc Delcroix, Yuma Koizumi, Hiroaki Ito, Keisuke Kinoshita, Shoko Araki

In this paper, we instead propose a universal sound selection neural network that can directly select AE sounds from a mixture, given user-specified target AE classes.
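
A minimal PyTorch sketch of class-conditioned selection, assuming a multi-hot vector of user-specified AE classes: the class embedding modulates the mixture features before a mask is estimated. All module names and shapes are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ClassConditionedSelector(nn.Module):
    """Illustrative sketch: select user-chosen AE classes from a mixture."""

    def __init__(self, num_classes: int, dim: int):
        super().__init__()
        self.embed = nn.Linear(num_classes, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.mask_head = nn.Linear(dim, dim)

    def forward(self, mix_feats, class_vec):  # (B, T, D), (B, num_classes)
        # Embed the multi-hot class selection and condition by product.
        cond = self.embed(class_vec).unsqueeze(1)  # (B, 1, D)
        h, _ = self.encoder(mix_feats * cond)
        # Estimate a mask that keeps only the selected AE sounds.
        mask = torch.sigmoid(self.mask_head(h))   # (B, T, D)
        return mask * mix_feats
```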

Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system

no code implementations • 9 Mar 2020 • Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani

Automatic meeting analysis is an essential technology for letting smart devices, for example, follow and respond to our conversations.

Speaker Diarization • +1

Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam

1 code implementation • 23 Jan 2020 • Marc Delcroix, Tsubasa Ochiai, Katerina Zmolikova, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki

First, we propose a time-domain implementation of SpeakerBeam similar to that proposed for a time-domain audio separation network (TasNet), which has achieved state-of-the-art performance for speech separation.

Speaker Identification • Speech Extraction
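
A minimal PyTorch sketch of SpeakerBeam-style multiplicative adaptation, where a speaker embedding derived from an enrollment utterance scales the separator's intermediate features so the network focuses on the target speaker. Shapes and layer choices are illustrative; see the paper's released code for the actual implementation.

```python
import torch
import torch.nn as nn

class MultiplicativeAdaptation(nn.Module):
    """Illustrative SpeakerBeam-style multiplicative conditioning layer."""

    def __init__(self, dim: int, emb_dim: int):
        super().__init__()
        self.proj = nn.Linear(emb_dim, dim)

    def forward(self, feats, spk_emb):  # feats (B, D, T), spk_emb (B, E)
        # Project the speaker embedding to one gain per feature channel
        # and modulate the separator's intermediate representation.
        scale = self.proj(spk_emb).unsqueeze(-1)  # (B, D, 1)
        return feats * scale
```

Multiplying rather than concatenating leaves the separator's architecture unchanged while steering every channel toward the enrolled speaker.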

All-neural online source separation, counting, and diarization for meeting analysis

no code implementations • 21 Feb 2019 • Thilo von Neumann, Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani, Reinhold Haeb-Umbach

While significant progress has been made on individual tasks, this paper presents for the first time an all-neural approach to simultaneous speaker counting, diarization and source separation.

Automatic Speech Recognition (ASR) • +3
