Search Results for author: Hiroshi Sato

Found 15 papers, 1 paper with code

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis

1 code implementation • 28 Oct 2022 • Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato

This paper investigates the effectiveness and implementation of modality-specific large-scale pre-trained encoders for multimodal sentiment analysis (MSA).
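
As a rough illustration of the late-fusion setup such work builds on, the sketch below combines fixed-size embeddings from per-modality encoders; the module names, dimensions, and concatenation-based fusion are illustrative assumptions, not the paper's exact architecture.

```python
# Late-fusion MSA sketch (PyTorch). The inputs stand in for embeddings from
# modality-specific pre-trained encoders; all names and sizes are illustrative.
import torch
import torch.nn as nn

class LateFusionMSA(nn.Module):
    def __init__(self, text_dim=768, audio_dim=512, video_dim=256, n_classes=3):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + audio_dim + video_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, text_feat, audio_feat, video_feat):
        # Concatenate per-utterance embeddings from each modality encoder.
        return self.head(torch.cat([text_feat, audio_feat, video_feat], dim=-1))

model = LateFusionMSA()
logits = model(torch.randn(4, 768), torch.randn(4, 512), torch.randn(4, 256))
print(logits.shape)  # torch.Size([4, 3])
```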

Multimodal Sentiment Analysis

Multimodal Attention Fusion for Target Speaker Extraction

no code implementations • 2 Feb 2021 • Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki

Recently, an audio-visual target speaker extraction method has been proposed that extracts target speech using complementary audio and visual clues.
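
A minimal sketch of attention-based clue fusion, assuming PyTorch and per-utterance clue embeddings; the shapes and the single-layer scoring are assumptions, not the paper's design:

```python
# Sketch of attention-weighted fusion of audio and visual speaker clues
# (hypothetical shapes and names; the actual architecture is in the paper).
import torch
import torch.nn as nn

class ClueAttentionFusion(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # scores each clue embedding

    def forward(self, audio_clue, visual_clue):
        # clues: (batch, dim) embeddings from the audio/visual clue encoders
        clues = torch.stack([audio_clue, visual_clue], dim=1)  # (B, 2, D)
        weights = torch.softmax(self.score(clues), dim=1)      # (B, 2, 1)
        # Weighted sum lets the model lean on the more reliable modality.
        return (weights * clues).sum(dim=1)                    # (B, D)

fusion = ClueAttentionFusion()
fused = fusion(torch.randn(4, 256), torch.randn(4, 256))
print(fused.shape)  # torch.Size([4, 256])
```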

Target Speaker Extraction

Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition

no code implementations • 2 Jun 2021 • Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoyuki Kamo

We analyze ASR performance on observed and enhanced speech under various noise and interference conditions, and show that speech enhancement degrades ASR under some conditions even for overlapping speech.
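
One way to act on that observation is a simple switching rule; the confidence criterion below is a hypothetical stand-in, since the paper analyzes when enhancement helps rather than prescribing this exact rule:

```python
# Toy switching rule between observed and enhanced signals. The selection
# criterion (an external confidence score) is a placeholder assumption.
def select_asr_input(observed, enhanced, enhancement_confidence, threshold=0.5):
    """Feed the enhanced signal to ASR only when enhancement is trusted."""
    if enhancement_confidence >= threshold:
        return enhanced   # enhancement expected to help (e.g., heavy overlap)
    return observed       # fall back to the unprocessed signal
```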

Automatic Speech Recognition (ASR) +3

Listen only to me! How well can target speech extraction handle false alarms?

no code implementations • 11 Apr 2022 • Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolikova, Hiroshi Sato, Tomohiro Nakatani

Target speech extraction (TSE) extracts the speech of a target speaker in a mixture given auxiliary clues characterizing the speaker, such as an enrollment utterance.
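
A minimal sketch of that enrollment-based conditioning, assuming PyTorch; the multiplicative speaker conditioning and all sizes are illustrative assumptions:

```python
# Minimal TSE conditioning sketch: a speaker embedding computed from the
# enrollment utterance modulates a mask estimator. Names are illustrative.
import torch
import torch.nn as nn

class TinyTSE(nn.Module):
    def __init__(self, feat_dim=257, emb_dim=128):
        super().__init__()
        self.spk_proj = nn.Linear(emb_dim, feat_dim)
        self.mask_net = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())

    def forward(self, mixture_spec, enroll_emb):
        # Bias mixture features toward the target speaker, then estimate a mask.
        gate = torch.sigmoid(self.spk_proj(enroll_emb)).unsqueeze(1)  # (B, 1, F)
        mask = self.mask_net(mixture_spec * gate)
        return mixture_spec * mask  # estimated target-speaker spectrogram

tse = TinyTSE()
out = tse(torch.rand(2, 100, 257), torch.randn(2, 128))
print(out.shape)  # torch.Size([2, 100, 257])
```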

Speaker Identification • Speaker Verification +2

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations

no code implementations • 16 Jun 2022 • Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura

Experimental validation reveals that both worst-enrollment target training and SI-loss training improve robustness against enrollment variations by increasing speaker discriminability.
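
A sketch of what worst-enrollment selection could look like in a training loop; the function interfaces are placeholders, not the paper's implementation:

```python
# "Worst-enrollment" selection sketch: among candidate enrollment utterances
# for the target speaker, train on the one with the highest extraction loss.
import torch

def worst_enrollment_loss(model, mixture, target, enrollments, loss_fn):
    # Evaluate the extraction loss under each candidate enrollment clue.
    losses = torch.stack([
        loss_fn(model(mixture, enroll), target) for enroll in enrollments
    ])
    # Optimizing the worst case encourages robustness to enrollment variation.
    return losses.max()
```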

Speaker Identification • Speech Extraction

Streaming Target-Speaker ASR with Neural Transducer

no code implementations • 9 Sep 2022 • Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki

We confirm experimentally that our TS-ASR achieves recognition performance comparable to conventional cascade systems in the offline setting, while reducing computation costs and enabling streaming TS-ASR.

Automatic Speech Recognition (ASR) +2

Multi-Perspective Document Revision

no code implementations • COLING 2022 • Mana Ihori, Hiroshi Sato, Tomohiro Tanaka, Ryo Masumura

To model the task, we design a novel Japanese multi-perspective document revision dataset that simultaneously handles seven perspectives to improve the readability and clarity of a document.

Grammatical Error Correction • Relation Classification +1

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss

no code implementations • 24 May 2023 • Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo

In this work, we propose a new SE training criterion that minimizes the distance between clean and enhanced signals in the feature representation of the SSL model to alleviate the mismatch.
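
The stated criterion can be sketched directly; `ssl_model` stands in for a frozen self-supervised speech encoder, and the L1 distance is one plausible choice of feature-space distance, not necessarily the paper's:

```python
# Sketch of the described criterion: minimize the distance between SSL-model
# features of the clean and enhanced signals. `ssl_model` is a placeholder
# for any frozen self-supervised speech encoder.
import torch
import torch.nn.functional as F

def ssl_feature_loss(ssl_model, enhanced_wav, clean_wav):
    with torch.no_grad():
        target_feat = ssl_model(clean_wav)   # features of the clean reference
    enhanced_feat = ssl_model(enhanced_wav)  # gradients flow into the SE model
    # L1 distance in the SSL feature space (one plausible choice).
    return F.l1_loss(enhanced_feat, target_feat)
```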

Self-Supervised Learning • Speech Enhancement

How does end-to-end speech recognition training impact speech enhancement artifacts?

no code implementations • 20 Nov 2023 • Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri

Jointly training a speech enhancement (SE) front-end and an automatic speech recognition (ASR) back-end has been investigated as a way to mitigate the influence of processing distortion generated by single-channel SE on ASR.
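
A common form of such joint training is a weighted sum of the two objectives; the weight and module interfaces below are illustrative assumptions:

```python
# Joint-training pattern for an SE front-end and ASR back-end: a weighted
# sum of enhancement and recognition losses. `alpha` and the interfaces
# are illustrative, not the paper's exact setup.
def joint_se_asr_loss(se_model, asr_model, noisy_wav, clean_wav, transcript,
                      se_loss_fn, asr_loss_fn, alpha=0.5):
    enhanced = se_model(noisy_wav)
    # ASR consumes the enhanced signal, so its loss also shapes the SE model,
    # which can reduce the impact of processing distortion on recognition.
    return alpha * se_loss_fn(enhanced, clean_wav) + \
           (1 - alpha) * asr_loss_fn(asr_model(enhanced), transcript)
```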

Automatic Speech Recognition (ASR) +2

Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters

no code implementations • 10 Jan 2024 • Kenichi Fujita, Hiroshi Sato, Takanori Ashihara, Hiroki Kanagawa, Marc Delcroix, Takafumi Moriya, Yusuke Ijima

The zero-shot text-to-speech (TTS) method, based on speaker embeddings extracted from reference speech using self-supervised learning (SSL) speech representations, can reproduce speaker characteristics very accurately.
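
The conditioning step can be sketched as pooling SSL features of the reference speech into a speaker embedding; mean pooling and the module handles are assumptions for illustration:

```python
# Sketch of the zero-shot conditioning step: pool SSL features of a reference
# utterance into a speaker embedding and pass it to the TTS model. All module
# handles are placeholders for the paper's actual components.
def synthesize_zero_shot(tts_model, ssl_model, text, reference_wav):
    feats = ssl_model(reference_wav)  # (T, D) SSL speech representations
    spk_emb = feats.mean(dim=0)       # mean-pool into a speaker embedding
    return tts_model(text, spk_emb)   # condition synthesis on the speaker
```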

Self-Supervised Learning Speech Enhancement +2
