Search Results for author: Hiroshi Sato

Found 9 papers, 1 papers with code

Multi-Perspective Document Revision

no code implementations COLING 2022 Mana Ihori, Hiroshi Sato, Tomohiro Tanaka, Ryo Masumura

To model the task, we design a novel Japanese multi-perspective document revision dataset that simultaneously handles seven perspectives to improve the readability and clarity of a document.

Grammatical Error Correction Relation Classification

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis

1 code implementation28 Oct 2022 Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato

This paper investigates the effectiveness and implementation of modality-specific large-scale pre-trained encoders for multimodal sentiment analysis~(MSA).

Multimodal Sentiment Analysis

Streaming Target-Speaker ASR with Neural Transducer

no code implementations9 Sep 2022 Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki

We confirm in experiments that our TS-ASR achieves comparable recognition performance with conventional cascade systems in the offline setting, while reducing computation costs and realizing streaming TS-ASR.

Automatic Speech Recognition Speech Extraction +1

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations

no code implementations16 Jun 2022 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura

Experimental validation reveals the effectiveness of both worst-enrollment target training and SI-loss training to improve robustness against enrollment variations, by increasing speaker discriminability.

Speaker Identification Speech Extraction

Listen only to me! How well can target speech extraction handle false alarms?

no code implementations11 Apr 2022 Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolikova, Hiroshi Sato, Tomohiro Nakatani

Target speech extraction (TSE) extracts the speech of a target speaker in a mixture given auxiliary clues characterizing the speaker, such as an enrollment utterance.

Speaker Identification Speaker Verification +2

Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition

no code implementations2 Jun 2021 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoyuki Kamo

', we analyze ASR performance on observed and enhanced speech at various noise and interference conditions, and show that speech enhancement degrades ASR under some conditions even for overlapping speech.

Automatic Speech Recognition Speech Enhancement +2

Multimodal Attention Fusion for Target Speaker Extraction

no code implementations2 Feb 2021 Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki

Recently an audio-visual target speaker extraction has been proposed that extracts target speech by using complementary audio and visual clues.

Target Speaker Extraction

Cannot find the paper you are looking for? You can Submit a new open access paper.