Search Results for author: Ryo Masumura

Found 34 papers, 1 paper with code

Multi-Perspective Document Revision

no code implementations COLING 2022 Mana Ihori, Hiroshi Sato, Tomohiro Tanaka, Ryo Masumura

To model the task, we design a novel Japanese multi-perspective document revision dataset that simultaneously handles seven perspectives to improve the readability and clarity of a document.

Grammatical Error Correction Relation Classification +1

Multimodal Negotiation Corpus with Various Subjective Assessments for Social-Psychological Outcome Prediction from Non-Verbal Cues

no code implementations LREC 2022 Nobukatsu Hojo, Satoshi Kobashikawa, Saki Mizuno, Ryo Masumura

To investigate SPNOP, a corpus with various psychological measurements is beneficial because the interaction process of negotiation relates to many aspects of psychology.

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss

no code implementations 24 May 2023 Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo

In this work, we propose a new SE training criterion that minimizes the distance between clean and enhanced signals in the feature representation of the SSL model to alleviate the mismatch.

Self-Supervised Learning Speech Enhancement

Leveraging Large Text Corpora for End-to-End Speech Summarization

no code implementations 2 Mar 2023 Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix, Ryo Masumura

The first technique is to utilize a text-to-speech (TTS) system to generate synthesized speech, which is used for E2E SSum training with the text summary.

Automatic Speech Recognition (ASR) +2

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis

1 code implementation 28 Oct 2022 Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato

This paper investigates the effectiveness and implementation of modality-specific large-scale pre-trained encoders for multimodal sentiment analysis (MSA).

Multimodal Sentiment Analysis

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations

no code implementations 16 Jun 2022 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura

Experimental validation reveals the effectiveness of both worst-enrollment target training and SI-loss training to improve robustness against enrollment variations, by increasing speaker discriminability.

Speaker Identification Speech Extraction

Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations

no code implementations 21 Feb 2022 Yoshihiro Yamazaki, Shota Orihashi, Ryo Masumura, Mihiro Uchida, Akihiko Takashima

There have been many attempts to build multimodal dialog systems that can respond to a question about given audio-visual information, and the representative task for such systems is the Audio Visual Scene-Aware Dialog (AVSD).

Answer Generation Video Understanding

Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages

no code implementations 24 Nov 2021 Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura

To this end, the proposed method pre-trains the encoder by using a multilingual dataset that combines the resource-poor language's dataset and the resource-rich language's dataset to learn language-invariant knowledge for scene text recognition.

Scene Text Recognition

Hierarchical Knowledge Distillation for Dialogue Sequence Labeling

no code implementations 22 Nov 2021 Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura

Dialogue sequence labeling is a supervised learning task that estimates labels for each utterance in the target dialogue document, and is useful for many applications such as dialogue act estimation.

Knowledge Distillation Scene Segmentation

End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning

no code implementations 7 Jul 2021 Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Shota Orihashi, Naoki Makishima

We propose a semi-supervised learning method for building end-to-end rich transcription-style automatic speech recognition (RT-ASR) systems from small-scale rich transcription-style and large-scale common transcription-style datasets.

Automatic Speech Recognition (ASR) +1

Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition

no code implementations 4 Jul 2021 Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Takanori Ashihara, Shota Orihashi, Naoki Makishima

However, the conventional method cannot take into account the relationships between these two different modal inputs because the input contexts are separately encoded for each modal.

Automatic Speech Recognition (ASR) +1

Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks using Switching Tokens

no code implementations 23 Jun 2021 Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura

To execute multiple conversion tasks simultaneously without preparing matched datasets, our key idea is to distinguish individual conversion tasks using the on-off switch.

Automatic Speech Recognition (ASR) +3

Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss

no code implementations 2 Mar 2021 Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, Ryo Masumura

We present an audio-visual speech separation learning method that considers the correspondence between the separated signals and the visual signals to reflect the speech characteristics during training.

Speech Separation

Large-Context Conversational Representation Learning: Self-Supervised Learning for Conversational Documents

no code implementations 16 Feb 2021 Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi

This paper presents a novel self-supervised learning method for handling conversational documents consisting of transcribed text of human-to-human conversations.

Language Modelling Representation Learning +2

MAPGN: MAsked Pointer-Generator Network for sequence-to-sequence pre-training

no code implementations 15 Feb 2021 Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura

However, these models require a large amount of paired data of spoken-style text and style normalized text, and it is difficult to prepare such a volume of data.

Machine Translation Self-Supervised Learning

Memory Attentive Fusion: External Language Model Integration for Transformer-based Sequence-to-Sequence Model

no code implementations INLG (ACL) 2020 Mana Ihori, Ryo Masumura, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi

Thus, it is important to leverage memorized knowledge in the external LM for building the seq2seq model, since it is hard to prepare a large amount of paired data.

Language Modelling

A Transformer-based Audio Captioning Model with Keyword Estimation

no code implementations 1 Jul 2020 Yuma Koizumi, Ryo Masumura, Kyosuke Nishida, Masahiro Yasuda, Shoichiro Saito

TRACKE estimates keywords, which comprise a word set corresponding to audio events/scenes in the input audio, and generates the caption while referring to the estimated keywords to reduce word-selection indeterminacy.

Acoustic Scene Classification Audio captioning +2

Parallel Corpus for Japanese Spoken-to-Written Style Conversion

no code implementations LREC 2020 Mana Ihori, Akihiko Takashima, Ryo Masumura

Therefore, we created a new Japanese parallel corpus of spoken-style text and written-style text that can simultaneously handle general problems and Japanese-specific ones.

Automatic Speech Recognition (ASR) +2

Generating Responses that Reflect Meta Information in User-Generated Question Answer Pairs

no code implementations LREC 2020 Takashi Kodama, Ryuichiro Higashinaka, Koh Mitsuda, Ryo Masumura, Yushi Aono, Ryuta Nakamura, Noritake Adachi, Hidetoshi Kawabata

This paper concerns the problem of realizing consistent personalities in neural conversational modeling by using user-generated question-answer pairs as training data.

Question Answering

Multi-task and Multi-lingual Joint Learning of Neural Lexical Utterance Classification based on Partially-shared Modeling

no code implementations COLING 2018 Ryo Masumura, Tomohiro Tanaka, Ryuichiro Higashinaka, Hirokazu Masataki, Yushi Aono

In addition, in order to effectively transfer knowledge between different task data sets and different language data sets, this paper proposes a partially-shared modeling method that possesses both shared components and components specific to individual data sets.

Classification Feature Engineering +3

Neural Dialogue Context Online End-of-Turn Detection

no code implementations WS 2018 Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Ryo Ishii, Ryuichiro Higashinaka, Yushi Aono

This paper proposes a fully neural network based dialogue-context online end-of-turn detection method that can utilize long-range interactive information extracted from both speaker's utterances and collocutor's utterances.

Action Detection Spoken Dialogue Systems
