Search Results for author: Tomohiro Tanaka

Found 28 papers, 1 paper with code

Multi-Perspective Document Revision

no code implementations • COLING 2022 • Mana Ihori, Hiroshi Sato, Tomohiro Tanaka, Ryo Masumura

To model the task, we design a novel Japanese multi-perspective document revision dataset that simultaneously handles seven perspectives to improve the readability and clarity of a document.

Grammatical Error Correction Relation Classification +1

Exploring Limits of Diffusion-Synthetic Training with Weakly Supervised Semantic Segmentation

no code implementations • 4 Sep 2023 • Ryota Yoshihashi, Yuya Otsuka, Kenji Doi, Tomohiro Tanaka, Hirokatsu Kataoka

The advance of generative models for images has inspired various training techniques for image recognition utilizing synthetic images.

Data Augmentation Image Generation +5

SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?

1 code implementation • 14 Jun 2023 • Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma

Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks, such as speech and speaker recognition.

Natural Language Understanding Self-Supervised Learning +2

Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization

no code implementations • 7 Jun 2023 • Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Takatomo Kano, Atsunori Ogawa, Marc Delcroix

End-to-end speech summarization (E2E SSum) directly summarizes input speech into easy-to-read short sentences with a single model.

Automatic Speech Recognition Language Modelling +3

End-to-End Joint Target and Non-Target Speakers ASR

no code implementations • 4 Jun 2023 • Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando

Target-speaker ASR systems are a promising way to only transcribe a target speaker's speech by enrolling the target speaker's information.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Improving Scheduled Sampling for Neural Transducer-based ASR

no code implementations • 25 May 2023 • Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura

Experiments in three datasets confirm that RNNT trained with our SS approach achieves the best ASR performance.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data

no code implementations • 25 May 2023 • Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami

Neural transducer (RNNT)-based target-speaker speech recognition (TS-RNNT) directly transcribes a target speaker's voice from a multi-talker mixture.

Knowledge Distillation Speech Extraction +2

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss

no code implementations • 24 May 2023 • Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo

In this work, we propose a new SE training criterion that minimizes the distance between clean and enhanced signals in the feature representation of the SSL model to alleviate the mismatch.

Self-Supervised Learning Speech Enhancement

Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models

no code implementations • 9 May 2023 • Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka

However, since the two settings have been studied individually in general, there has been little research focusing on how effective a cross-lingual model is in comparison with a monolingual model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Leveraging Large Text Corpora for End-to-End Speech Summarization

no code implementations • 2 Mar 2023 • Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix, Ryo Masumura

The first technique is to utilize a text-to-speech (TTS) system to generate synthesized speech, which is used for E2E SSum training with the text summary.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Ladder Siamese Network: a Method and Insights for Multi-level Self-Supervised Learning

no code implementations • 25 Nov 2022 • Ryota Yoshihashi, Shuhei Nishimura, Dai Yonebayashi, Yuya Otsuka, Tomohiro Tanaka, Takashi Miyazaki

Siamese-network-based self-supervised learning (SSL) suffers from slow convergence and instability in training.

Self-Supervised Learning

Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models

no code implementations • 14 Jul 2022 • Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka

We investigate the performance on SUPERB while varying the structure and KD methods so as to keep the number of parameters constant; this allows us to analyze the contribution of the representation introduced by varying the model architecture.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations

no code implementations • 16 Jun 2022 • Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura

Experimental validation reveals the effectiveness of both worst-enrollment target training and SI-loss training to improve robustness against enrollment variations, by increasing speaker discriminability.

Speaker Identification Speech Extraction

Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages

no code implementations • 24 Nov 2021 • Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura

To this end, the proposed method pre-trains the encoder by using a multilingual dataset that combines the resource-poor language's dataset and the resource-rich language's dataset to learn language-invariant knowledge for scene text recognition.

Scene Text Recognition

Hierarchical Knowledge Distillation for Dialogue Sequence Labeling

no code implementations • 22 Nov 2021 • Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura

Dialogue sequence labeling is a supervised learning task that estimates labels for each utterance in the target dialogue document, and is useful for many applications such as dialogue act estimation.

Knowledge Distillation Scene Segmentation

End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning

no code implementations • 7 Jul 2021 • Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Shota Orihashi, Naoki Makishima

We propose a semi-supervised learning method for building end-to-end rich transcription-style automatic speech recognition (RT-ASR) systems from small-scale rich transcription-style and large-scale common transcription-style datasets.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation

no code implementations • 4 Jul 2021 • Ryo Masumura, Daiki Okamura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi

To address this problem, we propose unified autoregressive modeling for joint end-to-end multi-talker overlapped ASR and speaker attribute estimation.

Age Estimation Attribute +3

Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition

no code implementations • 4 Jul 2021 • Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Takanori Ashihara, Shota Orihashi, Naoki Makishima

However, the conventional method cannot take into account the relationships between these two different modal inputs because the input contexts are separately encoded for each modal.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks using Switching Tokens

no code implementations • 23 Jun 2021 • Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura

To execute multiple conversion tasks simultaneously without preparing matched datasets, our key idea is to distinguish individual conversion tasks using the on-off switch.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Context-Free TextSpotter for Real-Time and Mobile End-to-End Text Detection and Recognition

no code implementations • 10 Jun 2021 • Ryota Yoshihashi, Tomohiro Tanaka, Kenji Doi, Takumi Fujino, Naoaki Yamashita

In the deployment of scene-text spotting systems on mobile platforms, lightweight models with low computation are preferable.

Optical Character Recognition (OCR) Text Detection +1

Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss

no code implementations • 2 Mar 2021 • Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, Ryo Masumura

We present an audio-visual speech separation learning method that considers the correspondence between the separated signals and the visual signals to reflect the speech characteristics during training.

Speech Separation

Hierarchical Transformer-based Large-Context End-to-end ASR with Large-Context Knowledge Distillation

no code implementations • 16 Feb 2021 • Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi

We evaluate the effectiveness of the proposed model and proposed training method on Japanese discourse ASR tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

End-to-End Automatic Speech Recognition with Deep Mutual Learning

no code implementations • 16 Feb 2021 • Ryo Masumura, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Takanori Ashihara

We also show that combining DML with the existing training techniques effectively improves ASR performance.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Large-Context Conversational Representation Learning: Self-Supervised Learning for Conversational Documents

no code implementations • 16 Feb 2021 • Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi

This paper presents a novel self-supervised learning method for handling conversational documents consisting of transcribed text of human-to-human conversations.

Language Modelling Representation Learning +2

MAPGN: MAsked Pointer-Generator Network for sequence-to-sequence pre-training

no code implementations • 15 Feb 2021 • Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura

However, these models require a large amount of paired data of spoken-style text and style normalized text, and it is difficult to prepare such a volume of data.

Machine Translation Self-Supervised Learning

Memory Attentive Fusion: External Language Model Integration for Transformer-based Sequence-to-Sequence Model

no code implementations • INLG (ACL) 2020 • Mana Ihori, Ryo Masumura, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi

Thus, it is important to leverage memorized knowledge in the external LM for building the seq2seq model, since it is hard to prepare a large amount of paired data.

Language Modelling

Multi-task and Multi-lingual Joint Learning of Neural Lexical Utterance Classification based on Partially-shared Modeling

no code implementations • COLING 2018 • Ryo Masumura, Tomohiro Tanaka, Ryuichiro Higashinaka, Hirokazu Masataki, Yushi Aono

In addition, in order to effectively transfer knowledge between different task data sets and different language data sets, this paper proposes a partially-shared modeling method that possesses both shared components and components specific to individual data sets.

Classification Feature Engineering +3

Neural Dialogue Context Online End-of-Turn Detection

no code implementations • WS 2018 • Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Ryo Ishii, Ryuichiro Higashinaka, Yushi Aono

This paper proposes a fully neural network based dialogue-context online end-of-turn detection method that can utilize long-range interactive information extracted from both speaker's utterances and collocutor's utterances.

Action Detection Spoken Dialogue Systems
