no code implementations • 3 Mar 2023 • Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe
In the last decade of automatic speech recognition (ASR) research, the introduction of deep learning brought relative word error rate reductions of more than 50% compared to systems built without deep learning.
Automatic Speech Recognition (ASR) +2
no code implementations • 2 Nov 2022 • Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang
This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios.
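The core idea can be sketched with a context-window attention mask; this is a minimal illustration (the helper and parameter names are hypothetical, not the paper's implementation), showing how one model can be reconfigured at deployment time by restricting how far each query frame may attend:

```python
def make_attention_mask(num_frames, left_context, right_context):
    """mask[i][j] is True if query frame i may attend to key frame j.

    A left_context / right_context of None means unlimited context.
    """
    mask = []
    for i in range(num_frames):
        row = []
        for j in range(num_frames):
            ok_left = left_context is None or i - j <= left_context
            ok_right = right_context is None or j - i <= right_context
            row.append(ok_left and ok_right)
        mask.append(row)
    return mask

# Streaming deployment: no right (future) context allowed.
streaming = make_attention_mask(4, left_context=2, right_context=0)
# Full-context deployment with the same model: unlimited context.
offline = make_attention_mask(4, left_context=None, right_context=None)
```

Because the mask, not the weights, encodes the deployment scenario, the same trained model can serve both configurations.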
no code implementations • 1 Mar 2022 • Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux
As an example application, we use the extended GTC (GTC-e) for the multi-speaker speech recognition task.
Automatic Speech Recognition (ASR) +2
no code implementations • 1 Nov 2021 • Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux
The recurrent neural network transducer (RNN-T) objective plays a major role in building today's best automatic speech recognition (ASR) systems for production.
Automatic Speech Recognition (ASR) +1
no code implementations • 13 Oct 2021 • Ankit P. Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori
In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (AVSD) task, collected an AVSD dataset, developed AVSD technologies, and hosted an AVSD challenge track at both the 7th and 8th Dialog System Technology Challenges (DSTC7, DSTC8).
no code implementations • 11 Oct 2021 • Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori
Pseudo-labeling (PL), a semi-supervised learning (SSL) method where a seed model performs self-training using pseudo-labels generated from untranscribed speech, has been shown to enhance the performance of end-to-end automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
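The generic pseudo-labeling recipe can be sketched in a few lines (function and threshold are illustrative, not the paper's specific procedure): a seed model decodes untranscribed speech, and confident hypotheses are kept as training labels.

```python
def pseudo_label(seed_model, unlabeled_utterances, confidence_threshold=0.9):
    """Return (utterance, hypothesis) pairs the seed model is confident about."""
    pseudo_labeled = []
    for utt in unlabeled_utterances:
        hypothesis, confidence = seed_model(utt)
        # Filter out low-confidence decodes to limit pseudo-label noise.
        if confidence >= confidence_threshold:
            pseudo_labeled.append((utt, hypothesis))
    return pseudo_labeled

# Toy seed model: maps an utterance id to a (transcript, confidence) pair.
toy_model = lambda utt: {"u1": ("hello world", 0.95),
                         "u2": ("uh", 0.40)}[utt]
selected = pseudo_label(toy_model, ["u1", "u2"])
```

The selected pairs are then mixed with the supervised data for further training of the seed model.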
no code implementations • 4 Aug 2021 • Chiori Hori, Takaaki Hori, Jonathan Le Roux
A CNN-based timing detector is also trained to detect a proper output timing, at which the captions generated by the two Transformers become sufficiently close to each other.
no code implementations • 2 Jul 2021 • Niko Moritz, Takaaki Hori, Jonathan Le Roux
Attention-based end-to-end automatic speech recognition (ASR) systems have recently demonstrated state-of-the-art results for numerous tasks.
Automatic Speech Recognition (ASR) +1
no code implementations • 16 Jun 2021 • Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori
MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.
Automatic Speech Recognition (ASR) +1
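The mean teacher method that inspires this pairing maintains the offline model's weights as an exponential moving average of the online model's; a minimal sketch of that update (decay value illustrative):

```python
def ema_update(offline_weights, online_weights, decay=0.999):
    """Blend each offline weight toward its online counterpart."""
    return [decay * w_off + (1.0 - decay) * w_on
            for w_off, w_on in zip(offline_weights, online_weights)]

# With decay=0.9, each offline weight moves 10% toward the online model.
updated = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)
```

Averaging over many steps makes the offline model a smoothed, more stable version of the online model, whose outputs can in turn supervise the online model.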
no code implementations • 19 Apr 2021 • Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux
In this paper, we extend our prior work by (1) introducing the Conformer architecture to further improve the accuracy, (2) accelerating the decoding process with a novel activation recycling technique, and (3) enabling streaming decoding with triggered attention.
Automatic Speech Recognition (ASR) +1
no code implementations • 7 Apr 2021 • Niko Moritz, Takaaki Hori, Jonathan Le Roux
The restricted self-attention allows attention to neighboring frames of the query at a high resolution, and the dilation mechanism summarizes distant information to allow attending to it with a lower resolution.
Automatic Speech Recognition (ASR) +3
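A dual-resolution attention pattern of this kind can be illustrated with a simple mask (the scheme below is a hypothetical simplification, not the paper's exact mechanism): each query sees its neighborhood at full resolution plus a subsampled selection of distant frames.

```python
def dual_resolution_mask(num_frames, window, dilation):
    """mask[i][j] is True if frame i may attend to frame j."""
    mask = []
    for i in range(num_frames):
        row = []
        for j in range(num_frames):
            local = abs(i - j) <= window           # high-resolution neighborhood
            dilated = abs(i - j) % dilation == 0   # coarse view of distant frames
            row.append(local or dilated)
        mask.append(row)
    return mask

m = dual_resolution_mask(8, window=1, dilation=4)
```

The union of the two patterns keeps the per-frame attention cost far below full self-attention while preserving some access to long-range context.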
no code implementations • 26 Nov 2020 • Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux
The performance of automatic speech recognition (ASR) systems typically degrades significantly when the training and test data domains are mismatched.
Automatic Speech Recognition (ASR) +4
no code implementations • 29 Oct 2020 • Niko Moritz, Takaaki Hori, Jonathan Le Roux
However, alternative ASR hypotheses of an N-best list can provide more accurate labels for an unlabeled speech utterance and also reflect uncertainties of the seed ASR model.
Automatic Speech Recognition (ASR) +4
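One simple way to turn an N-best list into soft pseudo-labels, shown here as a generic sketch (the weighting scheme is illustrative, not necessarily the paper's), is to softmax-normalize the hypothesis scores so each alternative contributes in proportion to its estimated posterior:

```python
import math

def nbest_weights(hypothesis_scores):
    """Softmax-normalize N-best log scores into pseudo-label weights."""
    m = max(hypothesis_scores)                      # subtract max for stability
    exps = [math.exp(s - m) for s in hypothesis_scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two equally scored hypotheses dominate a much weaker third one.
weights = nbest_weights([-1.0, -1.0, -3.0])
```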
no code implementations • 23 Sep 2020 • Peng Gao, Chiori Hori, Shijie Geng, Takaaki Hori, Jonathan Le Roux
In contrast with previous approaches where information flows only towards deeper layers of a stack, we consider a multi-pass transformer (MPT) architecture in which earlier layers are allowed to process information in light of the output of later layers.
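The multi-pass idea can be sketched with a toy loop (function and feedback rule are illustrative): the stack is run once, then run again with the final output fed back so earlier layers can condition on information computed by later layers.

```python
def multi_pass(layers, x, num_passes=2):
    feedback = None
    for _ in range(num_passes):
        # Inject the previous pass's final output into the first layer's input.
        h = x if feedback is None else x + feedback
        for layer in layers:
            h = layer(h)
        feedback = h
    return h

# Toy "layers" on scalars: doubling followed by an increment.
layers = [lambda v: v * 2, lambda v: v + 1]
out = multi_pass(layers, 1.0)
```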
no code implementations • 14 Feb 2020 • Leda Sari, Niko Moritz, Takaaki Hori, Jonathan Le Roux
We propose an unsupervised speaker adaptation method inspired by the neural Turing machine for end-to-end (E2E) automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +1
no code implementations • 8 Jan 2020 • Niko Moritz, Takaaki Hori, Jonathan Le Roux
Encoder-decoder based sequence-to-sequence models have demonstrated state-of-the-art results in end-to-end automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +1
1 code implementation • 13 Sep 2019 • Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang
Sequence-to-sequence models have been widely used in end-to-end speech processing, for example, automatic speech recognition (ASR), speech translation (ST), and text-to-speech (TTS).
Ranked #12 on Speech Recognition on AISHELL-1
Automatic Speech Recognition (ASR) +3
no code implementations • 17 Jun 2019 • Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky
Two representative frameworks have been proposed and discussed: the Multi-Encoder Multi-Resolution (MEM-Res) framework and the Multi-Encoder Multi-Array (MEM-Array) framework.
Automatic Speech Recognition (ASR) +1
no code implementations • 30 Apr 2019 • Murali Karthick Baskar, Shinji Watanabe, Ramon Astudillo, Takaaki Hori, Lukáš Burget, Jan Černocký
Such techniques derive training procedures and losses able to leverage unpaired speech and/or text data by combining ASR with Text-to-Speech (TTS) models.
Ranked #33 on Semi-Supervised Image Classification on ImageNet - 10% labeled data (Top 5 Accuracy metric)
Automatic Speech Recognition (ASR) +2
no code implementations • 12 Nov 2018 • Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky
In this work, we present a novel Multi-Encoder Multi-Resolution (MEMR) framework based on the joint CTC/Attention model.
Automatic Speech Recognition (ASR) +1
no code implementations • 12 Nov 2018 • Hiroshi Seki, Takaaki Hori, Shinji Watanabe
In this paper, we propose a parallelism technique for beam search, which accelerates the search process by vectorizing multiple hypotheses to eliminate the for-loop program.
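The expansion step being vectorized can be sketched as follows (names are illustrative, and real implementations batch this on tensors rather than Python lists): all beam hypotheses are expanded with every vocabulary symbol in one step, instead of looping over hypotheses one at a time.

```python
def beam_step(hyps, scores, log_probs, beam_size):
    """One batched expansion: hyps are token lists, log_probs[v] scores symbol v."""
    # Score every (hypothesis, symbol) pair at once.
    candidates = [(h + [v], s + lp)
                  for h, s in zip(hyps, scores)
                  for v, lp in enumerate(log_probs)]
    # Keep the beam_size best-scoring expansions.
    candidates.sort(key=lambda c: c[1], reverse=True)
    top = candidates[:beam_size]
    return [h for h, _ in top], [s for _, s in top]

hyps, scores = beam_step([[0]], [0.0], log_probs=[-0.1, -2.0, -0.5], beam_size=2)
```

On a GPU the same pattern becomes a single batched matrix operation plus a top-k, which is what eliminates the per-hypothesis for-loop.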
no code implementations • 12 Nov 2018 • Xiaofei Wang, Ruizhi Li, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky
Automatic Speech Recognition (ASR) using multiple microphone arrays has achieved great success in far-field robustness.
Automatic Speech Recognition (ASR) +1
no code implementations • 7 Nov 2018 • Murali Karthick Baskar, Lukáš Burget, Shinji Watanabe, Martin Karafiát, Takaaki Hori, Jan Honza Černocký
In this paper, we present promising accurate prefix boosting (PAPB), a discriminative training technique for attention based sequence-to-sequence (seq2seq) ASR.
no code implementations • 7 Nov 2018 • Martin Karafiát, Murali Karthick Baskar, Shinji Watanabe, Takaaki Hori, Matthew Wiesner, Jan "Honza" Černocký
This paper investigates the applications of various multilingual approaches developed in conventional hidden Markov model (HMM) systems to sequence-to-sequence (seq2seq) automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +3
no code implementations • 7 Nov 2018 • Nelson Yalta, Shinji Watanabe, Takaaki Hori, Kazuhiro Nakadai, Tetsuya Ogata
By employing a convolutional neural network (CNN)-based multichannel end-to-end speech recognition system, this study attempts to overcome the difficulties present in everyday environments.
Automatic Speech Recognition (ASR) +1
no code implementations • 2 Nov 2018 • Takaaki Hori, Ramon Astudillo, Tomoki Hayashi, Yu Zhang, Shinji Watanabe, Jonathan Le Roux
To solve this problem, this work presents a loss that is based on the speech encoder state sequence instead of the raw speech signal.
Automatic Speech Recognition (ASR) +2
no code implementations • 4 Oct 2018 • Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiat, Shinji Watanabe, Takaaki Hori
In this work, we attempt to use data from 10 BABEL languages to build a multilingual seq2seq model as a prior model, and then port it to 4 other BABEL languages using a transfer learning approach.
Language Modelling Sequence-To-Sequence Speech Recognition +2
no code implementations • 27 Sep 2018 • Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey
Several multi-lingual ASR systems were recently proposed based on a monolithic neural network architecture without language-dependent modules, showing that modeling of multiple languages is well within the capabilities of an end-to-end framework.
Automatic Speech Recognition (ASR) +1
no code implementations • 8 Aug 2018 • Takaaki Hori, Jaejin Cho, Shinji Watanabe
This paper investigates the impact of word-based RNN language models (RNN-LMs) on the performance of end-to-end automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +1
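A common, generic way to bring an external language model score into end-to-end decoding is shallow fusion; the sketch below is illustrative only (the weight value and function names are assumptions, and the paper studies word-level LM integration beyond this basic form):

```python
def fused_score(asr_log_prob, lm_log_prob, lm_weight=0.3):
    """Combine the ASR score with a weighted external LM score for one candidate."""
    return asr_log_prob + lm_weight * lm_log_prob

# A candidate word the LM considers likely gets a smaller penalty.
score = fused_score(-1.0, -2.0, lm_weight=0.5)
```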
no code implementations • 28 Jul 2018 • Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramon Astudillo, Kazuya Takeda
In this paper we propose a novel data augmentation method for attention-based end-to-end automatic speech recognition (E2E-ASR), utilizing a large amount of text which is not paired with speech signals.
Automatic Speech Recognition (ASR) +4
2 code implementations • 21 Jun 2018 • Chiori Hori, Huda Alamri, Jue Wang, Gordon Wichern, Takaaki Hori, Anoop Cherian, Tim K. Marks, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Irfan Essa, Dhruv Batra, Devi Parikh
We introduce a new dataset of dialogs about videos of human behaviors.
no code implementations • ACL 2018 • Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey
In this paper, we propose a new sequence-to-sequence framework to directly decode multiple label sequences from a single speech sequence by unifying source separation and speech recognition functions in an end-to-end manner.
no code implementations • 30 Mar 2018 • Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, Tsubasa Ochiai
This paper introduces a new open source platform for end-to-end speech processing named ESPnet.
Automatic Speech Recognition (ASR) +1
1 code implementation • ACL 2017 • Takaaki Hori, Shinji Watanabe, John Hershey
End-to-end automatic speech recognition (ASR) has become a popular alternative to conventional DNN/HMM systems because it avoids the need for linguistic resources such as pronunciation dictionaries, tokenization, and context-dependency trees, leading to a greatly simplified model-building process.
Automatic Speech Recognition (ASR) +2
1 code implementation • 22 Jun 2017 • Chiori Hori, Takaaki Hori
For example, Ghazvininejad et al. proposed a knowledge-grounded neural conversation model [3], which aims to combine conversational dialogs with task-oriented knowledge using unstructured data such as Twitter data for conversation and Foursquare data for external knowledge. However, that task is still limited to a restaurant information service and has not yet been tested with a wide variety of dialog tasks.
6 code implementations • 8 Jun 2017 • Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan
The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder.
Automatic Speech Recognition (ASR) +2
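The joint training amounts to a multi-task objective that interpolates the CTC loss on the encoder with the attention decoder's loss; a minimal sketch (the weight value here is illustrative):

```python
def joint_loss(ctc_loss, attention_loss, ctc_weight=0.3):
    """Multi-task objective: interpolate CTC and attention losses."""
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss

loss = joint_loss(ctc_loss=2.0, attention_loss=1.0, ctc_weight=0.5)
```

The CTC branch enforces monotonic alignment between speech and labels, which regularizes the more flexible attention decoder during training.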
no code implementations • ICML 2017 • Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey
The field of speech recognition is in the midst of a paradigm shift: end-to-end neural networks are challenging the dominance of hidden Markov models as a core technology.
no code implementations • ICCV 2017 • Chiori Hori, Takaaki Hori, Teng-Yok Lee, Kazuhiro Sumi, John R. Hershey, Tim K. Marks
Currently successful methods for video description are based on encoder-decoder sentence generation using recurrent neural networks (RNNs).
8 code implementations • 21 Sep 2016 • Suyoun Kim, Takaaki Hori, Shinji Watanabe
Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments.