Search Results for author: Takaaki Hori

Found 34 papers, 5 papers with code

Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning

no code implementations13 Oct 2021 Ankit P. Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori

In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (AVSD) task, collected an AVSD dataset, developed AVSD technologies, and hosted an AVSD challenge track at both the 7th and 8th Dialog System Technology Challenges (DSTC7, DSTC8).

Region Proposal

Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy

no code implementations11 Oct 2021 Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori

Pseudo-labeling (PL), a semi-supervised learning (SSL) method where a seed model performs self-training using pseudo-labels generated from untranscribed speech, has been shown to enhance the performance of end-to-end automatic speech recognition (ASR).
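The pseudo-labeling loop described in this snippet can be illustrated with a toy example. Here the "model" is a simple 1-D threshold classifier standing in for a seq2seq ASR system; the functions and data are illustrative, not from the paper, and this is only a minimal sketch of the self-training idea.

```python
# Minimal pseudo-labeling (PL) sketch: a seed model trained on labeled
# data labels an unlabeled pool with its own predictions, then retrains
# on the union of real and pseudo-labeled examples.

def fit_threshold(data):
    """Fit a decision threshold: midpoint between the two class means."""
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

def predict(threshold, x):
    return 0 if x < threshold else 1

# Seed model trained on a small labeled set.
labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
threshold = fit_threshold(labeled)

# Self-training: label the untranscribed pool with the seed model's own
# predictions, then retrain on labeled + pseudo-labeled data.
unlabeled = [0.5, 1.5, 8.5, 9.5]
pseudo = [(x, predict(threshold, x)) for x in unlabeled]
threshold = fit_threshold(labeled + pseudo)
print(threshold)
```

In real PL for ASR, the pseudo-labels are decoded transcripts and the retraining step is full seq2seq training, but the control flow is the same.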

automatic-speech-recognition Language Modelling +1

Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers

no code implementations4 Aug 2021 Chiori Hori, Takaaki Hori, Jonathan Le Roux

A CNN-based timing detector is also trained to detect the proper output timing, at which the captions generated by the two Transformers become sufficiently close to each other.

Video Captioning

Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition

no code implementations2 Jul 2021 Niko Moritz, Takaaki Hori, Jonathan Le Roux

Attention-based end-to-end automatic speech recognition (ASR) systems have recently demonstrated state-of-the-art results for numerous tasks.

automatic-speech-recognition End-To-End Speech Recognition +1

Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition

no code implementations16 Jun 2021 Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori

MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.
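The interaction between the online and offline models can be sketched with the exponential-moving-average weight update used by the mean teacher method that the snippet cites. The parameter values and the small momentum below are illustrative only.

```python
# Momentum (mean-teacher-style) update sketch: the offline model's
# weights track an exponential moving average of the online model's
# weights, per parameter.

def momentum_update(offline, online, alpha=0.999):
    """offline <- alpha * offline + (1 - alpha) * online."""
    return [alpha * w_off + (1 - alpha) * w_on
            for w_off, w_on in zip(offline, online)]

offline = [0.0, 0.0]
online = [1.0, -1.0]
# The online model trains normally; the offline model follows it slowly,
# providing more stable pseudo-labels. alpha=0.5 here just to make the
# drift visible in a few steps.
for _ in range(3):
    offline = momentum_update(offline, online, alpha=0.5)
print(offline)
```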

automatic-speech-recognition End-To-End Speech Recognition +1

Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers

no code implementations19 Apr 2021 Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux

In this paper, we extend our prior work by (1) introducing the Conformer architecture to further improve the accuracy, (2) accelerating the decoding process with a novel activation recycling technique, and (3) enabling streaming decoding with triggered attention.

automatic-speech-recognition End-To-End Speech Recognition +1

Capturing Multi-Resolution Context by Dilated Self-Attention

no code implementations7 Apr 2021 Niko Moritz, Takaaki Hori, Jonathan Le Roux

The restricted self-attention allows attention to neighboring frames of the query at a high resolution, and the dilation mechanism summarizes distant information to allow attending to it with a lower resolution.
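One simple way to visualize the attention pattern described here is to list which key frames a query may attend to: all neighbors inside a local window at full resolution, plus a strided subset of distant frames standing in for the low-resolution summary. The window and dilation sizes below are illustrative, and real dilated self-attention summarizes (e.g. pools) the distant frames rather than attending to them sparsely.

```python
# Sketch of a dilated self-attention pattern: full-resolution attention
# within a local window around the query, plus coarse (strided) access
# to distant frames.

def attendable(query, num_frames, window=2, dilation=4):
    near = {k for k in range(num_frames) if abs(k - query) <= window}
    far = set(range(0, num_frames, dilation))
    return sorted(near | far)

print(attendable(query=7, num_frames=16))
```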

automatic-speech-recognition Machine Translation +2

Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training

no code implementations26 Nov 2020 Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux

The performance of automatic speech recognition (ASR) systems typically degrades significantly when the training and test data domains are mismatched.

automatic-speech-recognition Speech Recognition +1

Semi-Supervised Speech Recognition via Graph-based Temporal Classification

no code implementations29 Oct 2020 Niko Moritz, Takaaki Hori, Jonathan Le Roux

However, alternative ASR hypotheses of an N-best list can provide more accurate labels for an unlabeled speech utterance and also reflect uncertainties of the seed ASR model.

automatic-speech-recognition Classification +2

Multi-Pass Transformer for Machine Translation

no code implementations23 Sep 2020 Peng Gao, Chiori Hori, Shijie Geng, Takaaki Hori, Jonathan Le Roux

In contrast with previous approaches where information flows only towards deeper layers of a stack, we consider a multi-pass transformer (MPT) architecture in which earlier layers are allowed to process information in light of the output of later layers.

Machine Translation Neural Architecture Search +1

Unsupervised Speaker Adaptation using Attention-based Speaker Memory for End-to-End ASR

no code implementations14 Feb 2020 Leda Sari, Niko Moritz, Takaaki Hori, Jonathan Le Roux

We propose an unsupervised speaker adaptation method inspired by the neural Turing machine for end-to-end (E2E) automatic speech recognition (ASR).
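The memory read at the core of such an attention-based speaker memory can be sketched as follows: an utterance embedding attends over stored speaker vectors and receives their softmax-weighted sum as an adaptation vector. Dimensions, values, and function names are illustrative, not from the paper.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def speaker_memory_read(query, memory):
    """Dot-product attention of an utterance embedding over a fixed
    memory of speaker vectors; returns the weighted sum."""
    scores = [sum(q * m for q, m in zip(query, vec)) for vec in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    return [sum(w * vec[i] for w, vec in zip(weights, memory))
            for i in range(dim)]

memory = [[1.0, 0.0], [0.0, 1.0]]  # two stored speaker profiles
query = [2.0, 0.0]                 # current utterance embedding
print([round(v, 3) for v in speaker_memory_read(query, memory)])
```

Because the read is differentiable and needs no speaker labels at test time, adaptation stays unsupervised, in the spirit of the neural Turing machine the snippet mentions.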

automatic-speech-recognition End-To-End Speech Recognition +1

Streaming automatic speech recognition with the transformer model

no code implementations8 Jan 2020 Niko Moritz, Takaaki Hori, Jonathan Le Roux

Encoder-decoder based sequence-to-sequence models have demonstrated state-of-the-art results in end-to-end automatic speech recognition (ASR).

automatic-speech-recognition End-To-End Speech Recognition +1

Multi-Stream End-to-End Speech Recognition

no code implementations17 Jun 2019 Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky

Two representative frameworks have been proposed and discussed: the Multi-Encoder Multi-Resolution (MEM-Res) framework and the Multi-Encoder Multi-Array (MEM-Array) framework.

automatic-speech-recognition End-To-End Speech Recognition +1

Vectorization of hypotheses and speech for faster beam search in encoder decoder-based speech recognition

no code implementations12 Nov 2018 Hiroshi Seki, Takaaki Hori, Shinji Watanabe

In this paper, we propose a parallelism technique for beam search, which accelerates the search process by vectorizing multiple hypotheses to eliminate the for-loop over them.
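The idea can be sketched with one beam-search step: rather than scoring each hypothesis in its own loop iteration, all (hypothesis, token) expansions are laid out in a single flat table and pruned jointly. In a real implementation that table is a batched GPU tensor operation; the toy scores and names here are illustrative.

```python
import heapq

def beam_step(hyps, log_probs, beam=2):
    """One joint beam-search expansion.

    hyps: list of (tokens, score) pairs.
    log_probs[h][v]: log-probability of appending token v to hypothesis h.
    Returns the top `beam` expansions across all hypotheses at once.
    """
    # Batched expansion: one flat score table over (hypothesis, token),
    # replacing the per-hypothesis for-loop.
    expanded = [(score + lp, tokens + [v])
                for (tokens, score), row in zip(hyps, log_probs)
                for v, lp in enumerate(row)]
    best = heapq.nlargest(beam, expanded)
    return [(tokens, score) for score, tokens in best]

hyps = [([1], -0.5), ([2], -1.0)]
log_probs = [[-0.25, -2.0], [-0.5, -0.75]]
print(beam_step(hyps, log_probs))
```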

Speech Recognition

Promising Accurate Prefix Boosting for sequence-to-sequence ASR

no code implementations7 Nov 2018 Murali Karthick Baskar, Lukáš Burget, Shinji Watanabe, Martin Karafiát, Takaaki Hori, Jan Honza Černocký

In this paper, we present promising accurate prefix boosting (PAPB), a discriminative training technique for attention based sequence-to-sequence (seq2seq) ASR.

Analysis of Multilingual Sequence-to-Sequence speech recognition systems

no code implementations7 Nov 2018 Martin Karafiát, Murali Karthick Baskar, Shinji Watanabe, Takaaki Hori, Matthew Wiesner, Jan "Honza" Černocký

This paper investigates the applications of various multilingual approaches developed in conventional hidden Markov model (HMM) systems to sequence-to-sequence (seq2seq) automatic speech recognition (ASR).

automatic-speech-recognition Sequence-To-Sequence Speech Recognition +1

CNN-based MultiChannel End-to-End Speech Recognition for everyday home environments

no code implementations7 Nov 2018 Nelson Yalta, Shinji Watanabe, Takaaki Hori, Kazuhiro Nakadai, Tetsuya OGATA

By employing a convolutional neural network (CNN)-based multichannel end-to-end speech recognition system, this study attempts to overcome the difficulties presented by everyday environments.

automatic-speech-recognition End-To-End Speech Recognition +1

Cycle-consistency training for end-to-end speech recognition

no code implementations2 Nov 2018 Takaaki Hori, Ramon Astudillo, Tomoki Hayashi, Yu Zhang, Shinji Watanabe, Jonathan Le Roux

To solve this problem, this work presents a loss that is based on the speech encoder state sequence instead of the raw speech signal.
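The loss described here can be sketched conceptually: speech is run through an ASR-then-TTS cycle, and the reconstruction is compared to the original in the encoder's state space rather than as raw waveforms. Every function below is a toy stand-in (the "encoder" is just a framewise mean), purely to show where the comparison happens.

```python
def encoder(speech):
    """Stand-in acoustic encoder: one state (the mean) per frame."""
    return [sum(frame) / len(frame) for frame in speech]

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

original = [[0.0, 2.0], [4.0, 6.0]]       # "real" speech frames
reconstructed = [[0.5, 1.5], [4.0, 8.0]]  # speech after an ASR -> TTS cycle

# Cycle-consistency loss: compare encoder state sequences, not raw
# signals, so nuisance differences in the waveform are ignored.
loss = l2(encoder(original), encoder(reconstructed))
print(loss)
```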

automatic-speech-recognition End-To-End Speech Recognition +2

Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling

no code implementations4 Oct 2018 Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiat, Shinji Watanabe, Takaaki Hori

In this work, we attempt to use data from 10 BABEL languages to build a multi-lingual seq2seq model as a prior model, and then port it to 4 other BABEL languages using a transfer learning approach.

Language Modelling Sequence-To-Sequence Speech Recognition +1

End-to-end Speech Recognition with Word-based RNN Language Models

no code implementations8 Aug 2018 Takaaki Hori, Jaejin Cho, Shinji Watanabe

This paper investigates the impact of word-based RNN language models (RNN-LMs) on the performance of end-to-end automatic speech recognition (ASR).

automatic-speech-recognition End-To-End Speech Recognition +1

Back-Translation-Style Data Augmentation for End-to-End ASR

no code implementations28 Jul 2018 Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramon Astudillo, Kazuya Takeda

In this paper we propose a novel data augmentation method for attention-based end-to-end automatic speech recognition (E2E-ASR), utilizing a large amount of text which is not paired with speech signals.

automatic-speech-recognition Data Augmentation +4

A Purely End-to-end System for Multi-speaker Speech Recognition

no code implementations ACL 2018 Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey

In this paper, we propose a new sequence-to-sequence framework to directly decode multiple label sequences from a single speech sequence by unifying source separation and speech recognition functions in an end-to-end manner.

Speech Recognition

Joint CTC/attention decoding for end-to-end speech recognition

no code implementations ACL 2017 Takaaki Hori, Shinji Watanabe, John Hershey

End-to-end automatic speech recognition (ASR) has become a popular alternative to conventional DNN/HMM systems because it avoids the need for linguistic resources such as a pronunciation dictionary, tokenization rules, and context-dependency trees, leading to a greatly simplified model-building process.

automatic-speech-recognition End-To-End Speech Recognition +3

End-to-end Conversation Modeling Track in DSTC6

1 code implementation22 Jun 2017 Chiori Hori, Takaaki Hori

For example, Ghazvininejad et al. proposed a knowledge-grounded neural conversation model [3], which aims to combine conversational dialogs with task-oriented knowledge using unstructured data such as Twitter data for conversation and Foursquare data for external knowledge. However, the task is still limited to a restaurant information service and has not yet been tested with a wide variety of dialog tasks.

Multichannel End-to-end Speech Recognition

no code implementations ICML 2017 Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey

The field of speech recognition is in the midst of a paradigm shift: end-to-end neural networks are challenging the dominance of hidden Markov models as a core technology.

End-To-End Speech Recognition Language Modelling +2

Attention-Based Multimodal Fusion for Video Description

no code implementations ICCV 2017 Chiori Hori, Takaaki Hori, Teng-Yok Lee, Kazuhiro Sumi, John R. Hershey, Tim K. Marks

Currently successful methods for video description are based on encoder-decoder sentence generation using recurrent neural networks (RNNs).

Video Description

Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning

6 code implementations21 Sep 2016 Suyoun Kim, Takaaki Hori, Shinji Watanabe

Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments.

End-To-End Speech Recognition Multi-Task Learning +1
