no code implementations • 3 Mar 2023 • Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe
In the last decade of automatic speech recognition (ASR) research, the introduction of deep learning brought relative word error rate reductions of more than 50% compared to systems built without deep learning.
Automatic Speech Recognition (ASR) +2
no code implementations • 2 Nov 2022 • Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang
This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios.
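The core idea can be sketched with a context-window attention mask; this is a minimal illustration (the helper and parameter names are hypothetical, not the paper's implementation), showing how one model can be reconfigured at deployment time by restricting how far each query frame may attend:

```python
def make_attention_mask(num_frames, left_context, right_context):
    """mask[i][j] is True if query frame i may attend to key frame j.

    A left_context / right_context of None means unlimited context.
    """
    mask = []
    for i in range(num_frames):
        row = []
        for j in range(num_frames):
            ok_left = left_context is None or i - j <= left_context
            ok_right = right_context is None or j - i <= right_context
            row.append(ok_left and ok_right)
        mask.append(row)
    return mask

# Streaming deployment: no right (future) context allowed.
streaming = make_attention_mask(4, left_context=2, right_context=0)
# Full-context deployment with the same model: unlimited context.
offline = make_attention_mask(4, left_context=None, right_context=None)
```

Because the mask, not the weights, encodes the deployment scenario, the same trained model can serve both configurations.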
no code implementations • 1 Mar 2022 • Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux
As an example application, we use the extended GTC (GTC-e) for the multi-speaker speech recognition task.
Automatic Speech Recognition (ASR) +2
no code implementations • 1 Nov 2021 • Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux
The recurrent neural network transducer (RNN-T) objective plays a major role in building today's best automatic speech recognition (ASR) systems for production.
Automatic Speech Recognition (ASR) +1
no code implementations • 13 Oct 2021 • Ankit P. Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori
In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (AVSD) task, collected an AVSD dataset, developed AVSD technologies, and hosted an AVSD challenge track at both the 7th and 8th Dialog System Technology Challenges (DSTC7, DSTC8).
no code implementations • 11 Oct 2021 • Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori
Pseudo-labeling (PL), a semi-supervised learning (SSL) method where a seed model performs self-training using pseudo-labels generated from untranscribed speech, has been shown to enhance the performance of end-to-end automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
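The generic pseudo-labeling recipe can be sketched in a few lines (function and threshold are illustrative, not the paper's specific procedure): a seed model decodes untranscribed speech, and confident hypotheses are kept as training labels.

```python
def pseudo_label(seed_model, unlabeled_utterances, confidence_threshold=0.9):
    """Return (utterance, hypothesis) pairs the seed model is confident about."""
    pseudo_labeled = []
    for utt in unlabeled_utterances:
        hypothesis, confidence = seed_model(utt)
        # Filter out low-confidence decodes to limit pseudo-label noise.
        if confidence >= confidence_threshold:
            pseudo_labeled.append((utt, hypothesis))
    return pseudo_labeled

# Toy seed model: maps an utterance id to a (transcript, confidence) pair.
toy_model = lambda utt: {"u1": ("hello world", 0.95),
                         "u2": ("uh", 0.40)}[utt]
selected = pseudo_label(toy_model, ["u1", "u2"])
```

The selected pairs are then mixed with the supervised data for further training of the seed model.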
no code implementations • 4 Aug 2021 • Chiori Hori, Takaaki Hori, Jonathan Le Roux
A CNN-based timing detector is also trained to detect a proper output timing, at which the captions generated by the two Transformers become sufficiently close to each other.
no code implementations • 2 Jul 2021 • Niko Moritz, Takaaki Hori, Jonathan Le Roux
Attention-based end-to-end automatic speech recognition (ASR) systems have recently demonstrated state-of-the-art results for numerous tasks.
Automatic Speech Recognition (ASR) +1
no code implementations • 16 Jun 2021 • Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori
MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.
Automatic Speech Recognition (ASR) +1
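The mean teacher method that inspires this pairing maintains the offline model's weights as an exponential moving average of the online model's; a minimal sketch of that update (decay value illustrative):

```python
def ema_update(offline_weights, online_weights, decay=0.999):
    """Blend each offline weight toward its online counterpart."""
    return [decay * w_off + (1.0 - decay) * w_on
            for w_off, w_on in zip(offline_weights, online_weights)]

# With decay=0.9, each offline weight moves 10% toward the online model.
updated = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)
```

Averaging over many steps makes the offline model a smoothed, more stable version of the online model, whose outputs can in turn supervise the online model.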
no code implementations • 19 Apr 2021 • Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux
In this paper, we extend our prior work by (1) introducing the Conformer architecture to further improve the accuracy, (2) accelerating the decoding process with a novel activation recycling technique, and (3) enabling streaming decoding with triggered attention.
Automatic Speech Recognition (ASR) +1
no code implementations • 7 Apr 2021 • Niko Moritz, Takaaki Hori, Jonathan Le Roux
The restricted self-attention allows attention to neighboring frames of the query at a high resolution, and the dilation mechanism summarizes distant information to allow attending to it with a lower resolution.
Automatic Speech Recognition (ASR) +3
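A dual-resolution attention pattern of this kind can be illustrated with a simple mask (the scheme below is a hypothetical simplification, not the paper's exact mechanism): each query sees its neighborhood at full resolution plus a subsampled selection of distant frames.

```python
def dual_resolution_mask(num_frames, window, dilation):
    """mask[i][j] is True if frame i may attend to frame j."""
    mask = []
    for i in range(num_frames):
        row = []
        for j in range(num_frames):
            local = abs(i - j) <= window           # high-resolution neighborhood
            dilated = abs(i - j) % dilation == 0   # coarse view of distant frames
            row.append(local or dilated)
        mask.append(row)
    return mask

m = dual_resolution_mask(8, window=1, dilation=4)
```

The union of the two patterns keeps the per-frame attention cost far below full self-attention while preserving some access to long-range context.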
no code implementations • 26 Nov 2020 • Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux
The performance of automatic speech recognition (ASR) systems typically degrades significantly when the training and test data domains are mismatched.
Automatic Speech Recognition (ASR) +4
no code implementations • 29 Oct 2020 • Niko Moritz, Takaaki Hori, Jonathan Le Roux
However, alternative ASR hypotheses of an N-best list can provide more accurate labels for an unlabeled speech utterance and also reflect uncertainties of the seed ASR model.
Automatic Speech Recognition (ASR) +4
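One simple way to turn an N-best list into soft pseudo-labels, shown here as a generic sketch (the weighting scheme is illustrative, not necessarily the paper's), is to softmax-normalize the hypothesis scores so each alternative contributes in proportion to its estimated posterior:

```python
import math

def nbest_weights(hypothesis_scores):
    """Softmax-normalize N-best log scores into pseudo-label weights."""
    m = max(hypothesis_scores)                      # subtract max for stability
    exps = [math.exp(s - m) for s in hypothesis_scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two equally scored hypotheses dominate a much weaker third one.
weights = nbest_weights([-1.0, -1.0, -3.0])
```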
no code implementations • 23 Sep 2020 • Peng Gao, Chiori Hori, Shijie Geng, Takaaki Hori, Jonathan Le Roux
In contrast with previous approaches where information flows only towards deeper layers of a stack, we consider a multi-pass transformer (MPT) architecture in which earlier layers are allowed to process information in light of the output of later layers.
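The multi-pass idea can be sketched with a toy loop (function and feedback rule are illustrative): the stack is run once, then run again with the final output fed back so earlier layers can condition on information computed by later layers.

```python
def multi_pass(layers, x, num_passes=2):
    feedback = None
    for _ in range(num_passes):
        # Inject the previous pass's final output into the first layer's input.
        h = x if feedback is None else x + feedback
        for layer in layers:
            h = layer(h)
        feedback = h
    return h

# Toy "layers" on scalars: doubling followed by an increment.
layers = [lambda v: v * 2, lambda v: v + 1]
out = multi_pass(layers, 1.0)
```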
no code implementations • 14 Feb 2020 • Leda Sari, Niko Moritz, Takaaki Hori, Jonathan Le Roux
We propose an unsupervised speaker adaptation method inspired by the neural Turing machine for end-to-end (E2E) automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +1
no code implementations • 8 Jan 2020 • Niko Moritz, Takaaki Hori, Jonathan Le Roux
Encoder-decoder based sequence-to-sequence models have demonstrated state-of-the-art results in end-to-end automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +1
1 code implementation • 13 Sep 2019 • Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang
Sequence-to-sequence models have been widely used in end-to-end speech processing, for example, automatic speech recognition (ASR), speech translation (ST), and text-to-speech (TTS).
Ranked #12 on Speech Recognition on AISHELL-1
Automatic Speech Recognition (ASR) +3
no code implementations • 17 Jun 2019 • Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky
Two representative frameworks have been proposed and discussed: the Multi-Encoder Multi-Resolution (MEM-Res) framework and the Multi-Encoder Multi-Array (MEM-Array) framework.
Automatic Speech Recognition (ASR) +1
no code implementations • 30 Apr 2019 • Murali Karthick Baskar, Shinji Watanabe, Ramon Astudillo, Takaaki Hori, Lukáš Burget, Jan Černocký
Such techniques derive training procedures and losses able to leverage unpaired speech and/or text data by combining ASR with Text-to-Speech (TTS) models.
Ranked #33 on Semi-Supervised Image Classification on ImageNet - 10% labeled data (Top 5 Accuracy metric)
Automatic Speech Recognition (ASR) +2
no code implementations • 12 Nov 2018 • Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky
In this work, we present a novel Multi-Encoder Multi-Resolution (MEMR) framework based on the joint CTC/Attention model.
Automatic Speech Recognition (ASR) +1
no code implementations • 12 Nov 2018 • Hiroshi Seki, Takaaki Hori, Shinji Watanabe
In this paper, we propose a parallelism technique for beam search, which accelerates the search process by vectorizing multiple hypotheses to eliminate the for-loop program.
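The expansion step being vectorized can be sketched as follows (names are illustrative, and real implementations batch this on tensors rather than Python lists): all beam hypotheses are expanded with every vocabulary symbol in one step, instead of looping over hypotheses one at a time.

```python
def beam_step(hyps, scores, log_probs, beam_size):
    """One batched expansion: hyps are token lists, log_probs[v] scores symbol v."""
    # Score every (hypothesis, symbol) pair at once.
    candidates = [(h + [v], s + lp)
                  for h, s in zip(hyps, scores)
                  for v, lp in enumerate(log_probs)]
    # Keep the beam_size best-scoring expansions.
    candidates.sort(key=lambda c: c[1], reverse=True)
    top = candidates[:beam_size]
    return [h for h, _ in top], [s for _, s in top]

hyps, scores = beam_step([[0]], [0.0], log_probs=[-0.1, -2.0, -0.5], beam_size=2)
```

On a GPU the same pattern becomes a single batched matrix operation plus a top-k, which is what eliminates the per-hypothesis for-loop.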
no code implementations • 12 Nov 2018 • Xiaofei Wang, Ruizhi Li, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky
Automatic Speech Recognition (ASR) using multiple microphone arrays has achieved great success in far-field robustness.
Automatic Speech Recognition (ASR) +1
no code implementations • 7 Nov 2018 • Murali Karthick Baskar, Lukáš Burget, Shinji Watanabe, Martin Karafiát, Takaaki Hori, Jan Honza Černocký
In this paper, we present promising accurate prefix boosting (PAPB), a discriminative training technique for attention based sequence-to-sequence (seq2seq) ASR.
no code implementations • 7 Nov 2018 • Martin Karafiát, Murali Karthick Baskar, Shinji Watanabe, Takaaki Hori, Matthew Wiesner, Jan "Honza" Černocký
This paper investigates the applications of various multilingual approaches developed in conventional hidden Markov model (HMM) systems to sequence-to-sequence (seq2seq) automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +3
no code implementations • 7 Nov 2018 • Nelson Yalta, Shinji Watanabe, Takaaki Hori, Kazuhiro Nakadai, Tetsuya Ogata
By employing a convolutional neural network (CNN)-based multichannel end-to-end speech recognition system, this study attempts to overcome the difficulties present in everyday environments.
Automatic Speech Recognition (ASR) +1
no code implementations • 2 Nov 2018 • Takaaki Hori, Ramon Astudillo, Tomoki Hayashi, Yu Zhang, Shinji Watanabe, Jonathan Le Roux
To solve this problem, this work presents a loss that is based on the speech encoder state sequence instead of the raw speech signal.
Automatic Speech Recognition (ASR) +2
no code implementations • 4 Oct 2018 • Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiat, Shinji Watanabe, Takaaki Hori
In this work, we attempt to use data from 10 BABEL languages to build a multilingual seq2seq model as a prior model, and then port it to 4 other BABEL languages using a transfer learning approach.
Language Modelling Sequence-To-Sequence Speech Recognition +2
no code implementations • 27 Sep 2018 • Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey
Several multi-lingual ASR systems were recently proposed based on a monolithic neural network architecture without language-dependent modules, showing that modeling of multiple languages is well within the capabilities of an end-to-end framework.
Automatic Speech Recognition (ASR) +1
no code implementations • 8 Aug 2018 • Takaaki Hori, Jaejin Cho, Shinji Watanabe
This paper investigates the impact of word-based RNN language models (RNN-LMs) on the performance of end-to-end automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +1
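A common, generic way to bring an external language model score into end-to-end decoding is shallow fusion; the sketch below is illustrative only (the weight value and function names are assumptions, and the paper studies word-level LM integration beyond this basic form):

```python
def fused_score(asr_log_prob, lm_log_prob, lm_weight=0.3):
    """Combine the ASR score with a weighted external LM score for one candidate."""
    return asr_log_prob + lm_weight * lm_log_prob

# A candidate word the LM considers likely gets a smaller penalty.
score = fused_score(-1.0, -2.0, lm_weight=0.5)
```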
no code implementations • 28 Jul 2018 • Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramon Astudillo, Kazuya Takeda
In this paper we propose a novel data augmentation method for attention-based end-to-end automatic speech recognition (E2E-ASR), utilizing a large amount of text which is not paired with speech signals.
Automatic Speech Recognition (ASR) +4
2 code implementations • 21 Jun 2018 • Chiori Hori, Huda Alamri, Jue Wang, Gordon Wichern, Takaaki Hori, Anoop Cherian, Tim K. Marks, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Irfan Essa, Dhruv Batra, Devi Parikh
We introduce a new dataset of dialogs about videos of human behaviors.
no code implementations • ACL 2018 • Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey
In this paper, we propose a new sequence-to-sequence framework to directly decode multiple label sequences from a single speech sequence by unifying source separation and speech recognition functions in an end-to-end manner.
no code implementations • 30 Mar 2018 • Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, Tsubasa Ochiai
This paper introduces a new open source platform for end-to-end speech processing named ESPnet.
Automatic Speech Recognition (ASR) +1
1 code implementation • ACL 2017 • Takaaki Hori, Shinji Watanabe, John Hershey
End-to-end automatic speech recognition (ASR) has become a popular alternative to conventional DNN/HMM systems because it avoids the need for linguistic resources such as pronunciation dictionaries, tokenization, and context-dependency trees, leading to a greatly simplified model-building process.
Automatic Speech Recognition (ASR) +2
1 code implementation • 22 Jun 2017 • Chiori Hori, Takaaki Hori
For example, Ghazvininejad et al. proposed a knowledge-grounded neural conversation model [3], which aims to combine conversational dialogs with task-oriented knowledge using unstructured data such as Twitter data for conversation and Foursquare data for external knowledge. However, that task is still limited to a restaurant information service and has not yet been tested with a wide variety of dialog tasks.
6 code implementations • 8 Jun 2017 • Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan
The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder.
Automatic Speech Recognition (ASR) +2
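The joint training amounts to a multi-task objective that interpolates the CTC loss on the encoder with the attention decoder's loss; a minimal sketch (the weight value here is illustrative):

```python
def joint_loss(ctc_loss, attention_loss, ctc_weight=0.3):
    """Multi-task objective: interpolate CTC and attention losses."""
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss

loss = joint_loss(ctc_loss=2.0, attention_loss=1.0, ctc_weight=0.5)
```

The CTC branch enforces monotonic alignment between speech and labels, which regularizes the more flexible attention decoder during training.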
no code implementations • ICML 2017 • Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey
The field of speech recognition is in the midst of a paradigm shift: end-to-end neural networks are challenging the dominance of hidden Markov models as a core technology.
no code implementations • ICCV 2017 • Chiori Hori, Takaaki Hori, Teng-Yok Lee, Kazuhiro Sumi, John R. Hershey, Tim K. Marks
Currently successful methods for video description are based on encoder-decoder sentence generation using recurrent neural networks (RNNs).
8 code implementations • 21 Sep 2016 • Suyoun Kim, Takaaki Hori, Shinji Watanabe
Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments.