Search Results for author: Hirofumi Inaguma

Found 36 papers, 12 papers with code

The JHU/KyotoU Speech Translation System for IWSLT 2018

no code implementations • IWSLT (EMNLP) 2018 • Hirofumi Inaguma, Xuan Zhang, Zhiqi Wang, Adithya Renduchintala, Shinji Watanabe, Kevin Duh

This paper describes the Johns Hopkins University (JHU) and Kyoto University submissions to the Speech Translation evaluation campaign at IWSLT2018.

Transfer Learning, Translation

Efficient Monotonic Multihead Attention

no code implementations • 7 Dec 2023 • Xutai Ma, Anna Sun, Siqi Ouyang, Hirofumi Inaguma, Paden Tomasello

We introduce Efficient Monotonic Multihead Attention (EMMA), a state-of-the-art simultaneous translation model with numerically stable and unbiased monotonic alignment estimation.

Simultaneous Speech-to-Text Translation, Translation
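The monotonic alignment estimation that EMMA stabilizes builds on the classic expected-alignment recurrence for hard monotonic attention (Raffel et al., 2017). Below is an illustrative pure-Python sketch of that baseline recurrence, not the authors' implementation:

```python
def expected_monotonic_alignment(p):
    """Expected alignment for hard monotonic attention.

    p[i][j] is the probability that the head emitting token i stops at
    encoder frame j, given that it has reached frame j.  Returns alpha
    with alpha[i][j] = P(token i attends frame j), via the recurrence
        q[i][j]     = alpha[i-1][j] + (1 - p[i][j-1]) * q[i][j-1]
        alpha[i][j] = p[i][j] * q[i][j]
    """
    num_tokens, num_frames = len(p), len(p[0])
    alpha = [[0.0] * num_frames for _ in range(num_tokens)]
    prev = [1.0] + [0.0] * (num_frames - 1)  # alignment starts at frame 0
    for i in range(num_tokens):
        q = 0.0
        for j in range(num_frames):
            # accumulate probability mass flowing monotonically forward
            q = prev[j] + ((1.0 - p[i][j - 1]) * q if j > 0 else 0.0)
            alpha[i][j] = p[i][j] * q
        prev = alpha[i]
    return alpha
```

Row sums can fall below one because alignment mass may run off the end of the utterance; EMMA's contribution is estimating this quantity without the numerical instability of the naive formulation.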

Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks

no code implementations • 4 May 2023 • Yun Tang, Anna Y. Sun, Hirofumi Inaguma, Xinyue Chen, Ning Dong, Xutai Ma, Paden D. Tomasello, Juan Pino

To leverage the strengths of both modeling methods, we propose a solution that combines the Transducer and the Attention-based Encoder-Decoder (TAED) for speech-to-text tasks.

Automatic Speech Recognition (ASR), +4

Enhancing Speech-to-Speech Translation with Multiple TTS Targets

no code implementations • 10 Apr 2023 • Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, Shinji Watanabe

Direct speech-to-speech translation (S2ST) models are known to suffer from data scarcity, owing to the limited parallel material available for both source and target speech.

Speech-to-Speech Translation, Speech-to-Text Translation, +1

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

1 code implementation • 15 Dec 2022 • Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino

We enhance model performance through subword prediction in the first-pass decoder, an advanced two-pass decoder architecture design and search strategy, and better training regularization.

Denoising, Speech-to-Speech Translation, +3

Simple and Effective Unsupervised Speech Translation

no code implementations • 18 Oct 2022 • Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, Ilia Kulikov, Yun Tang, Wei-Ning Hsu, Michael Auli, Juan Pino

The amount of labeled data available to train models for speech tasks is limited for most languages; the scarcity is exacerbated for speech translation, which requires labeled data covering two different languages.

Machine Translation, Speech Recognition, +6

Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM

1 code implementation • 8 Sep 2022 • Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

Connectionist temporal classification (CTC)-based models are attractive for automatic speech recognition (ASR) because of their non-autoregressive nature.

Automatic Speech Recognition (ASR), +3

Distilling the Knowledge of BERT for CTC-based ASR

no code implementations • 5 Sep 2022 • Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

In this study, we propose distilling the knowledge of BERT for CTC-based ASR, extending our previous study on attention-based ASR.

Automatic Speech Recognition (ASR), +2
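Knowledge distillation of this kind typically minimizes a cross-entropy between the teacher's soft labels (here, BERT's predictive distribution over tokens) and the student's output distribution. The sketch below shows that generic objective; it is illustrative, not the paper's exact formulation:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_probs, eps=1e-9):
    """Soft-label cross-entropy between teacher distributions and the
    student's softmax outputs, averaged over positions.

    student_logits: list of per-position logit vectors (student, e.g. CTC)
    teacher_probs:  list of per-position probability vectors (teacher, e.g. BERT)
    """
    total = 0.0
    for logits, probs in zip(student_logits, teacher_probs):
        q = softmax(logits)
        total += -sum(t * math.log(qk + eps) for t, qk in zip(probs, q))
    return total / len(student_logits)
```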

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

no code implementations • 11 Oct 2021 • Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe

Non-autoregressive (NAR) models generate multiple outputs of a sequence simultaneously, which significantly reduces inference time at the cost of an accuracy drop compared to autoregressive baselines.

Automatic Speech Recognition (ASR), +3

ASR Rescoring and Confidence Estimation with ELECTRA

no code implementations • 5 Oct 2021 • Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

We propose an ASR rescoring method that directly detects errors with ELECTRA, which was originally a pre-training method for NLP tasks.

Automatic Speech Recognition (ASR), +2

Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates

1 code implementation • 27 Sep 2021 • Hirofumi Inaguma, Siddharth Dalmia, Brian Yan, Shinji Watanabe

We propose Fast-MD, a fast multi-decoder (MD) model that generates hidden intermediates (HI) by non-autoregressive (NAR) decoding based on connectionist temporal classification (CTC) outputs, followed by an ASR decoder.

Automatic Speech Recognition (ASR), +4
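Fast-MD derives its hidden intermediates from CTC outputs. As background, greedy CTC decoding collapses framewise argmax ids by merging repeats and then removing blanks; a minimal sketch (not the authors' code):

```python
def ctc_greedy_collapse(frame_ids, blank=0):
    """Collapse framewise CTC argmax outputs into a token sequence.

    frame_ids: per-frame argmax token ids, e.g. [0, 3, 3, 0, 3, 5]
    blank:     id of the CTC blank symbol
    Repeated ids are merged first, so a blank between two identical
    ids is what allows a genuinely repeated token to survive.
    """
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out
```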

Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring

no code implementations • 9 Sep 2021 • Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

We propose a unified NAR E2E-ST framework called Orthros, which has an NAR decoder and an auxiliary shallow AR decoder on top of the shared encoder.

Language Modelling, Translation
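The division of labor Orthros describes, a parallel NAR proposer plus a shallow AR rescorer, can be sketched abstractly; `ar_logprob` below is a stand-in callable for the auxiliary decoder's scoring function, not an API from the paper:

```python
def nar_with_ar_rescoring(nar_candidates, ar_logprob):
    """Orthros-style selection sketch.

    The NAR decoder proposes several candidate translations in
    parallel (nar_candidates); the auxiliary shallow AR decoder
    rescores each one, and the highest-scoring candidate wins.
    """
    return max(nar_candidates, key=ar_logprob)
```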

VAD-free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording

no code implementations • 15 Jul 2021 • Hirofumi Inaguma, Tatsuya Kawahara

In this work, we propose novel decoding algorithms to enable streaming automatic speech recognition (ASR) on unsegmented long-form recordings without voice activity detection (VAD), based on monotonic chunkwise attention (MoChA) with an auxiliary connectionist temporal classification (CTC) objective.

Action Detection, Activity Detection, +3

Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition

no code implementations • 28 Feb 2021 • Hirofumi Inaguma, Tatsuya Kawahara

We compare CTC-ST with several methods that distill alignment knowledge from a hybrid ASR system and show that CTC-ST achieves a comparable accuracy-latency tradeoff without relying on external alignment information.

Automatic Speech Recognition (ASR), +2

Improved Mask-CTC for Non-Autoregressive End-to-End ASR

no code implementations • 26 Oct 2020 • Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

While Mask-CTC achieves remarkably fast inference, its recognition performance falls behind that of conventional autoregressive (AR) systems.

Automatic Speech Recognition (ASR), +2

Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder

no code implementations • 25 Oct 2020 • Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

Fast inference speed is an important goal towards real-world deployment of speech translation (ST) systems.

Translation

Distilling the Knowledge of BERT for Sequence-to-Sequence ASR

1 code implementation • 9 Aug 2020 • Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

Experimental evaluations show that our method significantly improves the ASR performance from the seq2seq baseline on the Corpus of Spontaneous Japanese (CSJ).

Automatic Speech Recognition (ASR), +3

Enhancing Monotonic Multihead Attention for Streaming ASR

1 code implementation • 19 May 2020 • Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara

For streaming inference, all monotonic attention (MA) heads should learn proper alignments because the next token is not generated until all heads detect the corresponding token boundaries.

Automatic Speech Recognition (ASR), +2
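The constraint described above, that the next token is not generated until all monotonic attention (MA) heads have detected its boundary, amounts to emitting each token at the latest head boundary; a small illustrative sketch with hypothetical per-head boundary indices:

```python
def emission_frames(head_boundaries):
    """Earliest frame at which each token can be emitted in streaming MMA.

    head_boundaries[h][i] is the frame at which MA head h detects the
    boundary of token i.  A token is emitted only once every head has
    fired, i.e. at the maximum boundary frame across heads.
    """
    num_tokens = len(head_boundaries[0])
    return [max(head[i] for head in head_boundaries)
            for i in range(num_tokens)]
```

This is why a single badly aligned head inflates latency for the whole model, motivating training all heads to learn proper alignments.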

CTC-synchronous Training for Monotonic Attention Model

1 code implementation • 10 May 2020 • Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara

Monotonic chunkwise attention (MoChA) has been studied for online streaming automatic speech recognition (ASR) based on a sequence-to-sequence framework.

Automatic Speech Recognition (ASR), +1

End-to-end speech-to-dialog-act recognition

no code implementations • 23 Apr 2020 • Viet-Trung Dang, Tianyu Zhao, Sei Ueno, Hirofumi Inaguma, Tatsuya Kawahara

In the proposed model, the dialog act recognition network is joined to an acoustic-to-word ASR model at its latent layer, just before the softmax layer, which provides a distributed representation of word-level ASR decoding information.

Automatic Speech Recognition (ASR), +2

Multilingual End-to-End Speech Translation

1 code implementation • 1 Oct 2019 • Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture.

Automatic Speech Recognition (ASR), +4
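Universal multilingual sequence-to-sequence models of this kind commonly select the output language by prepending a target-language tag to the decoder input; the tag format below is an assumption for illustration, not necessarily the paper's exact scheme:

```python
def build_decoder_prefix(tgt_lang, target_tokens):
    """Prepend a target-language tag (hypothetical "<2xx>" format) so a
    single universal model can translate into multiple target languages.

    tgt_lang:      ISO-style language code, e.g. "de"
    target_tokens: gold target tokens during training, or an empty list
                   at inference time before decoding starts
    """
    return [f"<2{tgt_lang}>"] + list(target_tokens)
```

At inference time the tag alone is fed as the first decoder input, steering all subsequent generation toward the desired language.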

Improving OOV Detection and Resolution with External Language Models in Acoustic-to-Word ASR

no code implementations • 22 Sep 2019 • Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

Moreover, the A2C model can be used to recover out-of-vocabulary (OOV) words that are not covered by the A2W model, but this requires accurate detection of OOV words.

Automatic Speech Recognition (ASR), +1

Transfer learning of language-independent end-to-end ASR with language model fusion

no code implementations • 6 Nov 2018 • Hirofumi Inaguma, Jaejin Cho, Murali Karthick Baskar, Tatsuya Kawahara, Shinji Watanabe

This work explores better adaptation methods to low-resource languages using an external language model (LM) under the framework of transfer learning.

Language Modelling, Transfer Learning
