Search Results for author: Yosuke Kashiwagi

Found 11 papers, 0 papers with code

Streaming Joint Speech Recognition and Disfluency Detection

no code implementations • 16 Nov 2022 • Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, Shinji Watanabe

In this study, we propose Transformer-based encoder-decoder models that jointly solve speech recognition and disfluency detection and operate in a streaming manner.

Language Modelling · Speech Recognition +1
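
The excerpt leaves the joint output format unspecified; one common way to cast joint ASR and disfluency detection as a single decoding problem is to let the decoder emit inline tag tokens around disfluent spans. A minimal post-processing sketch, assuming hypothetical <disfl>/</disfl> tokens rather than the paper's actual label set:

# Toy sketch: the decoder's hypothesis carries inline disfluency tags;
# post-processing splits it into a fluent transcript and the tagged spans.
# The <disfl>/</disfl> tokens are assumptions, not the paper's label scheme.

def split_transcript(tokens):
    """Separate fluent words from tagged disfluent spans."""
    fluent, disfluencies, span, in_disfl = [], [], [], False
    for tok in tokens:
        if tok == "<disfl>":
            in_disfl, span = True, []
        elif tok == "</disfl>":
            in_disfl = False
            disfluencies.append(" ".join(span))
        elif in_disfl:
            span.append(tok)
        else:
            fluent.append(tok)
    return " ".join(fluent), disfluencies

hyp = ["i", "want", "<disfl>", "uh", "i", "mean", "</disfl>", "a", "ticket"]
print(split_transcript(hyp))  # ('i want a ticket', ['uh i mean'])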

Residual Language Model for End-to-end Speech Recognition

no code implementations • 15 Jun 2022 • Emiru Tsunoo, Yosuke Kashiwagi, Chaitanya Narisetty, Shinji Watanabe

In this paper, we propose a simple external LM fusion method for domain adaptation, which considers the internal LM estimation in its training.

Automatic Speech Recognition · Domain Adaptation +2
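
As a rough illustration of the general idea behind internal-LM-aware fusion (not the paper's exact training recipe): the decoding score combines the E2E model's log-probability with an external LM while subtracting an estimate of the LM the E2E model has internalized. The fusion weights below are arbitrary:

import numpy as np

# Minimal sketch of LM fusion with internal LM compensation; lambda values
# and how the internal LM is estimated are assumptions, not the paper's.

def fused_score(logp_e2e, logp_ext_lm, logp_int_lm, lam_ext=0.3, lam_int=0.2):
    """Per-token log-score: E2E score plus external LM,
    minus the internal LM implicit in the E2E model."""
    return logp_e2e + lam_ext * logp_ext_lm - lam_int * logp_int_lm

# Toy next-token distributions over a 4-word vocabulary.
logp_e2e = np.log([0.5, 0.2, 0.2, 0.1])
logp_ext = np.log([0.1, 0.6, 0.2, 0.1])   # domain-matched external LM
logp_int = np.log([0.4, 0.3, 0.2, 0.1])   # estimated internal LM
print(fused_score(logp_e2e, logp_ext, logp_int).argmax())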

Joint Speech Recognition and Audio Captioning

no code implementations • 3 Feb 2022 • Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, Shinji Watanabe

A major hurdle in evaluating our proposed approach is the lack of labeled audio datasets with both speech transcriptions and audio captions.

Audio Captioning · Automatic Speech Recognition +2

Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR

no code implementations • 25 Jan 2022 • Emiru Tsunoo, Chaitanya Narisetty, Michael Hentschel, Yosuke Kashiwagi, Shinji Watanabe

To this end, we propose a novel blockwise synchronous decoding algorithm with a hybrid approach that combines endpoint prediction and endpoint post-determination.

Automatic Speech Recognition · Speech Recognition

Improving Character Error Rate Is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-box Acoustic Models

no code implementations • 12 Oct 2021 • Ryosuke Sawata, Yosuke Kashiwagi, Shusuke Takahashi

In order to optimize the DNN-based SE model in terms of the character error rate (CER), which is one of the metrics used to evaluate ASR systems and is generally non-differentiable, our method uses two DNNs: one for speech processing and one for mimicking the output CERs derived through an acoustic model (AM).

Automatic Speech Recognition · Speech Enhancement +1
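
A minimal sketch of the two-DNN setup described above, with assumed shapes and architectures: a differentiable "mimic" network predicts the CER that the black-box ASR system would produce, and its prediction serves as the training signal for the SE front end:

import torch
import torch.nn as nn

# Sketch with assumed architectures: the mimic is fit beforehand on pairs of
# enhanced features and CERs measured by running the black-box ASR; here it
# is simply frozen so its CER prediction can drive the SE update.

se_net = nn.Sequential(nn.Linear(257, 257), nn.Sigmoid())   # mask-based SE
mimic = nn.Sequential(nn.Linear(257, 64), nn.ReLU(),
                      nn.Linear(64, 1))                      # CER predictor

noisy = torch.randn(8, 257)          # toy batch of spectral frames
enhanced = noisy * se_net(noisy)     # apply predicted mask

for p in mimic.parameters():         # mimic stays fixed in this phase
    p.requires_grad_(False)

# Update the SE net to minimize the *predicted* CER; gradients flow
# through the frozen mimic into the SE parameters.
loss = mimic(enhanced).mean()
loss.backward()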

Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios

no code implementations • 7 Jun 2021 • Emiru Tsunoo, Kentaro Shibata, Chaitanya Narisetty, Yosuke Kashiwagi, Shinji Watanabe

Although end-to-end automatic speech recognition (E2E ASR) has achieved great performance on tasks with abundant paired data, it is still challenging to make E2E ASR robust against noisy and low-resource conditions.

Automatic Speech Recognition · Data Augmentation +2
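
The excerpt does not list the specific augmentations; two standard ones for distant-talk ASR are reverberation via room-impulse-response convolution and additive noise at a target SNR. A generic sketch with toy signals (the paper's actual recipe may differ):

import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)      # 1 s of toy audio at 16 kHz
rir = np.exp(-np.arange(2000) / 300.0)   # toy exponentially decaying RIR
noise = rng.standard_normal(16000)

# Simulate a distant microphone by convolving with the RIR.
reverbed = fftconvolve(speech, rir)[:len(speech)]

# Mix in noise scaled to a target signal-to-noise ratio.
snr_db = 10.0
gain = np.sqrt((reverbed ** 2).mean() / ((noise ** 2).mean() * 10 ** (snr_db / 10)))
augmented = reverbed + gain * noise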

Gaussian Kernelized Self-Attention for Long Sequence Data and Its Application to CTC-based Speech Recognition

no code implementations • 18 Feb 2021 • Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe

Self-attention (SA) based models have recently achieved significant performance improvements in hybrid and end-to-end automatic speech recognition (ASR) systems owing to their flexible context modeling capability.

Automatic Speech Recognition · Speech Recognition
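
A minimal sketch of one way a Gaussian kernel can localize self-attention for long sequences: a squared-distance penalty over relative positions is added to the dot-product scores before the softmax, so far-apart frames attend weakly. The value of sigma and the exact form of the kernel are assumptions:

import torch

def gaussian_self_attention(q, k, v, sigma=32.0):
    T, d = q.shape
    scores = q @ k.T / d ** 0.5                 # standard dot-product scores
    pos = torch.arange(T, dtype=torch.float32)
    dist2 = (pos[:, None] - pos[None, :]) ** 2
    scores = scores - dist2 / (2 * sigma ** 2)  # Gaussian positional prior
    return torch.softmax(scores, dim=-1) @ v

x = torch.randn(100, 64)
out = gaussian_self_attention(x, x, x)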

Streaming Transformer ASR with Blockwise Synchronous Inference

no code implementations • 25 Jun 2020 • Emiru Tsunoo, Yosuke Kashiwagi, Shinji Watanabe

In this paper, we extend block processing towards an entire streaming E2E ASR system without additional training, by introducing a blockwise synchronous decoding process inspired by a neural transducer into the Transformer decoder.

Automatic Speech Recognition · Knowledge Distillation +1
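
An illustrative toy of blockwise synchronous decoding with stand-in components: the decoder emits tokens against the encoder frames received so far and pauses for the next block when a token would need unseen frames. The step interface and the stopping rule below are hypothetical simplifications, not the paper's algorithm:

import torch

def blockwise_decode(encoded_blocks, step, max_len=50):
    enc, hyp = [], []
    for block in encoded_blocks:
        enc.append(block)
        context = torch.cat(enc)
        while len(hyp) < max_len:
            token, needs_more = step(context, hyp)
            if needs_more:        # block boundary: wait for more input
                break
            hyp.append(token)
    return hyp

def toy_step(context, hyp):
    # Dummy decoder: emits one token per two encoded frames.
    if (len(hyp) + 1) * 2 > len(context):
        return None, True
    return len(hyp) + 1, False

blocks = [torch.zeros(4, 8) for _ in range(3)]   # three 4-frame blocks
print(blockwise_decode(blocks, toy_step))        # [1, 2, 3, 4, 5, 6]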

Towards Online End-to-end Transformer Automatic Speech Recognition

no code implementations • 25 Oct 2019 • Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe

In this paper, we extend it towards an entire online E2E ASR system by introducing an online decoding process inspired by monotonic chunkwise attention (MoChA) into the Transformer decoder.

Automatic Speech Recognition · Speech Recognition
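
As a rough sketch of MoChA-style inference (with stand-in scoring functions, not the trained model's): a monotonic head scans encoder frames left to right, stops once a selection probability passes 0.5, and soft attention is then computed over a small chunk ending at that frame:

import torch

def mocha_step(enc, query, start, chunk=4):
    T = enc.size(0)
    for t in range(start, T):
        p_select = torch.sigmoid(enc[t] @ query)   # monotonic stop probability
        if p_select > 0.5:
            lo = max(0, t - chunk + 1)
            w = torch.softmax(enc[lo:t + 1] @ query, dim=0)
            return (w.unsqueeze(1) * enc[lo:t + 1]).sum(0), t
    return None, T                                 # wait for more frames

enc = torch.randn(20, 16)                          # toy encoder output
query = torch.randn(16)                            # toy decoder state
context, pos = mocha_step(enc, query, start=0)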

Transformer ASR with Contextual Block Processing

no code implementations • 16 Oct 2019 • Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe

In this paper, we propose a new block processing method for the Transformer encoder by introducing a context-aware inheritance mechanism.

Automatic Speech Recognition · Speech Recognition
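
A minimal sketch of the context-inheritance idea under assumed details: each block is encoded together with a context vector carried over from the previous block, so local blocks still see a summary of the past. How the context vector is updated here is a simplification of the paper's mechanism:

import torch

encoder_layer = torch.nn.TransformerEncoderLayer(d_model=64, nhead=4,
                                                 batch_first=True)

def encode_stream(blocks, d_model=64):
    ctx = torch.zeros(1, 1, d_model)            # inherited context embedding
    outputs = []
    for block in blocks:                        # block: (1, T_block, d_model)
        inp = torch.cat([ctx, block], dim=1)    # prepend the context token
        out = encoder_layer(inp)
        ctx = out[:, :1]                        # pass updated context forward
        outputs.append(out[:, 1:])
    return torch.cat(outputs, dim=1)

blocks = [torch.randn(1, 8, 64) for _ in range(4)]
print(encode_stream(blocks).shape)              # torch.Size([1, 32, 64])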

End-to-end Adaptation with Backpropagation through WFST for On-device Speech Recognition System

no code implementations • 17 May 2019 • Emiru Tsunoo, Yosuke Kashiwagi, Satoshi Asakawa, Toshiyuki Kumakura

We convert a pretrained WFST to a trainable neural network and adapt the system to target environments/vocabulary by E2E joint training with an AM.

Speech Recognition
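
A toy illustration of the underlying idea, not the paper's construction: the log-domain arc weights of a small graph are made trainable parameters, and a differentiable forward algorithm lets gradients flow into them during joint training with AM posteriors. Real decoding graphs are sparse and label-dependent; this dense three-state example is only illustrative:

import torch

# Transition weights initialized from a "pretrained" graph, made trainable.
log_trans = torch.nn.Parameter(torch.log(torch.tensor(
    [[0.7, 0.3, 0.0],
     [0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0]]) + 1e-6))

def forward_score(log_emissions):
    """Differentiable forward algorithm over the trainable graph."""
    alpha = log_emissions[0] + torch.log(torch.tensor([1.0, 0.0, 0.0]) + 1e-6)
    for t in range(1, log_emissions.size(0)):
        alpha = torch.logsumexp(alpha[:, None] + log_trans, dim=0) + log_emissions[t]
    return alpha[-1]                             # score of ending in the final state

emissions = torch.log_softmax(torch.randn(5, 3), dim=-1)  # toy AM posteriors
loss = -forward_score(emissions)
loss.backward()                                  # gradients reach log_trans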
