Search Results for author: Yuya Fujita

Found 7 papers, 1 paper with code

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

no code implementations11 Oct 2021 Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe

Non-autoregressive (NAR) models generate multiple outputs of a sequence simultaneously, which significantly reduces inference time at the cost of an accuracy drop compared to autoregressive baselines.

Automatic Speech Recognition, Speech Recognition +2
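
A minimal sketch of the speed contrast this abstract describes: an autoregressive decoder needs one forward pass per emitted token, while a non-autoregressive decoder fills every position in a single pass. The `toy_decoder` below is a random stand-in, not any model from the paper.

```python
import numpy as np

VOCAB, LENGTH = 100, 8
rng = np.random.default_rng(0)


def toy_decoder(positions):
    """Stand-in for a trained decoder: returns a score per vocabulary
    entry for each requested output position (purely illustrative)."""
    return rng.random((len(positions), VOCAB))


# Autoregressive: one forward pass per emitted token.
ar_tokens = []
for _ in range(LENGTH):
    scores = toy_decoder(ar_tokens + [None])   # condition on the prefix
    ar_tokens.append(int(scores[-1].argmax()))

# Non-autoregressive: a single pass predicts every position at once,
# which is where the inference-time reduction comes from.
nar_scores = toy_decoder(list(range(LENGTH)))
nar_tokens = nar_scores.argmax(axis=-1).tolist()

print("AR :", ar_tokens)
print("NAR:", nar_tokens)
```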

End-to-end ASR to jointly predict transcriptions and linguistic annotations

no code implementations NAACL 2021 Motoi Omachi, Yuya Fujita, Shinji Watanabe, Matthew Wiesner

We propose a Transformer-based sequence-to-sequence model for automatic speech recognition (ASR) capable of simultaneously transcribing and annotating audio with linguistic information such as phonemic transcripts or part-of-speech (POS) tags.

Automatic Speech Recognition, End-To-End Speech Recognition +4
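
One plausible way to set up such joint targets is to serialize the transcript and its annotations into a single decoder output sequence. The interleaved word/tag format below is an assumption for illustration, not necessarily the paper's exact serialization.

```python
def build_joint_target(words, pos_tags):
    """Interleave each word with its POS tag so a single decoder can
    emit both the transcript and its linguistic annotation."""
    assert len(words) == len(pos_tags)
    tokens = []
    for word, tag in zip(words, pos_tags):
        tokens.append(word)
        tokens.append(f"<{tag}>")
    return tokens


def strip_annotations(joint_tokens):
    """Recover the plain transcript by dropping the tag tokens."""
    return [t for t in joint_tokens
            if not (t.startswith("<") and t.endswith(">"))]


target = build_joint_target(["the", "cat", "sat"], ["DET", "NOUN", "VERB"])
print(target)                     # ['the', '<DET>', 'cat', '<NOUN>', 'sat', '<VERB>']
print(strip_annotations(target))  # ['the', 'cat', 'sat']
```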

Toward Streaming ASR with Non-Autoregressive Insertion-based Model

no code implementations18 Dec 2020 Yuya Fujita, Tianzi Wang, Shinji Watanabe, Motoi Omachi

We propose a system that cascades audio segmentation and non-autoregressive ASR to realize high accuracy and a low real-time factor (RTF).

Automatic Speech Recognition, Speech Recognition
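
A hedged sketch of the cascade described above: segment the incoming audio, then run non-autoregressive ASR on each closed segment. `detect_segments` (a toy energy-based segmenter) and `nar_asr_decode` are hypothetical placeholders, not the paper's components.

```python
import numpy as np

rng = np.random.default_rng(0)


def detect_segments(audio, sr, frame_ms=30, threshold=0.01):
    """Toy energy-based segmenter returning (start, end) sample indices."""
    frame = int(sr * frame_ms / 1000)
    energies = [float(np.mean(audio[i:i + frame] ** 2))
                for i in range(0, len(audio) - frame, frame)]
    segments, start = [], None
    for idx, e in enumerate(energies):
        if e >= threshold and start is None:
            start = idx * frame
        elif e < threshold and start is not None:
            segments.append((start, idx * frame))
            start = None
    if start is not None:
        segments.append((start, len(audio)))
    return segments


def nar_asr_decode(segment):
    """Placeholder for a single-pass non-autoregressive recognizer."""
    return f"<hypothesis for {len(segment)} samples>"


def streaming_pipeline(audio, sr=16000):
    # Each segment is decoded on its own as soon as it closes, which keeps
    # latency and the real-time factor low.
    return [nar_asr_decode(audio[s:e]) for s, e in detect_segments(audio, sr)]


speech_like = 0.2 * rng.standard_normal(16000)
audio = np.concatenate([np.zeros(8000), speech_like, np.zeros(8000)])
print(streaming_pipeline(audio))
```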

Insertion-Based Modeling for End-to-End Automatic Speech Recognition

no code implementations 27 May 2020 Yuya Fujita, Shinji Watanabe, Motoi Omachi, Xuankai Chang

One non-autoregressive Transformer (NAT) model, mask-predict, has been applied to ASR, but it needs heuristics or an additional component to estimate the length of the output token sequence.

Audio and Speech Processing, Sound
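
Mask-predict, the decoding scheme named in the abstract, starts from a fully masked target of a pre-estimated length and iteratively re-predicts and re-masks the least confident positions. The sketch below uses a random stand-in predictor and a simplified masking schedule; only the overall loop structure reflects the actual algorithm, and the up-front length requirement is the estimation step the abstract refers to.

```python
import numpy as np

VOCAB, MASK = 50, -1
rng = np.random.default_rng(0)


def predict(tokens):
    """Stand-in for a conditional masked LM: per-position probabilities
    over the vocabulary (illustrative only)."""
    probs = rng.random((len(tokens), VOCAB))
    return probs / probs.sum(axis=-1, keepdims=True)


def mask_predict(length, iterations=4):
    # The output length must be supplied before decoding starts.
    tokens = np.full(length, MASK)
    confidences = np.zeros(length)
    for t in range(iterations):
        probs = predict(tokens)
        masked = tokens == MASK
        tokens = np.where(masked, probs.argmax(axis=-1), tokens)
        confidences = np.where(masked, probs.max(axis=-1), confidences)
        # Re-mask the least confident positions for the next iteration.
        n_mask = int(length * (iterations - 1 - t) / iterations)
        if n_mask > 0:
            worst = np.argsort(confidences)[:n_mask]
            tokens[worst] = MASK
    return tokens


print(mask_predict(length=8))
```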

An Investigation of End-to-End Multichannel Speech Recognition for Reverberant and Mismatch Conditions

no code implementations19 Apr 2019 Aswin Shanmugam Subramanian, Xiaofei Wang, Shinji Watanabe, Toru Taniguchi, Dung Tran, Yuya Fujita

This report investigates the ability of E2E ASR to move from standard close-talk to far-field applications by encompassing the entire multichannel speech enhancement and ASR pipeline within a single sequence-to-sequence (S2S) model.

Automatic Speech Recognition, Denoising +2
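
Far-field systems of this kind place a multichannel enhancement front-end before the recognizer. Below is a hedged NumPy sketch of one common choice, a mask-based MVDR beamformer; the masks here are random stand-ins for a neural estimate, and nothing in the sketch is the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
channels, freqs, frames = 4, 257, 100

# Multichannel STFT of the mixture: (channels, freqs, frames), complex.
Y = (rng.standard_normal((channels, freqs, frames))
     + 1j * rng.standard_normal((channels, freqs, frames)))
# Time-frequency masks (stand-ins for a neural estimator's output).
speech_mask = rng.random((freqs, frames))
noise_mask = 1.0 - speech_mask


def spatial_covariance(Y, mask):
    """Mask-weighted spatial covariance per frequency: (freqs, C, C)."""
    weighted = Y * mask[None]                        # broadcast over channels
    cov = np.einsum("cft,dft->fcd", weighted, Y.conj())
    return cov / np.maximum(mask.sum(axis=-1), 1e-8)[:, None, None]


phi_s = spatial_covariance(Y, speech_mask)
phi_n = spatial_covariance(Y, noise_mask)

# MVDR weights: w_f = (Phi_n^-1 Phi_s / trace(Phi_n^-1 Phi_s)) u_ref
numerator = np.linalg.solve(phi_n, phi_s)            # (freqs, C, C)
trace = np.trace(numerator, axis1=-2, axis2=-1)[:, None, None]
u_ref = np.zeros(channels)
u_ref[0] = 1.0                                       # reference microphone
w = (numerator / trace) @ u_ref                      # (freqs, C)

# Enhanced single-channel STFT that a downstream ASR encoder would consume.
enhanced = np.einsum("fc,cft->ft", w.conj(), Y)
print(enhanced.shape)                                # (257, 100)
```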

Speaker Selective Beamformer with Keyword Mask Estimation

no code implementations25 Oct 2018 Yusuke Kida, Dung Tran, Motoi Omachi, Toru Taniguchi, Yuya Fujita

The proposed method first uses a DNN-based mask estimator to separate the mixture signal into the keyword signal uttered by the target speaker and the remaining background speech.

Automatic Speech Recognition, Speech Recognition
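
A hedged sketch of that first step: a small neural mask estimator that splits the mixture spectrogram into a keyword (target-speaker) estimate and a background estimate. The tiny BLSTM below is a generic PyTorch stand-in, not the paper's estimator.

```python
import torch
import torch.nn as nn


class KeywordMaskEstimator(nn.Module):
    """Toy mask estimator: a BLSTM over log-magnitude features emitting a
    keyword mask and a background mask per time-frequency bin."""

    def __init__(self, n_freq=257, hidden=128):
        super().__init__()
        self.blstm = nn.LSTM(n_freq, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, 2 * n_freq)
        self.n_freq = n_freq

    def forward(self, log_mag):                   # (batch, frames, n_freq)
        h, _ = self.blstm(log_mag)
        masks = torch.sigmoid(self.proj(h))       # (batch, frames, 2 * n_freq)
        keyword_mask, background_mask = masks.split(self.n_freq, dim=-1)
        return keyword_mask, background_mask


# Applying the masks yields the two signal estimates from which a
# speaker-selective beamformer could later compute its spatial statistics.
mixture_mag = torch.rand(1, 100, 257)             # |STFT| of the mixture
estimator = KeywordMaskEstimator()
kw_mask, bg_mask = estimator(torch.log(mixture_mag + 1e-8))
keyword_est = kw_mask * mixture_mag
background_est = bg_mask * mixture_mag
print(keyword_est.shape, background_est.shape)    # torch.Size([1, 100, 257]) each
```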
