no code implementations • 11 Nov 2022 • Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, Shinji Watanabe
To solve this problem, we would like to simultaneously generate automatic speech recognition (ASR) and speech translation (ST) predictions such that each source-language word is explicitly mapped to a target-language word.
Automatic Speech Recognition (ASR)
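The abstract above describes generating ASR and ST predictions jointly so that each source-language word is explicitly mapped to a target-language word. As a rough illustration of that idea only, and not the paper's actual method, the sketch below assumes a hypothetical interleaved hypothesis format in which every source word is paired with its translation by a separator token; the separator and the parsing helper are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): a joint ASR+ST hypothesis is
# represented as one token stream in which each source-language word is immediately
# paired with the target-language word it maps to, via a hypothetical separator.

ALIGN_SEP = "|"  # hypothetical separator between a source word and its translation

def parse_joint_hypothesis(tokens):
    """Split an interleaved hypothesis into ASR words, ST words, and alignment pairs."""
    asr_words, st_words, alignment = [], [], []
    for token in tokens:
        src, _, tgt = token.partition(ALIGN_SEP)
        asr_words.append(src)
        st_words.append(tgt)
        alignment.append((src, tgt))
    return asr_words, st_words, alignment

# Toy example: German source speech translated into English, word by word.
hypothesis = ["ich|i", "liebe|love", "musik|music"]
asr, st, align = parse_joint_hypothesis(hypothesis)
print(asr)    # ['ich', 'liebe', 'musik']
print(st)     # ['i', 'love', 'music']
print(align)  # [('ich', 'i'), ('liebe', 'love'), ('musik', 'music')]
```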
no code implementations • NAACL 2021 • Motoi Omachi, Yuya Fujita, Shinji Watanabe, Matthew Wiesner
We propose a Transformer-based sequence-to-sequence model for automatic speech recognition (ASR) capable of simultaneously transcribing and annotating audio with linguistic information such as phonemic transcripts or part-of-speech (POS) tags.
Automatic Speech Recognition (ASR)
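As a loose illustration of the joint transcription-and-annotation output described above, one simple serialization is to interleave words and annotation tokens in a single target sequence that a sequence-to-sequence model is trained to emit. The tag-token convention below is an assumption for illustration, not the authors' format.

```python
# Minimal sketch, not the paper's code: serialize the reference into an interleaved
# word/POS-tag token sequence so a single seq2seq model transcribes and annotates.

def build_annotated_target(words, pos_tags):
    """Interleave words with their POS tags into one target token sequence."""
    assert len(words) == len(pos_tags)
    target = []
    for word, tag in zip(words, pos_tags):
        target.append(word)
        target.append(f"<pos:{tag}>")  # hypothetical tag token
    return target

def split_annotated_output(tokens):
    """Recover the plain transcript and the tag sequence from a decoded output."""
    words = [t for t in tokens if not t.startswith("<pos:")]
    tags = [t[5:-1] for t in tokens if t.startswith("<pos:")]
    return words, tags

target = build_annotated_target(["the", "cat", "sleeps"], ["DT", "NN", "VBZ"])
print(target)                          # ['the', '<pos:DT>', 'cat', '<pos:NN>', ...]
print(split_annotated_output(target))  # (['the', 'cat', 'sleeps'], ['DT', 'NN', 'VBZ'])
```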
no code implementations • 18 Dec 2020 • Yuya Fujita, Tianzi Wang, Shinji Watanabe, Motoi Omachi
We propose a system that combines audio segmentation with non-autoregressive ASR to achieve high accuracy and a low real-time factor (RTF).
Automatic Speech Recognition (ASR)
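The pipeline idea above can be sketched as follows. `segment_audio` and `nar_asr` are hypothetical stand-ins for the segmenter and the non-autoregressive recognizer (not the paper's released components), and the real-time factor is measured as processing time divided by audio duration, so lower means faster than real time.

```python
# Minimal sketch of the segmentation + non-autoregressive ASR pipeline with
# hypothetical components, reporting the real-time factor (RTF).
import time

def transcribe_long_audio(samples, sample_rate, segment_audio, nar_asr):
    """Segment a long recording, decode each segment non-autoregressively, report RTF."""
    start = time.perf_counter()
    segments = segment_audio(samples, sample_rate)   # list of (begin, end) sample indices
    hypotheses = [nar_asr(samples[b:e]) for b, e in segments]
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    rtf = elapsed / audio_seconds
    return " ".join(hypotheses), rtf
```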
no code implementations • 27 May 2020 • Yuya Fujita, Shinji Watanabe, Motoi Omachi, Xuankai Chang
One non-autoregressive Transformer (NAT) model, mask-predict, has been applied to ASR, but it requires heuristics or an additional component to estimate the length of the output token sequence.
Audio and Speech Processing • Sound
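To see why mask-predict decoding depends on an output-length estimate, here is a toy decoding loop, not the paper's implementation: `predict` is a hypothetical conditional masked prediction model, and the sequence of length `length` must be fixed before the iterations begin.

```python
# Toy sketch of mask-predict style decoding: fill a fixed-length sequence of mask
# tokens, then iteratively re-mask and re-predict the least confident positions.
import numpy as np

MASK = "<mask>"

def mask_predict_decode(predict, length, num_iterations=4):
    """predict(tokens) -> (tokens, confidences): filled-in tokens and per-position scores."""
    tokens = [MASK] * length   # the output length must be estimated before decoding
    for it in range(num_iterations):
        tokens, confidences = predict(tokens)
        tokens = list(tokens)
        # Keep the most confident predictions; re-mask the rest for the next pass.
        num_to_mask = int(length * (1 - (it + 1) / num_iterations))
        for idx in np.argsort(np.asarray(confidences))[:num_to_mask]:
            tokens[idx] = MASK
    return tokens
```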
no code implementations • 25 Oct 2018 • Yusuke Kida, Dung Tran, Motoi Omachi, Toru Taniguchi, Yuya Fujita
The proposed method first uses a DNN-based mask estimator to separate the mixture signal into the keyword signal uttered by the target speaker and the remaining background speech.
Automatic Speech Recognition (ASR)
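The separation step can be sketched as elementwise time-frequency masking once a mask has been estimated. Everything below is an illustrative assumption rather than the authors' code, with the DNN mask estimator treated as a black box that supplies a ratio mask in [0, 1] for the target speaker's keyword.

```python
# Minimal sketch of the masking step: split the mixture spectrogram into the
# keyword component and the remaining background speech using an estimated mask.
import numpy as np

def separate_keyword(mixture_stft, keyword_mask):
    """Split a mixture spectrogram into keyword and background components."""
    keyword_mask = np.clip(keyword_mask, 0.0, 1.0)
    keyword_stft = keyword_mask * mixture_stft              # target speaker's keyword
    background_stft = (1.0 - keyword_mask) * mixture_stft   # remaining background speech
    return keyword_stft, background_stft

# Toy usage with random data standing in for a real STFT and a DNN-estimated mask.
rng = np.random.default_rng(0)
mixture = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
mask = rng.uniform(size=(257, 100))
keyword, background = separate_keyword(mixture, mask)
```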