no code implementations • 28 Mar 2024 • Yuya Fujita, Shinji Watanabe, Xuankai Chang, Takashi Maekaku
In this paper, we propose a new model combining CTC with a latent variable model, one of the state-of-the-art approaches in neural machine translation research.
Automatic Speech Recognition (ASR) +2
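CTC, one of the two components combined in this paper, scores a token sequence by summing over all frame-level alignments that collapse to it. The collapse rule itself is simple and can be sketched in a few lines (an illustrative sketch, not the paper's model):

```python
def ctc_collapse(path, blank="-"):
    """Apply the CTC collapse rule: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)

# Two different frame alignments collapse to the same output "cat".
print(ctc_collapse("cc-aa-t"))  # cat
print(ctc_collapse("c-a--tt"))  # cat
```

The CTC loss marginalizes over every alignment that collapses to the reference, which is what lets the model train without frame-level labels.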
no code implementations • 6 Oct 2023 • Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, Shinji Watanabe
In this paper, we propose a new approach to enrich the semantic representation of HuBERT.
no code implementations • 27 Sep 2023 • Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe
Recent works in end-to-end speech-to-text translation (ST) have proposed multi-tasking methods with soft parameter sharing which leverage machine translation (MT) data via secondary encoders that map text inputs to an eventual cross-modal representation.
no code implementations • 27 Sep 2023 • Xuankai Chang, Brian Yan, Kwanghee Choi, Jeeweon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang
Speech signals, typically sampled at tens of thousands of samples per second, contain redundancies, leading to inefficiencies in sequence modeling.
no code implementations • 11 Nov 2022 • Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, Shinji Watanabe
To solve this problem, we would like to simultaneously generate automatic speech recognition (ASR) and ST predictions such that each source language word is explicitly mapped to a target language word.
Automatic Speech Recognition (ASR) +2
no code implementations • 1 Apr 2022 • Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji Watanabe
This work presents our end-to-end (E2E) automatic speech recognition (ASR) model targeting robust speech recognition, called Integrated speech Recognition with enhanced speech Input for Self-supervised learning representation (IRIS).
Automatic Speech Recognition (ASR) +4
no code implementations • 7 Nov 2021 • Ryohei Fukuma, Takufumi Yanagisawa, Shinji Nishimoto, Hidenori Sugano, Kentaro Tamura, Shota Yamamoto, Yasushi Iimura, Yuya Fujita, Satoru Oshino, Naoki Tani, Naoko Koide-Majima, Yukiyasu Kamitani, Haruhiko Kishima
The successful control of the feedback images demonstrated that the semantic vector inferred from electrocorticograms became closer to the vector of the imagined category, even while watching images from different categories.
no code implementations • 11 Oct 2021 • Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe
Non-autoregressive (NAR) models generate multiple outputs in a sequence simultaneously, which significantly reduces inference time at the cost of an accuracy drop compared to autoregressive baselines.
Automatic Speech Recognition (ASR) +3
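The speed/accuracy trade-off above comes from the decoding pattern: an autoregressive decoder makes one sequential model call per token, while an NAR decoder fills every position in a single parallel pass. A toy sketch of the contrast (with a hypothetical `predict` standing in for a trained model; not any specific model from the paper):

```python
def predict(position, prefix=None):
    """Hypothetical stand-in for a model's per-position argmax output.
    This toy version ignores the prefix; a real AR decoder would not."""
    vocab = ["h", "i", "!"]
    return vocab[position]

def decode_ar(length):
    out = []
    for i in range(length):          # `length` sequential model calls
        out.append(predict(i, out))  # each step conditions on the prefix
    return out

def decode_nar(length):
    # One parallel pass: no position waits on another.
    return [predict(i) for i in range(length)]

print(decode_ar(3), decode_nar(3))
```

In the toy both paths produce the same tokens; in practice the NAR pass loses the conditioning between positions, which is the source of the accuracy drop.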
1 code implementation • 20 Jul 2021 • Tianzi Wang, Yuya Fujita, Xuankai Chang, Shinji Watanabe
Non-autoregressive (NAR) modeling has gained increasing attention in speech processing.
Automatic Speech Recognition (ASR) +1
no code implementations • NAACL 2021 • Motoi Omachi, Yuya Fujita, Shinji Watanabe, Matthew Wiesner
We propose a Transformer-based sequence-to-sequence model for automatic speech recognition (ASR) capable of simultaneously transcribing and annotating audio with linguistic information such as phonemic transcripts or part-of-speech (POS) tags.
Automatic Speech Recognition (ASR) +5
no code implementations • 18 Dec 2020 • Yuya Fujita, Tianzi Wang, Shinji Watanabe, Motoi Omachi
We propose a system that concatenates audio segmentation and non-autoregressive ASR to achieve high accuracy and a low real-time factor (RTF).
Automatic Speech Recognition (ASR) +2
no code implementations • 27 May 2020 • Yuya Fujita, Shinji Watanabe, Motoi Omachi, Xuankai Chang
One NAT model, mask-predict, has been applied to ASR, but it requires heuristics or an additional component to estimate the length of the output token sequence.
Audio and Speech Processing, Sound
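The length-estimation problem mentioned above arises because mask-predict must fix the number of output slots before filling them. One common workaround in the literature (not necessarily the authors' approach) is to read the length off a CTC branch by counting collapsed non-blank frames:

```python
def ctc_length_estimate(frame_argmax, blank="-"):
    """Estimate the output-token count from frame-level CTC argmax labels
    by merging repeats and dropping blanks (a common length heuristic)."""
    n, prev = 0, None
    for sym in frame_argmax:
        if sym != blank and sym != prev:
            n += 1
        prev = sym
    return n

# 14 frames of argmax labels collapse to 5 tokens: h e l l o
print(ctc_length_estimate(list("hh-ee--ll-llo-")))  # 5
```

The estimate then determines how many mask tokens the NAR decoder is asked to fill.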
no code implementations • 19 Apr 2019 • Aswin Shanmugam Subramanian, Xiaofei Wang, Shinji Watanabe, Toru Taniguchi, Dung Tran, Yuya Fujita
This report investigates extending E2E ASR from standard close-talk to far-field applications by encompassing the entire multichannel speech enhancement and ASR pipeline within the S2S model.
Automatic Speech Recognition (ASR) +4
no code implementations • 25 Oct 2018 • Yusuke Kida, Dung Tran, Motoi Omachi, Toru Taniguchi, Yuya Fujita
The proposed method first uses a DNN-based mask estimator to separate the mixture signal into the keyword signal uttered by the target speaker and the remaining background speech.
Automatic Speech Recognition (ASR) +1
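Mask-based separation of the kind described above multiplies the mixture's magnitude spectrogram element-wise by a predicted time-frequency mask in [0, 1]. A minimal sketch with a hypothetical precomputed mask in place of the DNN estimator (illustrative only, not the paper's network):

```python
# Toy magnitude spectrogram: rows = frequency bins, columns = frames.
mixture = [
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.3],
]
# Hypothetical DNN output: 1 = keep (target keyword), 0 = suppress.
mask = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]

# Element-wise products split the mixture into two streams.
target = [[x * m for x, m in zip(row_x, row_m)]
          for row_x, row_m in zip(mixture, mask)]
background = [[x * (1 - m) for x, m in zip(row_x, row_m)]
              for row_x, row_m in zip(mixture, mask)]

print(target)      # keyword-dominant energy
print(background)  # remaining background speech
```

The two streams sum back to the original mixture, so nothing is discarded; downstream components can then score the keyword stream and the background stream separately.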