no code implementations • 24 Sep 2024 • Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, Shinji Watanabe
In many real-world scenarios, such as meetings, multiple speakers are present with an unknown number of participants, and their utterances often overlap.
no code implementations • 17 Sep 2024 • Yao-Fei Cheng, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Wen Shen Teo, Siddhant Arora, Shinji Watanabe
However, expanding language pairs from an existing instruction-tuned ST system is costly due to the necessity of re-training on a combination of new and previous datasets.
no code implementations • 23 Jun 2024 • Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe
In this work, we propose to use a decoder-only architecture for blockwise streaming ASR.
Automatic Speech Recognition (ASR)
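A minimal sketch of the blockwise streaming idea described above: audio features are consumed in fixed-size blocks, each block conditions a single decoder, and tokens are emitted before the full utterance has arrived. All names here (`decode_block`, `BLOCK_SIZE`) are illustrative stand-ins, not the paper's implementation.

```python
BLOCK_SIZE = 4  # frames per block (toy value)

def decode_block(block, context):
    # Toy stand-in for an autoregressive decoder conditioned on the
    # current audio block and the previously emitted tokens: here it
    # simply emits one "token" per frame (the frame's rounded mean).
    return [round(sum(frame) / len(frame)) for frame in block]

def streaming_decode(features):
    """Consume features block by block, growing the hypothesis as
    each block arrives instead of waiting for the whole utterance."""
    hypothesis = []
    for start in range(0, len(features), BLOCK_SIZE):
        block = features[start:start + BLOCK_SIZE]
        hypothesis += decode_block(block, hypothesis)
    return hypothesis
```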
no code implementations • 18 Jun 2024 • Hayato Futami, Siddhant Arora, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe
Recently, multi-task spoken language understanding (SLU) models have emerged, designed to address various speech processing tasks.
Automatic Speech Recognition (ASR)
no code implementations • 18 Jun 2024 • Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, Shinji Watanabe
End-to-end multilingual speech recognition models handle multiple languages through a single model, often incorporating language identification to automatically detect the language of incoming speech.
no code implementations • 15 Dec 2023 • Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, Shinji Watanabe
While the original TCPGen relies on grapheme-based encoding, we propose extending it with phoneme-aware encoding to better recognize words with unusual pronunciations.
no code implementations • 4 Oct 2023 • Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe
Recent studies leverage large language models with multi-tasking capabilities, using natural language prompts to guide the model's behavior and surpassing the performance of task-specific models.
Ranked #1 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)
Automatic Speech Recognition (ASR)
no code implementations • 16 Sep 2023 • Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe
Because the decoder architecture is the same as that of an autoregressive LM, it is simple to enhance the model by leveraging external text data through LM training.
Automatic Speech Recognition (ASR)
no code implementations • 24 Jul 2023 • Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe
Although frame-based models, such as CTC and transducers, have an affinity for streaming automatic speech recognition, their decoding uses no future knowledge, which could lead to incorrect pruning.
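A toy illustration of the pruning problem mentioned above: keeping only the locally best label at each frame (greedy decoding with no future knowledge) can miss the sequence with the highest total score once cross-frame dependencies matter. The transition penalty and scores below are invented for the example.

```python
import itertools

def seq_score(seq, frame_logprobs, switch_penalty=-1.5):
    # Total score: per-frame label log-probs plus a toy penalty for
    # switching labels between adjacent frames (a stand-in for any
    # dependency that makes future context matter).
    score = sum(frame_logprobs[t][lab] for t, lab in enumerate(seq))
    score += sum(switch_penalty for a, b in zip(seq, seq[1:]) if a != b)
    return score

def greedy_path(frame_logprobs):
    # Prunes to the single best label per frame, ignoring the future.
    return [max(range(len(f)), key=f.__getitem__) for f in frame_logprobs]

def best_path(frame_logprobs, n_labels=2):
    # Exhaustive search sees the whole utterance, so a locally worse
    # label can win by avoiding later transition penalties.
    return max(itertools.product(range(n_labels), repeat=len(frame_logprobs)),
               key=lambda s: seq_score(s, frame_logprobs))
```

With frames `[[-0.1, -2.0], [-1.0, -0.9], [-0.1, -2.0]]`, greedy picks label 1 in the middle frame, while the globally best sequence stays on label 0 throughout.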
no code implementations • 20 Jul 2023 • Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe
There has been increased interest in the integration of pretrained speech recognition (ASR) models and language models (LMs) into the SLU framework.
no code implementations • 2 Jun 2023 • Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe
We reduce the model size by applying tensor decomposition to the Conformer and E-Branchformer architectures used in our E2E SLU models.
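The core idea behind this kind of size reduction can be sketched with simple parameter counting: a dense `d_out × d_in` weight matrix is replaced by two low-rank factors of shapes `d_out × r` and `r × d_in`. The dimensions below are illustrative, not the paper's configuration.

```python
def full_params(d_out, d_in):
    # Parameters in a dense weight matrix.
    return d_out * d_in

def lowrank_params(d_out, d_in, rank):
    # Parameters after factorizing into (d_out x rank) @ (rank x d_in).
    return d_out * rank + rank * d_in

def compression_ratio(d_out, d_in, rank):
    # Fraction of the original parameters kept by the decomposition.
    return lowrank_params(d_out, d_in, rank) / full_params(d_out, d_in)
```

For example, factorizing a 512×512 projection at rank 64 keeps only a quarter of its parameters.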
no code implementations • 2 May 2023 • Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe
Recently, there have been efforts to introduce new benchmark tasks for spoken language understanding (SLU), like semantic parsing.
Automatic Speech Recognition (ASR)
no code implementations • 2 May 2023 • Hayato Futami, Jessica Huynh, Siddhant Arora, Shih-Lun Wu, Yosuke Kashiwagi, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe
In the track, we adopt a pipeline approach of ASR and NLU.
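The pipeline approach mentioned above chains two separate stages: an ASR stage produces a transcript, and an NLU stage maps the transcript to a semantic label. Both stage functions in this sketch are toy stand-ins, not the track systems.

```python
def asr(audio):
    # Toy ASR: pretend the "audio" is already a list of spoken words.
    return " ".join(audio)

def nlu(transcript):
    # Toy NLU: a keyword rule standing in for an intent classifier.
    return "lights_on" if "lights" in transcript else "unknown"

def pipeline(audio):
    # The pipeline simply feeds the ASR output into NLU.
    return nlu(asr(audio))
```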
no code implementations • 1 May 2023 • Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, Shinji Watanabe
Most human interactions occur in the form of spoken conversations where the semantic meaning of a given utterance depends on the context.
1 code implementation • 16 Nov 2022 • Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, Shinji Watanabe
In this study, we propose Transformer-based encoder-decoder models that jointly solve speech recognition and disfluency detection, which work in a streaming manner.
1 code implementation • 8 Sep 2022 • Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Connectionist temporal classification (CTC)-based models are attractive in automatic speech recognition (ASR) because of their non-autoregressive nature.
Automatic Speech Recognition (ASR)
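The non-autoregressive nature of CTC can be seen in greedy decoding: every frame's label is an independent argmax (computable in parallel), after which repeats are collapsed and blanks dropped. This is the standard CTC best-path rule, sketched with toy logits; the blank index is an assumption.

```python
BLANK = 0  # assumed index of the CTC blank symbol

def ctc_greedy_decode(frame_logits):
    """Non-autoregressive CTC decoding: one argmax per frame (no
    dependence on previously emitted tokens), then collapse repeated
    labels and remove blanks."""
    path = [max(range(len(f)), key=f.__getitem__) for f in frame_logits]
    out, prev = [], None
    for lab in path:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return out
```

A blank between two identical labels keeps both copies, which is how CTC can emit repeated tokens.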
no code implementations • 5 Sep 2022 • Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
In this study, we propose to distill the knowledge of BERT for CTC-based ASR, extending our previous study for attention-based ASR.
Automatic Speech Recognition (ASR)
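The general shape of such knowledge distillation is a divergence between the teacher's soft label distribution (here, from BERT) and the student's prediction at each position. This sketches the generic KL objective only, not the paper's exact loss or how it is aligned with CTC outputs.

```python
import math

def kl_soft_labels(teacher_probs, student_probs):
    """Distillation loss at one position: KL(teacher || student)
    between the teacher's soft labels and the student's predicted
    distribution. Zero-probability teacher entries contribute nothing."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)
```

The loss is zero when the student matches the teacher exactly, and grows as the student's distribution drifts away.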
no code implementations • 5 Oct 2021 • Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
We propose an ASR rescoring method for directly detecting errors with ELECTRA, which is originally a pre-training method for NLP tasks.
Automatic Speech Recognition (ASR)
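Error-detection-based rescoring can be sketched as follows: a detector (ELECTRA-style in the paper) assigns each hypothesis token a probability of being an ASR error, and hypotheses are reranked by their original score minus a weighted error penalty. The combination and weighting here are illustrative assumptions, not the paper's formula.

```python
def rescore(hypotheses, error_probs, base_scores, weight=1.0):
    """Return the index of the best hypothesis under a combined score:
    the recognizer's base score minus a penalty summing the detector's
    per-token error probabilities (all names are illustrative)."""
    def combined(i):
        penalty = sum(error_probs[i])
        return base_scores[i] - weight * penalty
    return max(range(len(hypotheses)), key=combined)
```

A hypothesis with a slightly worse recognizer score can win if the detector flags fewer of its tokens as likely errors.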
1 code implementation • 9 Aug 2020 • Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Experimental evaluations show that our method significantly improves the ASR performance from the seq2seq baseline on the Corpus of Spontaneous Japanese (CSJ).
Automatic Speech Recognition (ASR)