no code implementations • 26 Jan 2024 • Jinhan Wang, Long Chen, Aparna Khare, Anirudh Raju, Pranav Dheram, Di He, Minhua Wu, Andreas Stolcke, Venkatesh Ravichandran
We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM).
1 code implementation • 2 Jun 2023 • Jinhan Wang, Vijay Ravi, Abeer Alwan
We find that applying a greater adversarial weight to the initial layers leads to performance improvement.
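As an illustration of the idea, a layer-wise adversarial weighting scheme with gradient reversal could be sketched as follows. This is a hypothetical helper, not the authors' implementation: the function names (`layerwise_adv_weights`, `reverse_gradient`) and the linear decay schedule are assumptions; only the finding that initial layers get larger adversarial weights comes from the paper.

```python
# Hypothetical sketch of layer-wise adversarial weighting (not the
# authors' code). Earlier layers receive larger adversarial weights,
# decaying linearly toward the final layer.

def layerwise_adv_weights(num_layers, max_weight=1.0):
    """Assign a larger adversarial weight to earlier layers."""
    return [max_weight * (num_layers - i) / num_layers
            for i in range(num_layers)]

def reverse_gradient(grad, weight):
    """Gradient reversal: negate and scale the adversary's gradient
    before it flows back into an encoder layer."""
    return [-weight * g for g in grad]

weights = layerwise_adv_weights(4)
print(weights)  # → [1.0, 0.75, 0.5, 0.25]
```

In a real adversarial setup the reversal would sit between the encoder and a speaker/domain classifier; here it is reduced to plain lists to show the weighting alone.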
1 code implementation • 28 Apr 2023 • Ruchao Fan, Yunzheng Zhu, Jinhan Wang, Abeer Alwan
With the proposed methods (E-APC and DRAFT), the relative WER improvements are even larger (30% and 19% on the OGI and MyST data, respectively) compared to models trained without pretraining.
Automatic Speech Recognition (ASR) +4
no code implementations • 27 Jun 2022 • Jinhan Wang, Vijay Ravi, Jonathan Flint, Abeer Alwan
To learn instance-spread-out embeddings, we explore methods for sampling instances for a training batch (distinct speaker-based and random sampling).
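The two batch-sampling strategies mentioned above could be sketched as follows. This is a minimal hypothetical helper, not the authors' implementation: the function name `sample_batch` and the data layout (a list of `(speaker_id, utterance)` pairs) are assumptions; only the distinction between distinct-speaker and random sampling comes from the paper.

```python
import random

def sample_batch(utterances, batch_size, distinct_speakers=True, seed=0):
    """Sample a training batch from (speaker_id, utterance) pairs.

    distinct_speakers=True  -> at most one utterance per speaker,
                               encouraging instance-spread-out embeddings
    distinct_speakers=False -> plain random sampling over all utterances
    """
    rng = random.Random(seed)
    if distinct_speakers:
        # Group utterances by speaker, then pick one per sampled speaker.
        by_speaker = {}
        for spk, utt in utterances:
            by_speaker.setdefault(spk, []).append(utt)
        speakers = rng.sample(list(by_speaker),
                              min(batch_size, len(by_speaker)))
        return [(spk, rng.choice(by_speaker[spk])) for spk in speakers]
    return rng.sample(utterances, batch_size)
```

Distinct-speaker sampling guarantees that every instance in a batch comes from a different speaker, so a contrastive or spread-out objective never pulls apart two utterances of the same speaker within a batch.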
no code implementations • 20 Jun 2022 • Vijay Ravi, Jinhan Wang, Jonathan Flint, Abeer Alwan
With adversarial training, depression classification improves for every feature when compared to the baseline.
no code implementations • 22 Feb 2022 • Jinhan Wang, Xiaosu Tong, Jinxi Guo, Di He, Roland Maas
Results show that the proposed method achieves a 20% relative computation-cost reduction on LibriSpeech and the Microsoft Speech Language Translation long-form corpus while maintaining WER performance, compared to the best-performing overlapping inference algorithm.
no code implementations • 11 Feb 2022 • Vijay Ravi, Jinhan Wang, Jonathan Flint, Abeer Alwan
The improvements on the CONVERGE (Mandarin) dataset, using x-vector embeddings with a CNN backend and MFCC input features, were 9.32% (validation) and 12.99% (test).
no code implementations • 18 Jun 2021 • Jinhan Wang, Yunzheng Zhu, Ruchao Fan, Wei Chu, Abeer Alwan
~5 hours of transcribed data and ~60 hours of untranscribed data are provided to develop a German ASR system for children.