no code implementations • 7 Mar 2025 • Piotr Żelasko, Kunal Dhawan, Daniel Galvez, Krishna C. Puvvada, Ankita Pasad, Nithin Rao Koluguri, Ke Hu, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg
However, the reported data and compute requirements for their training are prohibitive for many in the research community.
no code implementations • 20 Sep 2024 • Piotr Żelasko, Zhehuai Chen, Mengru Wang, Daniel Galvez, Oleksii Hrinchuk, Shuoyang Ding, Ke Hu, Jagadeesh Balam, Vitaly Lavrukhin, Boris Ginsburg
This work focuses on neural machine translation (NMT) and proposes a joint multimodal training regime of Speech-LLM to include automatic speech translation (AST).
1 code implementation • 10 Jun 2024 • Vladimir Bataev, Hainan Xu, Daniel Galvez, Vitaly Lavrukhin, Boris Ginsburg
This paper introduces a highly efficient greedy decoding algorithm for Transducer-based speech recognition models.
no code implementations • 6 Jun 2024 • Daniel Galvez, Vladimir Bataev, Hainan Xu, Tim Kaldewey
The vast majority of inference time for RNN Transducer (RNN-T) models today is spent on decoding.
1 code implementation • 8 Nov 2023 • Daniel Galvez, Tim Kaldewey
While Connectionist Temporal Classification (CTC) models deliver state-of-the-art accuracy in automated speech recognition (ASR) pipelines, their performance has been limited by CPU-based beam search decoding.
1 code implementation • 30 Aug 2023 • Rafael Mosquera Gómez, Julián Eusse, Juan Ciro, Daniel Galvez, Ryan Hileman, Kurt Bollacker, David Kanter
The Speech Wikimedia Dataset is a publicly available compilation of audio with transcriptions extracted from Wikimedia Commons.
no code implementations • 10 Dec 2021 • Juan Ciro, Daniel Galvez, Tim Schlippe, David Kanter
This paper illustrates locality sensitive hasing (LSH) models for the identification and removal of nearly redundant data in a text dataset.
no code implementations • 17 Nov 2021 • Daniel Galvez, Greg Diamos, Juan Ciro, Juan Felipe Cerón, Keith Achorn, Anjali Gopi, David Kanter, Maximilian Lam, Mark Mazumder, Vijay Janapa Reddi
The People's Speech is a free-to-download 30, 000-hour and growing supervised conversational English speech recognition dataset licensed for academic and commercial usage under CC-BY-SA (with a CC-BY subset).
no code implementations • 21 Nov 2017 • Ahmad AbdulKader, Kareem Nassar, Mohamed Mahmoud, Daniel Galvez, Chetan Patil
We propose using cascaded classifiers for a keyword spotting (KWS) task on narrow-band (NB), 8kHz audio acquired in non-IID environments --- a more challenging task than most state-of-the-art KWS systems face.
no code implementations • INTERSPEECH 2016 2016 • Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahrmani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur
Models trained with LFMMI provide a relative word error rate reduction of ∼11. 5%, over those trained with cross-entropy objective function, and ∼8%, over those trained with cross-entropy and sMBR objective functions.
Ranked #5 on
Speech Recognition
on WSJ eval92