1 code implementation • 8 Nov 2023 • Daniel Galvez, Tim Kaldewey
While Connectionist Temporal Classification (CTC) models deliver state-of-the-art accuracy in automated speech recognition (ASR) pipelines, their performance has been limited by CPU-based beam search decoding.
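The entry does not describe the GPU decoder itself. To show what CTC beam search decoding computes, here is a minimal CPU prefix beam search in Python. It assumes no language model, plain probabilities instead of log-probabilities, and label 0 as the blank. The function name `ctc_prefix_beam_search` is hypothetical.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, beam_width=3, blank=0):
    """probs: one list of per-label probabilities per frame (rows sum to 1).

    Returns (best_label_sequence, its_total_probability).
    """
    # Each prefix maps to (p_blank, p_nonblank): the total probability of all
    # alignments that collapse to this prefix and end in blank / non-blank.
    beams = {(): (1.0, 0.0)}
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for label, p in enumerate(frame):
                if p == 0.0:
                    continue
                if label == blank:
                    # A blank leaves the collapsed prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                    continue
                last = prefix[-1] if prefix else None
                new_prefix = prefix + (label,)
                if label == last:
                    # Repeat without an intervening blank merges into the same prefix.
                    next_beams[prefix][1] += p_nb * p
                    # Repeat after a blank emits a genuinely new symbol.
                    next_beams[new_prefix][1] += p_b * p
                else:
                    next_beams[new_prefix][1] += (p_b + p_nb) * p
        # Prune to the beam_width most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(ps) for prefix, ps in ranked[:beam_width]}
    best_prefix, (p_b, p_nb) = max(beams.items(), key=lambda kv: sum(kv[1]))
    return list(best_prefix), p_b + p_nb
```

Prefixes are scored by summing over every alignment that collapses to them, not just the single best path. This is why CTC beam search can outperform greedy decoding, and it is also why the decoding is costly on a CPU.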
1 code implementation • 30 Aug 2023 • Rafael Mosquera Gómez, Julián Eusse, Juan Ciro, Daniel Galvez, Ryan Hileman, Kurt Bollacker, David Kanter
The Speech Wikimedia Dataset is a publicly available compilation of audio with transcriptions extracted from Wikimedia Commons.
no code implementations • 10 Dec 2021 • Juan Ciro, Daniel Galvez, Tim Schlippe, David Kanter
This paper illustrates locality-sensitive hashing (LSH) models for the identification and removal of nearly redundant data in a text dataset.
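The entry does not say which LSH family the paper uses. As an assumption, the sketch below uses MinHash with banding, a standard choice for near-duplicate text detection. Word 3-gram shingles are hashed with seeded BLAKE2b so results are reproducible across runs. Documents that agree on every row of at least one band become candidate near-duplicate pairs. All function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Set of word k-grams (the whole text if it has fewer than k words)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(shingle_set, num_perm=64):
    """One minimum hash value per seeded hash function."""
    sig = []
    for seed in range(num_perm):
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
            for s in shingle_set))
    return sig


def lsh_candidate_pairs(docs, num_perm=64, bands=16):
    """docs: {doc_id: text}. Returns sorted (id_a, id_b) candidate pairs."""
    rows = num_perm // bands
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash_signature(shingles(text), num_perm)
        for b in range(bands):
            # Documents sharing an entire band of rows land in the same bucket.
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

More bands with fewer rows each make it easier for moderately similar documents to collide. Fewer bands with more rows favor precision. Candidate pairs would normally be verified (for example, by exact Jaccard similarity) before any document is removed.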
no code implementations • 17 Nov 2021 • Daniel Galvez, Greg Diamos, Juan Ciro, Juan Felipe Cerón, Keith Achorn, Anjali Gopi, David Kanter, Maximilian Lam, Mark Mazumder, Vijay Janapa Reddi
The People's Speech is a free-to-download 30,000-hour and growing supervised conversational English speech recognition dataset licensed for academic and commercial usage under CC-BY-SA (with a CC-BY subset).
no code implementations • 21 Nov 2017 • Ahmad AbdulKader, Kareem Nassar, Mohamed Mahmoud, Daniel Galvez, Chetan Patil
We propose using cascaded classifiers for a keyword spotting (KWS) task on narrow-band (NB), 8 kHz audio acquired in non-IID environments, a more challenging task than most state-of-the-art KWS systems face.
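The general idea of a classifier cascade can be sketched in a few lines. Each stage scores an audio window, and a window is rejected as soon as any stage falls below its threshold, so later and costlier stages run only on windows that earlier stages let through. The two toy stages here (mean energy, peak amplitude) and all names are hypothetical stand-ins, not the classifiers used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Stage:
    name: str
    score: Callable[[Sequence[float]], float]  # hypothetical per-window scorer
    threshold: float


def cascade_detect(window, stages):
    """Return (detected, index_of_deciding_stage)."""
    for i, stage in enumerate(stages):
        if stage.score(window) < stage.threshold:
            return False, i  # early reject: later stages never run
    return True, len(stages) - 1


def mean_energy(window):
    # Cheap first-stage score: average squared amplitude.
    return sum(x * x for x in window) / len(window)


def peak(window):
    # Toy stand-in for a costlier second-stage classifier.
    return max(abs(x) for x in window)
```

Usage: with `stages = [Stage("energy", mean_energy, 0.01), Stage("peak", peak, 0.5)]`, a silent window is rejected by the first stage and never reaches the second.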
no code implementations • INTERSPEECH 2016 • Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur
Models trained with lattice-free MMI (LFMMI) provide a relative word error rate reduction of ∼11.5% over those trained with the cross-entropy objective function, and ∼8% over those trained with cross-entropy and sMBR objective functions.
Ranked #4 on Speech Recognition on WSJ eval92
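The LFMMI gains above are relative, not absolute, reductions in word error rate. The short Python helper below makes the arithmetic explicit. The WER values in the usage note are hypothetical and were chosen only to illustrate the formula. They are not results from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer improves on baseline_wer.

    relative = (baseline - new) / baseline
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

For example, a hypothetical drop from 10.0% to 8.85% WER is an absolute improvement of 1.15 points but a relative reduction of 11.5%.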