no code implementations • 14 Jun 2024 • Roger Hsiao, Liuhui Deng, Erik McDermott, Ruchir Travadi, Xiaodan Zhuang
Byte-level representation is often used by large-scale multilingual ASR systems when the character set of the supported languages is large.
Automatic Speech Recognition (ASR)
no code implementations • 23 May 2023 • Jan Silovsky, Liuhui Deng, Arturo Argueta, Tresi Arvizo, Roger Hsiao, Sasha Kuznietsov, Yiu-Chang Lin, Xiaoqiang Xiao, Yuanyuan Zhang
Voice technology has become ubiquitous recently.
no code implementations • 29 Nov 2022 • Stefan Braun, Erik McDermott, Roger Hsiao
As a highlight, we manage to compute the transducer loss and gradients for a batch size of 1024, and audio length of 40 seconds, using only 6 GB of memory.
Automatic Speech Recognition (ASR)
no code implementations • 2 Nov 2022 • Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang
This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios.
no code implementations • 21 Oct 2022 • Thien Nguyen, Nathalie Tran, Liuhui Deng, Thiago Fraga da Silva, Matthew Radzihovsky, Roger Hsiao, Henry Mason, Stefan Braun, Erik McDermott, Dogan Can, Pawel Swietojanski, Lyan Verwimp, Sibel Oyman, Tresi Arvizo, Honza Silovsky, Arnab Ghoshal, Mathieu Martel, Bharat Ram Ambati, Mohamed Ali
Code-switching describes the practice of using more than one language in the same sentence.
Automatic Speech Recognition (ASR)
no code implementations • 1 May 2022 • Liuhui Deng, Roger Hsiao, Arnab Ghoshal
In this paper, we investigate how the output representation of an end-to-end neural network affects multilingual automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
no code implementations • 12 Aug 2020 • Roger Hsiao, Dogan Can, Tim Ng, Ruchir Travadi, Arnab Ghoshal
The Listen, Attend and Spell (LAS) model and other attention-based automatic speech recognition (ASR) models have known limitations when operated in a fully online mode.
Automatic Speech Recognition (ASR)
no code implementations • 29 Jan 2020 • Andrew Titus, Jan Silovsky, Nanxin Chen, Roger Hsiao, Mary Young, Arnab Ghoshal
Spoken language identification (LID) technologies have improved in recent years from discriminating largely distinct languages to discriminating highly similar languages or even dialects of the same language.
no code implementations • 14 Dec 2019 • Mingyu Yang, Roger Hsiao, Gordy Carichner, Katherine Ernst, Jaechan Lim, Delbert A. Green II, Inhee Lee, David Blaauw, Hun-Seok Kim
Details of Monarch butterfly migration from the U.S. to Mexico remain a mystery due to the lack of a proper localization technology to accurately localize and track butterfly migration.