no code implementations • 10 Jun 2024 • Chung-Ming Chien, Andros Tjandra, Apoorv Vyas, Matt Le, Bowen Shi, Wei-Ning Hsu
As the scale of generative models continues to grow, efficient reuse and adaptation of pre-trained models have become crucial considerations.
no code implementations • 25 Dec 2023 • Apoorv Vyas, Bowen Shi, Matthew Le, Andros Tjandra, Yi-Chiao Wu, Baishan Guo, Jiemin Zhang, Xinyue Zhang, Robert Adkins, William Ngan, Jeff Wang, Ivan Cruz, Bapi Akula, Akinniyi Akinyemi, Brian Ellis, Rashel Moritz, Yael Yungster, Alice Rakotoarison, Liang Tan, Chris Summers, Carleigh Wood, Joshua Lane, Mary Williamson, Wei-Ning Hsu
Research communities have made great progress over the past year advancing the performance of large-scale audio generative models for a single modality (speech, sound, or music) by adopting more powerful generative models and scaling data.
Ranked #1 on Audio Generation on AudioCaps
no code implementations • 25 Oct 2023 • Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu
Generative models have gained increasing attention in recent years for their remarkable success in tasks that require estimating and sampling from data distributions to generate high-fidelity synthetic data.
3 code implementations • arXiv 2023 • Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli
Expanding the language coverage of speech technology has the potential to improve access to information for many more people.
no code implementations • 25 Apr 2022 • Apoorv Vyas, Wei-Ning Hsu, Michael Auli, Alexei Baevski
Our results for models pre-trained on the 960-hour Librispeech dataset and fine-tuned on 10 hours of transcribed data show that a single stochastic model yields a smooth trade-off between word error rate (WER) and inference time, with only marginal WER degradation compared to W2V2 and SEW models trained for a specific setting.
no code implementations • 6 Apr 2021 • Apoorv Vyas, Srikanth Madikeri, Hervé Bourlard
On Switchboard (300h) we obtain relative improvements of 33% and 35%, respectively.
Automatic Speech Recognition (ASR) +2
2 code implementations • 28 Dec 2020 • Apoorv Vyas, Srikanth Madikeri, Hervé Bourlard
In this work, we propose lattice-free MMI (LFMMI) for supervised adaptation of a self-supervised pre-trained acoustic model.
1 code implementation • 7 Oct 2020 • Srikanth Madikeri, Sibo Tong, Juan Zuluaga-Gomez, Apoorv Vyas, Petr Motlicek, Hervé Bourlard
We present a simple wrapper that is useful to train acoustic models in PyTorch using Kaldi's LF-MMI training framework.
Audio and Speech Processing • Sound
1 code implementation • NeurIPS 2020 • Apoorv Vyas, Angelos Katharopoulos, François Fleuret
This results in a model with linear complexity with respect to the sequence length for a fixed number of clusters.
Automatic Speech Recognition (ASR) +1
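The linear-complexity claim above comes from grouping queries into a fixed number of clusters and computing attention once per cluster centroid, so cost scales as O(N·C) for sequence length N and C clusters rather than O(N²). The following is a minimal numpy sketch of that idea, not the paper's implementation: the plain k-means step, the random seeding, and all function names are illustrative assumptions.

```python
import numpy as np

def clustered_attention(Q, K, V, n_clusters=4, n_iters=10):
    """Illustrative sketch: cluster the queries, attend once per centroid,
    and let every query reuse its centroid's attention output.
    Cost is O(N * C) for C clusters instead of O(N^2)."""
    N, d = Q.shape
    rng = np.random.default_rng(0)
    # Naive k-means over the query vectors (the paper's grouping is more refined).
    centroids = Q[rng.choice(N, size=n_clusters, replace=False)]
    for _ in range(n_iters):
        dists = ((Q[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for c in range(n_clusters):
            if (assign == c).any():
                centroids[c] = Q[assign == c].mean(axis=0)
    # One softmax-attention row per centroid: a (C, N) score matrix.
    scores = centroids @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    out_per_cluster = w @ V            # (C, d_v)
    return out_per_cluster[assign]     # each query copies its cluster's output
```

Because every output row is a softmax-weighted average of the value rows, the approximation error depends only on how tightly the queries cluster, which is why a fixed cluster count suffices for long sequences.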
7 code implementations • ICML 2020 • Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
Transformers achieve remarkable performance in several tasks, but due to their quadratic complexity with respect to the input length, they are prohibitively slow for very long sequences.
Ranked #5 on Offline RL on D4RL
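The quadratic cost noted above comes from materializing the N×N attention matrix. The paper's linear formulation replaces softmax with a kernel feature map (φ(x) = elu(x) + 1) so that, by associativity, φ(Q)(φ(K)ᵀV) is computed in O(N). A minimal numpy sketch for a single head follows; the function names and the toy shapes are illustrative, not the authors' code.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: builds the full (N, N) weight matrix, O(N^2)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized attention with phi(x) = elu(x) + 1 (positive features).
    phi(Q) @ (phi(K).T @ V) never forms an (N, N) matrix: the summary
    phi(K).T @ V has shape (d, d_v) regardless of sequence length."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                        # (d, d_v) running summary
    Z = Qp @ Kp.sum(axis=0) + eps        # per-query normalizer, shape (N,)
    return (Qp @ KV) / Z[:, None]
```

Because the (d, d_v) summary can be updated one timestep at a time, the same algebra also yields the paper's recurrent (RNN-style) formulation for autoregressive decoding.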
1 code implementation • ECCV 2018 • Apoorv Vyas, Nataraj Jammalamadaka, Xia Zhu, Dipankar Das, Bharat Kaul, Theodore L. Willke
We minimize this novel loss jointly with the standard cross-entropy loss to train an ensemble of classifiers.