no code implementations • 5 Jun 2024 • Zhong Meng, Zelin Wu, Rohit Prabhavalkar, Cal Peyser, Weiran Wang, Nanxin Chen, Tara N. Sainath, Bhuvana Ramabhadran
Neural contextual biasing effectively improves automatic speech recognition (ASR) for crucial phrases within a speaker's context, particularly those that are infrequent in the training data.
Automatic Speech Recognition (ASR) +1
no code implementations • 15 Apr 2024 • Zelin Wu, Gan Song, Christopher Li, Pat Rondon, Zhong Meng, Xavier Velez, Weiran Wang, Diamantino Caseiro, Golan Pundak, Tsendsuren Munkhdalai, Angad Chandorkar, Rohit Prabhavalkar
Contextual biasing enables speech recognizers to transcribe important phrases in the speaker's context, such as contact names, even if they are rare in, or absent from, the training data.
no code implementations • 14 Apr 2024 • Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar
While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs.
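The quadratic cost is easy to see in code. Below is a minimal NumPy sketch of vanilla dot-product attention (illustrative only, not the paper's model): the (T, T) score matrix means compute and memory grow quadratically with sequence length T.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Vanilla dot-product attention: the (T, T) score matrix is
    what makes cost and memory quadratic in sequence length T."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (T, T)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (T, d)

T, d = 1024, 64
x = np.random.randn(T, d)
out = softmax_attention(x, x, x)  # doubling T quadruples the score matrix
```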
no code implementations • 27 Feb 2024 • Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno
In the present work, we study one such strategy: applying multiple frame reduction layers in the encoder to compress encoder outputs into a small number of output frames.
Automatic Speech Recognition (ASR) +2
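To make the strategy concrete, here is a hedged sketch of one common frame-reduction scheme: stack adjacent encoder frames and project back to the model dimension, shortening the sequence. The paper's exact layer placement and reduction factors may differ, and the random projection below stands in for a learned one.

```python
import numpy as np

def reduce_frames(frames, factor=2, seed=0):
    """Concatenate each group of `factor` adjacent frames and project
    back to the model dimension, shortening the sequence by `factor`."""
    rng = np.random.default_rng(seed)
    T, d = frames.shape
    T_out = T // factor
    stacked = frames[:T_out * factor].reshape(T_out, factor * d)
    W = rng.standard_normal((factor * d, d)) / np.sqrt(factor * d)
    return stacked @ W  # (T // factor, d)

enc = np.random.randn(100, 512)
print(reduce_frames(enc).shape)  # (50, 512)
```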
no code implementations • 13 Dec 2023 • Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal
We conducted extensive experiments with a 2-billion parameter USM on a large-scale voice search dataset to evaluate our proposed method.
Automatic Speech Recognition (ASR) +3
no code implementations • 29 Sep 2023 • Weiran Wang, Zelin Wu, Diamantino Caseiro, Tsendsuren Munkhdalai, Khe Chai Sim, Pat Rondon, Golan Pundak, Gan Song, Rohit Prabhavalkar, Zhong Meng, Ding Zhao, Tara Sainath, Pedro Moreno Mengibar
Contextual biasing refers to the problem of biasing the automatic speech recognition (ASR) systems towards rare entities that are relevant to the specific user or application scenarios.
Automatic Speech Recognition (ASR) +1
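As a toy illustration of the biasing idea (an assumption-laden sketch, not the paper's model): during beam search, a hypothesis whose suffix extends a phrase from the user's context can receive a score bonus.

```python
def biasing_bonus(hypothesis, context_phrases, bonus=2.0):
    """Toy shallow-fusion-style biasing: reward a hypothesis whose
    suffix matches a (partial) context phrase, e.g. a contact name."""
    words = hypothesis.split()
    score = 0.0
    for phrase in context_phrases:
        p = phrase.split()
        for k in range(1, len(p) + 1):
            if words[-k:] == p[:k]:          # suffix matches phrase prefix
                score = max(score, bonus * k)
    return score

print(biasing_bonus("call tsendsuren", ["tsendsuren munkhdalai"]))  # 2.0
```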
no code implementations • 22 Sep 2023 • Weiran Wang, Rohit Prabhavalkar, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Zhong Meng, CJ Zheng, Yanzhang He, Tara Sainath, Pedro Moreno Mengibar
In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model parameters.
Automatic Speech Recognition (ASR) +1
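For readers unfamiliar with CTC, the standard greedy decoder is short: take the argmax label per frame, collapse repeats, then drop blanks. This is textbook CTC, not code from the paper.

```python
import numpy as np

def ctc_greedy_decode(log_probs, blank=0):
    """log_probs: (T, V) per-frame label scores. Collapse repeated
    argmax labels, then remove the blank symbol."""
    best = np.argmax(log_probs, axis=-1)
    out, prev = [], None
    for label in best:
        if label != prev and label != blank:
            out.append(int(label))
        prev = label
    return out

frames = np.log(np.array([[0.1, 0.8, 0.1],
                          [0.1, 0.8, 0.1],
                          [0.8, 0.1, 0.1],
                          [0.1, 0.1, 0.8]]))
print(ctc_greedy_decode(frames))  # [1, 2]
```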
no code implementations • 15 Sep 2023 • Yiling Huang, Weiran Wang, Guanlong Zhao, Hank Liao, Wei Xia, Quan Wang
Whether it is the conventional modularized approach or the more recent end-to-end neural diarization (EEND), an additional automatic speech recognition (ASR) model and an orchestration algorithm are required to associate the speaker labels with recognized words.
Automatic Speech Recognition (ASR) +4
no code implementations • 15 Sep 2023 • Haozhe Shan, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara Sainath
Online speech recognition, where the model only accesses context to the left, is an important and challenging use case for ASR systems.
1 code implementation • 14 Sep 2023 • Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, Tao Gui
Many efforts have been made to develop intelligent agents, but they mainly focus on advancement in algorithms or training strategies to enhance specific capabilities or performance on particular tasks.
no code implementations • 14 Aug 2023 • Shaan Bijwadia, Shuo-Yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath
Text injection for automatic speech recognition (ASR), wherein unpaired text-only data is used to supplement paired audio-text data, has shown promising improvements in word error rate.
Automatic Speech Recognition (ASR) +2
no code implementations • 31 Mar 2023 • Rami Botros, Anmol Gulati, Tara N. Sainath, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu
Conformer models maintain a large number of internal states, the vast majority of which are associated with self-attention layers.
no code implementations • 16 Feb 2023 • Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang, Bo Li, Andrew Rosenberg, Bhuvana Ramabhadran
We propose JEIT, a joint end-to-end (E2E) model and internal language model (ILM) training method that injects large-scale unpaired text into the ILM during E2E training, improving rare-word speech recognition.
no code implementations • 13 Oct 2022 • Tara N. Sainath, Rohit Prabhavalkar, Ankur Bapna, Yu Zhang, Zhouyuan Huo, Zhehuai Chen, Bo Li, Weiran Wang, Trevor Strohman
In addition, we explore JOIST using a streaming E2E model with an order of magnitude more data, both of which are novel relative to previous work.
no code implementations • 29 Jun 2022 • Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang
Text-only and semi-supervised training based on audio-only data has gained popularity recently due to the wide availability of unlabeled text and speech data.
2 code implementations • 11 May 2022 • Yawei Li, Kai Zhang, Radu Timofte, Luc van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, Jie Tang, Gangshan Wu, Weiran Wang, Shirui Huang, Honglei Lu, Huan Liu, Keyan Wang, Jun Chen, Shi Chen, Yuchun Miao, Zimo Huang, Lefei Zhang, Mustafa Ayazoğlu, Wei Xiong, Chengyi Xiong, Fei Wang, Hao Li, Ruimian Wen, Zhijing Yang, Wenbin Zou, Weixin Zheng, Tian Ye, Yuncheng Zhang, Xiangzhen Kong, Aditya Arora, Syed Waqas Zamir, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Dandan Gao, Dengwen Zhou, Qian Ning, Jingzhu Tang, Han Huang, YuFei Wang, Zhangheng Peng, Haobo Li, Wenxue Guan, Shenghua Gong, Xin Li, Jun Liu, Wanjun Wang, Dengwen Zhou, Kun Zeng, Hanjiang Lin, Xinyu Chen, Jinsheng Fang
The aim was to design a network for single-image super-resolution that improves efficiency, measured by several metrics including runtime, parameters, FLOPs, activations, and memory consumption, while at least maintaining a PSNR of 29.00 dB on the DIV2K validation set.
no code implementations • 15 Apr 2022 • Weiran Wang, Ke Hu, Tara N. Sainath
We propose a streaming non-autoregressive (non-AR) decoding algorithm to deliberate the hypothesis alignment of a streaming RNN-T model.
no code implementations • 15 Apr 2022 • Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach
Language models (LMs) significantly improve the recognition accuracy of end-to-end (E2E) models on words rarely seen during training when used in either the shallow fusion or the rescoring setup.
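Both setups combine scores log-linearly. A hedged sketch, with illustrative weights rather than values from the paper:

```python
def fused_score(logp_e2e, logp_lm, num_tokens, lam=0.3, beta=0.5):
    """Log-linear combination used in shallow fusion (applied per step
    during beam search) and in n-best rescoring (applied per hypothesis).
    lam and beta are tuning knobs, not values from the paper."""
    return logp_e2e + lam * logp_lm + beta * num_tokens

# rescoring a toy n-best list: (hypothesis, E2E score, LM score, length)
nbest = [("play beethoven", -4.1, -7.0, 2),
         ("play bait oven", -3.9, -12.5, 3)]
best = max(nbest, key=lambda h: fused_score(h[1], h[2], h[3]))
print(best[0])  # the LM pushes the rare-word hypothesis to the top
```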
no code implementations • 13 Apr 2022 • Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman
In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios.
Automatic Speech Recognition (ASR) +1
1 code implementation • NeurIPS 2021 • Junwen Bai, Weiran Wang, Carla Gomes
We propose a novel sequence representation learning method, named Contrastively Disentangled Sequential Variational Autoencoder (C-DSVAE), to extract and separate the static (time-invariant) and dynamic (time-variant) factors in the latent space.
1 code implementation • ICLR 2022 • Qi Lyu, Xiao Fu, Weiran Wang, Songtao Lu
Under this model, latent correlation maximization is shown to guarantee the extraction of the shared components across views (up to certain ambiguities).
2 code implementations • ICLR 2021 • Junwen Bai, Weiran Wang, Yingbo Zhou, Caiming Xiong
We propose Deep Autoencoding Predictive Components (DAPC), a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space.
Automatic Speech Recognition (ASR) +3
no code implementations • 8 Apr 2020 • Weiran Wang, Guangsen Wang, Aadyot Bhatnagar, Yingbo Zhou, Caiming Xiong, Richard Socher
For Switchboard, our phone-based BPE system achieves 6.8%/14.4% word error rate (WER) on the Switchboard/CallHome portion of the test set, while joint decoding achieves 6.3%/13.3% WER.
no code implementations • 24 Jan 2020 • Yang Chen, Weiran Wang, I-Fan Chen, Chao Wang
Practitioners often need to build ASR systems for new use cases in a short amount of time, given limited in-domain data.
no code implementations • 24 Jan 2020 • Yang Chen, Weiran Wang, Chao Wang
While deep learning based end-to-end automatic speech recognition (ASR) systems have greatly simplified modeling pipelines, they suffer from the data sparsity issue.
Automatic Speech Recognition (ASR) +3
1 code implementation • 16 Sep 2019 • Weimin Wang, Weiran Wang, Ming Sun, Chao Wang
Acoustic Scene Classification (ASC) is a challenging task, as a single scene may involve multiple events that contain complex sound patterns.
no code implementations • ACL 2019 • Gustavo Aguilar, Viktor Rozgić, Weiran Wang, Chao Wang
Studies on emotion recognition (ER) show that combining lexical and acoustic information results in more robust and accurate models.
no code implementations • ICLR 2019 • Qingming Tang, Mingda Chen, Weiran Wang, Karen Livescu
Existing variational recurrent models typically use stochastic recurrent connections to model the dependence among neighboring latent variables, while generation assumes independence of generated data per time step given the latent sequence.
no code implementations • 8 Mar 2019 • Weiran Wang
We adopt a multi-view approach for analyzing two knowledge transfer settings, learning using privileged information (LUPI) and distillation, in a common framework.
no code implementations • 19 Mar 2018 • Qingming Tang, Weiran Wang, Karen Livescu
Previous work has shown that it is possible to improve speech recognition by learning acoustic features from paired acoustic-articulatory data, for example by using canonical correlation analysis (CCA) or its deep extensions.
no code implementations • 11 Feb 2018 • Weiran Wang, Jialei Wang, Mladen Kolar, Nathan Srebro
We propose methods for distributed graph-based multi-task learning that are based on weighted averaging of messages from other machines.
no code implementations • 25 Sep 2017 • Weiran Wang, Nathan Srebro
We study stochastic optimization of nonconvex loss functions, which are typical objectives for training neural networks.
no code implementations • 11 Aug 2017 • Qingming Tang, Weiran Wang, Karen Livescu
We study the problem of acoustic feature learning in the setting where we have access to another (non-acoustic) modality for feature learning but not at test time.
no code implementations • 25 Feb 2017 • Jialei Wang, Weiran Wang, Dan Garber, Nathan Srebro
We develop and analyze efficient "coordinate-wise" methods for finding the leading eigenvector, where each step involves only a vector-vector product.
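A toy rendering of the idea, assuming a symmetric positive semi-definite matrix; this simplified cyclic variant is not necessarily the exact update analyzed in the paper, but it shows how each step touches only one coordinate via a single vector-vector product.

```python
import numpy as np

def coordinate_power_method(A, steps=20000, seed=0):
    """Each update recomputes one coordinate of v as (A v)_i, a single
    row-vector product costing O(n), instead of a full matrix-vector
    product. Simplified sketch, heavily hedged."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for t in range(steps):
        i = t % n                      # cyclic coordinate choice
        v[i] = A[i] @ v                # vector-vector product, O(n)
        v /= np.linalg.norm(v)
    return v

rng = np.random.default_rng(1)
M = rng.standard_normal((50, 50)); A = M @ M.T   # symmetric PSD
v = coordinate_power_method(A)
print(v @ A @ v, np.linalg.eigvalsh(A)[-1])  # Rayleigh quotient ~ top eigenvalue
```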
no code implementations • 21 Feb 2017 • Chao Gao, Dan Garber, Nathan Srebro, Jialei Wang, Weiran Wang
We study the sample complexity of canonical correlation analysis (CCA), i.e., the number of samples needed to estimate the population canonical correlation and directions up to arbitrarily small error.
no code implementations • 21 Feb 2017 • Jialei Wang, Weiran Wang, Nathan Srebro
We present and analyze an approach for distributed stochastic optimization which is statistically optimal and achieves near-linear speedups (up to logarithmic factors).
no code implementations • 14 Nov 2016 • Wanjia He, Weiran Wang, Karen Livescu
Recent work has begun exploring neural acoustic word embeddings: fixed-dimensional vector representations of arbitrary-length speech segments corresponding to words.
no code implementations • 21 Oct 2016 • Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu
Similarly to hybrid HMM-neural network models, segmental models of this class can be trained in two stages (frame classifier training followed by linear segmental model weight training), end to end (joint training of both frame classifier and linear weights), or with end-to-end fine-tuning after two-stage training.
no code implementations • 11 Oct 2016 • Weiran Wang, Xinchen Yan, Honglak Lee, Karen Livescu
We present deep variational canonical correlation analysis (VCCA), a deep multi-view learning model that extends the latent variable model interpretation of linear CCA to nonlinear observation models parameterized by deep neural networks.
no code implementations • 26 Sep 2016 • Taehwan Kim, Jonathan Keane, Weiran Wang, Hao Tang, Jason Riggle, Gregory Shakhnarovich, Diane Brentari, Karen Livescu
Recognizing fingerspelling is challenging for a number of reasons: It involves quick, small motions that are often highly coarticulated; it exhibits significant variation between signers; and there has been a dearth of continuous fingerspelling data collected.
no code implementations • 2 Aug 2016 • Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu
Discriminative segmental models offer a way to incorporate flexible feature functions into speech recognition.
no code implementations • NeurIPS 2016 • Weiran Wang, Jialei Wang, Dan Garber, Nathan Srebro
We study the stochastic optimization of canonical correlation analysis (CCA), whose objective is nonconvex and does not decouple over training samples.
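For reference, the batch (non-stochastic) CCA solution makes the objective concrete: whiten each view, then read the canonical correlations off the singular values of the whitened cross-covariance. A minimal NumPy sketch, with a small ridge term added as an assumption for numerical stability:

```python
import numpy as np

def canonical_correlations(X, Y, reg=1e-6):
    """Batch CCA: whiten each view, then take singular values of the
    cross-covariance; those are the canonical correlations."""
    X = X - X.mean(0); Y = Y - Y.mean(0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        w, U = np.linalg.eigh(C)
        return U @ np.diag(w ** -0.5) @ U.T

    T = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return np.linalg.svd(T, compute_uv=False)

rng = np.random.default_rng(0)
Z = rng.standard_normal((1000, 2))                   # shared signal
X = Z @ rng.standard_normal((2, 5)) + 0.5 * rng.standard_normal((1000, 5))
Y = Z @ rng.standard_normal((2, 4)) + 0.5 * rng.standard_normal((1000, 4))
print(canonical_correlations(X, Y)[:2])  # two large correlations
```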
no code implementations • 13 Feb 2016 • Taehwan Kim, Weiran Wang, Hao Tang, Karen Livescu
Previous work has shown that it is possible to achieve almost 90% accuracies on fingerspelling recognition in a signer-dependent setting.
Automatic Speech Recognition (ASR) +1
no code implementations • 7 Feb 2016 • Qingming Tang, Lifu Tu, Weiran Wang, Jinbo Xu
We propose a novel method for network inference from partially observed edges using a node-specific degree prior.
no code implementations • 5 Feb 2016 • Weiran Wang
We study the problem of column selection in large-scale kernel canonical correlation analysis (KCCA) using the Nyström approximation, where one approximates two positive semi-definite kernel matrices using "landmark" points from the training set.
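The Nyström building block itself is compact. A hedged sketch with plain uniform landmark sampling (the paper studies smarter column selection):

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """Gaussian (RBF) kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom(X, m=50, seed=0):
    """Approximate the n x n kernel matrix as C W^+ C.T using m landmark
    columns, so downstream KCCA never forms the full matrix."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)  # uniform landmarks
    C = rbf(X, X[idx])          # (n, m)
    W = rbf(X[idx], X[idx])     # (m, m)
    return C @ np.linalg.pinv(W) @ C.T

X = np.random.default_rng(1).standard_normal((500, 10))
K_approx = nystrom(X)
print(np.linalg.norm(K_approx - rbf(X, X)) / np.linalg.norm(rbf(X, X)))
```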
1 code implementation • 2 Feb 2016 • Weiran Wang, Raman Arora, Karen Livescu, Jeff Bilmes
We consider learning representations (features) in the setting in which we have access to multiple unlabeled views of the data for learning while only one view is available for downstream tasks.
no code implementations • 16 Nov 2015 • Tomer Michaeli, Weiran Wang, Karen Livescu
Several nonlinear extensions of the original linear CCA have been proposed, including kernel and deep neural network methods.
no code implementations • 15 Nov 2015 • Weiran Wang, Karen Livescu
Kernel canonical correlation analysis (KCCA) is a nonlinear multi-view representation learning technique with broad applicability in statistics and machine learning.
no code implementations • 7 Oct 2015 • Weiran Wang, Raman Arora, Karen Livescu, Nathan Srebro
Deep CCA is a recently proposed deep neural network extension to the traditional canonical correlation analysis (CCA), and has been successful for multi-view representation learning in several domains.
1 code implementation • 5 Oct 2015 • Herman Kamper, Weiran Wang, Karen Livescu
Recent studies have been revisiting whole words as the basic modelling unit in speech recognition and query applications, instead of phonetic units.
no code implementations • 22 Jul 2015 • Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu
A typical solution is to use approximate decoding, either by beam pruning in a single pass or by beam pruning to generate a lattice followed by a second pass.
no code implementations • 3 Mar 2015 • Weiran Wang, Canyi Lu
We provide a simple and efficient algorithm for computing the Euclidean projection of a point onto the capped simplex (a simplex with an additional uniform bound on each coordinate), together with an elementary proof.
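The paper derives an exact finite-step algorithm; as a hedged stand-in, simple bisection on the shift parameter solves the same projection to numerical tolerance, since the coordinate sum is monotone in the shift:

```python
import numpy as np

def project_capped_simplex(y, s, cap=1.0, iters=60):
    """Project y onto {x : 0 <= x_i <= cap, sum(x) = s} by bisecting on
    theta in x = clip(y - theta, 0, cap); the sum decreases in theta.
    (Bisection stand-in; the paper's exact algorithm differs.)"""
    lo, hi = y.min() - cap, y.max()
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        if np.clip(y - theta, 0.0, cap).sum() > s:
            lo = theta
        else:
            hi = theta
    return np.clip(y - 0.5 * (lo + hi), 0.0, cap)

x = project_capped_simplex(np.array([0.9, 0.8, -0.2, 0.4]), s=2.0)
print(x, x.sum())  # coordinates in [0, 1], summing to 2.0
```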
no code implementations • 2 Mar 2015 • Weiran Wang
We provide a simple and efficient algorithm for the projection operator for weighted $\ell_1$-norm regularization subject to a sum constraint, together with an elementary proof.
no code implementations • 16 Jun 2014 • Weiran Wang, Miguel Á. Carreira-Perpiñán
In addition to finding meaningful clusters, centroid-based clustering algorithms such as K-means or mean-shift should ideally find centroids that are valid patterns in the input space, representative of data in their cluster.
no code implementations • 26 May 2014 • Weiran Wang, Miguel Á. Carreira-Perpiñán
Using the method of auxiliary coordinates, we give a simple, efficient algorithm to train a combination of nonlinear DR and a classifier, and apply it to an RBF mapping with a linear SVM.
no code implementations • 23 May 2014 • Miguel Á. Carreira-Perpiñán, Weiran Wang
We consider the problem of learning soft assignments of $N$ items to $K$ categories given two sources of information: an item-category similarity matrix, which encourages items to be assigned to categories they are similar to (and to not be assigned to categories they are dissimilar to), and an item-item similarity matrix, which encourages similar items to have similar assignments.
3 code implementations • 6 Sep 2013 • Weiran Wang, Miguel Á. Carreira-Perpiñán
We provide an elementary proof of a simple, efficient algorithm for computing the Euclidean projection of a point onto the probability simplex.
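The procedure is short enough to state in full. A minimal NumPy rendering of the sort-and-threshold algorithm the paper proves correct:

```python
import numpy as np

def project_to_simplex(y):
    """Euclidean projection onto {x : x_i >= 0, sum(x) = 1}:
    sort, find the pivot rho, shift by theta, clip at zero."""
    u = np.sort(y)[::-1]                      # descending order
    cs = np.cumsum(u)
    ks = np.arange(1, len(y) + 1)
    rho = np.nonzero(u + (1.0 - cs) / ks > 0)[0][-1]
    theta = (1.0 - cs[rho]) / (rho + 1.0)
    return np.maximum(y + theta, 0.0)

x = project_to_simplex(np.array([0.8, 1.2, -0.3]))
print(x, x.sum())  # [0.3, 0.7, 0.0], sums to 1
```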
no code implementations • 24 Apr 2013 • Miguel Á. Carreira-Perpiñán, Weiran Wang
Many clustering algorithms exist that estimate a cluster centroid, such as K-means, K-medoids or mean-shift, but no algorithm seems to exist that clusters data by returning exactly K meaningful modes.
no code implementations • NeurIPS 2011 • Weiran Wang, Miguel Á. Carreira-Perpiñán, Zhengdong Lu
In matrix completion, we are given a matrix where the values of only some of the entries are present, and we want to reconstruct the missing ones.