Search Results for author: Weiran Wang

Found 60 papers, 9 papers with code

Text Injection for Neural Contextual Biasing

no code implementations 5 Jun 2024 Zhong Meng, Zelin Wu, Rohit Prabhavalkar, Cal Peyser, Weiran Wang, Nanxin Chen, Tara N. Sainath, Bhuvana Ramabhadran

Neural contextual biasing effectively improves automatic speech recognition (ASR) for crucial phrases within a speaker's context, particularly those that are infrequent in the training data.

Automatic Speech Recognition (ASR) +1

Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

no code implementations 15 Apr 2024 Zelin Wu, Gan Song, Christopher Li, Pat Rondon, Zhong Meng, Xavier Velez, Weiran Wang, Diamantino Caseiro, Golan Pundak, Tsendsuren Munkhdalai, Angad Chandorkar, Rohit Prabhavalkar

Contextual biasing enables speech recognizers to transcribe important phrases in the speaker's context, such as contact names, even if they are rare in, or absent from, the training data.

TransformerFAM: Feedback attention is working memory

no code implementations 14 Apr 2024 Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar

While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs.

Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm

no code implementations 29 Sep 2023 Weiran Wang, Zelin Wu, Diamantino Caseiro, Tsendsuren Munkhdalai, Khe Chai Sim, Pat Rondon, Golan Pundak, Gan Song, Rohit Prabhavalkar, Zhong Meng, Ding Zhao, Tara Sainath, Pedro Moreno Mengibar

Contextual biasing refers to the problem of biasing the automatic speech recognition (ASR) systems towards rare entities that are relevant to the specific user or application scenarios.

Automatic Speech Recognition (ASR) +1

Massive End-to-end Models for Short Search Queries

no code implementations 22 Sep 2023 Weiran Wang, Rohit Prabhavalkar, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Zhong Meng, CJ Zheng, Yanzhang He, Tara Sainath, Pedro Moreno Mengibar

In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model parameters.

Automatic Speech Recognition (ASR) +1

Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network

no code implementations 15 Sep 2023 Yiling Huang, Weiran Wang, Guanlong Zhao, Hank Liao, Wei Xia, Quan Wang

Whether it is the conventional modularized approach or the more recent end-to-end neural diarization (EEND), an additional automatic speech recognition (ASR) model and an orchestration algorithm are required to associate the speaker labels with recognized words.

Automatic Speech Recognition (ASR) +4

Augmenting conformers with structured state-space sequence models for online speech recognition

no code implementations 15 Sep 2023 Haozhe Shan, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara Sainath

Online speech recognition, where the model only accesses context to the left, is an important and challenging use case for ASR systems.

Speech Recognition

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models

no code implementations 14 Aug 2023 Shaan Bijwadia, Shuo-Yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath

Text injection for automatic speech recognition (ASR), wherein unpaired text-only data is used to supplement paired audio-text data, has shown promising improvements for word error rate.

Automatic Speech Recognition (ASR) +2

Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR

no code implementations 31 Mar 2023 Rami Botros, Anmol Gulati, Tara N. Sainath, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu

Conformer models maintain a large number of internal states, the vast majority of which are associated with self-attention layers.

Decoder

JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition

no code implementations 16 Feb 2023 Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang, Bo Li, Andrew Rosenberg, Bhuvana Ramabhadran

We propose JEIT, a joint end-to-end (E2E) model and internal language model (ILM) training method to inject large-scale unpaired text into ILM during E2E training which improves rare-word speech recognition.

Language Modelling speech-recognition +1

JOIST: A Joint Speech and Text Streaming Model For ASR

no code implementations 13 Oct 2022 Tara N. Sainath, Rohit Prabhavalkar, Ankur Bapna, Yu Zhang, Zhouyuan Huo, Zhehuai Chen, Bo Li, Weiran Wang, Trevor Strohman

In addition, we explore JOIST using a streaming E2E model with an order of magnitude more data, which are also novelties compared to previous works.

Improving Deliberation by Text-Only and Semi-Supervised Training

no code implementations 29 Jun 2022 Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang

Text-only and semi-supervised training based on audio-only data has gained popularity recently due to the wide availability of unlabeled text and speech data.

Decoder Language Modelling +1

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

2 code implementations 11 May 2022 Yawei Li, Kai Zhang, Radu Timofte, Luc van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, Jie Tang, Gangshan Wu, Weiran Wang, Shirui Huang, Honglei Lu, Huan Liu, Keyan Wang, Jun Chen, Shi Chen, Yuchun Miao, Zimo Huang, Lefei Zhang, Mustafa Ayazoğlu, Wei Xiong, Chengyi Xiong, Fei Wang, Hao Li, Ruimian Wen, Zhijing Yang, Wenbin Zou, Weixin Zheng, Tian Ye, Yuncheng Zhang, Xiangzhen Kong, Aditya Arora, Syed Waqas Zamir, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Dandan Gao, Dengwen Zhou, Qian Ning, Jingzhu Tang, Han Huang, YuFei Wang, Zhangheng Peng, Haobo Li, Wenxue Guan, Shenghua Gong, Xin Li, Jun Liu, Wanjun Wang, Dengwen Zhou, Kun Zeng, Hanjiang Lin, Xinyu Chen, Jinsheng Fang

The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00 dB on the DIV2K validation set.

Image Super-Resolution

Streaming Align-Refine for Non-autoregressive Deliberation

no code implementations 15 Apr 2022 Weiran Wang, Ke Hu, Tara N. Sainath

We propose a streaming non-autoregressive (non-AR) decoding algorithm to deliberate the hypothesis alignment of a streaming RNN-T model.

Decoder

Improving Rare Word Recognition with LM-aware MWER Training

no code implementations 15 Apr 2022 Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach

Language models (LMs) significantly improve the recognition accuracy of end-to-end (E2E) models on words rarely seen during training, when used in either the shallow fusion or the rescoring setups.

Contrastively Disentangled Sequential Variational Autoencoder

1 code implementation NeurIPS 2021 Junwen Bai, Weiran Wang, Carla Gomes

We propose a novel sequence representation learning method, named Contrastively Disentangled Sequential Variational Autoencoder (C-DSVAE), to extract and separate the static (time-invariant) and dynamic (time-variant) factors in the latent space.

Representation Learning

Understanding Latent Correlation-Based Multiview Learning and Self-Supervision: An Identifiability Perspective

1 code implementation ICLR 2022 Qi Lyu, Xiao Fu, Weiran Wang, Songtao Lu

Under this model, latent correlation maximization is shown to guarantee the extraction of the shared components across views (up to certain ambiguities).

Clustering Disentanglement +2

Representation Learning for Sequence Data with Deep Autoencoding Predictive Components

2 code implementations ICLR 2021 Junwen Bai, Weiran Wang, Yingbo Zhou, Caiming Xiong

We propose Deep Autoencoding Predictive Components (DAPC), a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space.

Automatic Speech Recognition (ASR) +3

An investigation of phone-based subword units for end-to-end speech recognition

no code implementations 8 Apr 2020 Weiran Wang, Guangsen Wang, Aadyot Bhatnagar, Yingbo Zhou, Caiming Xiong, Richard Socher

For Switchboard, our phone-based BPE system achieves 6.8%/14.4% word error rate (WER) on the Switchboard/CallHome portion of the test set while joint decoding achieves 6.3%/13.3% WER.

Decoder Language Modelling +2

Data Techniques For Online End-to-end Speech Recognition

no code implementations 24 Jan 2020 Yang Chen, Weiran Wang, I-Fan Chen, Chao Wang

Practitioners often need to build ASR systems for new use cases in a short amount of time, given limited in-domain data.

Data Augmentation Domain Adaptation +3

Semi-supervised ASR by End-to-end Self-training

no code implementations 24 Jan 2020 Yang Chen, Weiran Wang, Chao Wang

While deep learning based end-to-end automatic speech recognition (ASR) systems have greatly simplified modeling pipelines, they suffer from the data sparsity issue.

Automatic Speech Recognition (ASR) +3

Acoustic scene analysis with multi-head attention networks

1 code implementation 16 Sep 2019 Weimin Wang, Weiran Wang, Ming Sun, Chao Wang

Acoustic Scene Classification (ASC) is a challenging task, as a single scene may involve multiple events that contain complex sound patterns.

Acoustic Scene Classification General Classification +1

Multimodal and Multi-view Models for Emotion Recognition

no code implementations ACL 2019 Gustavo Aguilar, Viktor Rozgić, Weiran Wang, Chao Wang

Studies on emotion recognition (ER) show that combining lexical and acoustic information results in more robust and accurate models.

Emotion Recognition MULTI-VIEW LEARNING

Variational recurrent models for representation learning

no code implementations ICLR 2019 Qingming Tang, Mingda Chen, Weiran Wang, Karen Livescu

Existing variational recurrent models typically use stochastic recurrent connections to model the dependence among neighboring latent variables, while generation assumes independence of generated data per time step given the latent sequence.

MULTI-VIEW LEARNING Representation Learning

Everything old is new again: A multi-view learning approach to learning using privileged information and distillation

no code implementations 8 Mar 2019 Weiran Wang

We adopt a multi-view approach for analyzing two knowledge transfer settings, learning using privileged information (LUPI) and distillation, in a common framework.

MULTI-VIEW LEARNING Transfer Learning

Acoustic feature learning using cross-domain articulatory measurements

no code implementations 19 Mar 2018 Qingming Tang, Weiran Wang, Karen Livescu

Previous work has shown that it is possible to improve speech recognition by learning acoustic features from paired acoustic-articulatory data, for example by using canonical correlation analysis (CCA) or its deep extensions.

Speech Recognition

Distributed Stochastic Multi-Task Learning with Graph Regularization

no code implementations 11 Feb 2018 Weiran Wang, Jialei Wang, Mladen Kolar, Nathan Srebro

We propose methods for distributed graph-based multi-task learning that are based on weighted averaging of messages from other machines.

Multi-Task Learning

Stochastic Nonconvex Optimization with Large Minibatches

no code implementations 25 Sep 2017 Weiran Wang, Nathan Srebro

We study stochastic optimization of nonconvex loss functions, which are typical objectives for training neural networks.

Stochastic Optimization

Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis

no code implementations 11 Aug 2017 Qingming Tang, Weiran Wang, Karen Livescu

We study the problem of acoustic feature learning in the setting where we have access to another (non-acoustic) modality for feature learning but not at test time.

Representation Learning

Efficient coordinate-wise leading eigenvector computation

no code implementations 25 Feb 2017 Jialei Wang, Weiran Wang, Dan Garber, Nathan Srebro

We develop and analyze efficient "coordinate-wise" methods for finding the leading eigenvector, where each step involves only a vector-vector product.

regression

Stochastic Canonical Correlation Analysis

no code implementations 21 Feb 2017 Chao Gao, Dan Garber, Nathan Srebro, Jialei Wang, Weiran Wang

We study the sample complexity of canonical correlation analysis (CCA), i.e., the number of samples needed to estimate the population canonical correlation and directions up to arbitrarily small error.

Stochastic Optimization

Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch-Prox

no code implementations 21 Feb 2017 Jialei Wang, Weiran Wang, Nathan Srebro

We present and analyze an approach for distributed stochastic optimization which is statistically optimal and achieves near-linear speedups (up to logarithmic factors).

Stochastic Optimization

Multi-view Recurrent Neural Acoustic Word Embeddings

no code implementations 14 Nov 2016 Wanjia He, Weiran Wang, Karen Livescu

Recent work has begun exploring neural acoustic word embeddings: fixed-dimensional vector representations of arbitrary-length speech segments corresponding to words.

Retrieval Word Embeddings +1

End-to-End Training Approaches for Discriminative Segmental Models

no code implementations 21 Oct 2016 Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu

Similarly to hybrid HMM-neural network models, segmental models of this class can be trained in two stages (frame classifier training followed by linear segmental model weight training), end to end (joint training of both frame classifier and linear weights), or with end-to-end fine-tuning after two-stage training.

Speech Recognition

Deep Variational Canonical Correlation Analysis

no code implementations 11 Oct 2016 Weiran Wang, Xinchen Yan, Honglak Lee, Karen Livescu

We present deep variational canonical correlation analysis (VCCA), a deep multi-view learning model that extends the latent variable model interpretation of linear CCA to nonlinear observation models parameterized by deep neural networks.

MULTI-VIEW LEARNING

Lexicon-Free Fingerspelling Recognition from Video: Data, Models, and Signer Adaptation

no code implementations 26 Sep 2016 Taehwan Kim, Jonathan Keane, Weiran Wang, Hao Tang, Jason Riggle, Gregory Shakhnarovich, Diane Brentari, Karen Livescu

Recognizing fingerspelling is challenging for a number of reasons: It involves quick, small motions that are often highly coarticulated; it exhibits significant variation between signers; and there has been a dearth of continuous fingerspelling data collected.

Efficient Segmental Cascades for Speech Recognition

no code implementations 2 Aug 2016 Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu

Discriminative segmental models offer a way to incorporate flexible feature functions into speech recognition.

Speech Recognition

Efficient Globally Convergent Stochastic Optimization for Canonical Correlation Analysis

no code implementations NeurIPS 2016 Weiran Wang, Jialei Wang, Dan Garber, Nathan Srebro

We study the stochastic optimization of canonical correlation analysis (CCA), whose objective is nonconvex and does not decouple over training samples.

Stochastic Optimization

Signer-independent Fingerspelling Recognition with Deep Neural Network Adaptation

no code implementations 13 Feb 2016 Taehwan Kim, Weiran Wang, Hao Tang, Karen Livescu

Previous work has shown that it is possible to achieve almost 90% accuracies on fingerspelling recognition in a signer-dependent setting.

Automatic Speech Recognition (ASR) +1

Network Inference by Learned Node-Specific Degree Prior

no code implementations 7 Feb 2016 Qingming Tang, Lifu Tu, Weiran Wang, Jinbo Xu

We propose a novel method for network inference from partially observed edges using a node-specific degree prior.

Matrix Completion

On Column Selection in Approximate Kernel Canonical Correlation Analysis

no code implementations 5 Feb 2016 Weiran Wang

We study the problem of column selection in large-scale kernel canonical correlation analysis (KCCA) using the Nyström approximation, where one approximates two positive semi-definite kernel matrices using "landmark" points from the training set.

Model Selection

On Deep Multi-View Representation Learning: Objectives and Optimization

1 code implementation 2 Feb 2016 Weiran Wang, Raman Arora, Karen Livescu, Jeff Bilmes

We consider learning representations (features) in the setting in which we have access to multiple unlabeled views of the data for learning while only one view is available for downstream tasks.

Representation Learning Stochastic Optimization

Nonparametric Canonical Correlation Analysis

no code implementations 16 Nov 2015 Tomer Michaeli, Weiran Wang, Karen Livescu

Several nonlinear extensions of the original linear CCA have been proposed, including kernel and deep neural network methods.

Representation Learning

Large-Scale Approximate Kernel Canonical Correlation Analysis

no code implementations 15 Nov 2015 Weiran Wang, Karen Livescu

Kernel canonical correlation analysis (KCCA) is a nonlinear multi-view representation learning technique with broad applicability in statistics and machine learning.

Representation Learning Stochastic Optimization

Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations

no code implementations 7 Oct 2015 Weiran Wang, Raman Arora, Karen Livescu, Nathan Srebro

Deep CCA is a recently proposed deep neural network extension to the traditional canonical correlation analysis (CCA), and has been successful for multi-view representation learning in several domains.

Representation Learning Stochastic Optimization

Deep convolutional acoustic word embeddings using word-pair side information

1 code implementation 5 Oct 2015 Herman Kamper, Weiran Wang, Karen Livescu

Recent studies have been revisiting whole words as the basic modelling unit in speech recognition and query applications, instead of phonetic units.

Speech Recognition +1

Discriminative Segmental Cascades for Feature-Rich Phone Recognition

no code implementations 22 Jul 2015 Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu

A typical solution is to use approximate decoding, either by beam pruning in a single pass or by beam pruning to generate a lattice followed by a second pass.

Language Modelling speech-recognition +2

Projection onto the capped simplex

no code implementations 3 Mar 2015 Weiran Wang, Canyi Lu

We provide a simple and efficient algorithm for computing the Euclidean projection of a point onto the capped simplex (a simplex with an additional uniform bound on each coordinate), together with an elementary proof.

An $\mathcal{O}(n\log n)$ projection operator for weighted $\ell_1$-norm regularization with sum constraint

no code implementations 2 Mar 2015 Weiran Wang

We provide a simple and efficient algorithm for the projection operator for weighted $\ell_1$-norm regularization subject to a sum constraint, together with an elementary proof.

The Laplacian K-modes algorithm for clustering

no code implementations 16 Jun 2014 Weiran Wang, Miguel Á. Carreira-Perpiñán

In addition to finding meaningful clusters, centroid-based clustering algorithms such as K-means or mean-shift should ideally find centroids that are valid patterns in the input space, representative of data in their cluster.

Clustering valid

The role of dimensionality reduction in linear classification

no code implementations 26 May 2014 Weiran Wang, Miguel Á. Carreira-Perpiñán

Using the method of auxiliary coordinates, we give a simple, efficient algorithm to train a combination of nonlinear DR and a classifier, and apply it to a RBF mapping with a linear SVM.

Classification Dimensionality Reduction +1

LASS: a simple assignment model with Laplacian smoothing

no code implementations 23 May 2014 Miguel Á. Carreira-Perpiñán, Weiran Wang

We consider the problem of learning soft assignments of $N$ items to $K$ categories given two sources of information: an item-category similarity matrix, which encourages items to be assigned to categories they are similar to (and to not be assigned to categories they are dissimilar to), and an item-item similarity matrix, which encourages similar items to have similar assignments.

Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application

3 code implementations 6 Sep 2013 Weiran Wang, Miguel Á. Carreira-Perpiñán

We provide an elementary proof of a simple, efficient algorithm for computing the Euclidean projection of a point onto the probability simplex.

Clustering

The K-modes algorithm for clustering

no code implementations 24 Apr 2013 Miguel Á. Carreira-Perpiñán, Weiran Wang

Many clustering algorithms exist that estimate a cluster centroid, such as K-means, K-medoids or mean-shift, but no algorithm seems to exist that clusters data by returning exactly K meaningful modes.

Clustering valid

A Denoising View of Matrix Completion

no code implementations NeurIPS 2011 Weiran Wang, Miguel Á. Carreira-Perpiñán, Zhengdong Lu

In matrix completion, we are given a matrix where the values of only some of the entries are present, and we want to reconstruct the missing ones.

Denoising Matrix Completion
