Search Results for author: Yuan Shangguan

Found 24 papers, 1 paper with code

AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs

No code implementations · 12 Nov 2023 · Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Ke Li, Junteng Jia, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

In this work, we extend the instruction-tuned Llama-2 model with end-to-end general-purpose speech processing and reasoning abilities while maintaining the wide range of original LLM capabilities, without using any carefully curated paired data.

Question Answering

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

No code implementations · 22 Sep 2023 · Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

In this work, we propose the use of an adaptive masking approach in two scenarios for efficiently pruning a multilingual ASR model, resulting in either sparse monolingual models or a single sparse multilingual model (named Dynamic ASR Pathways).

Automatic Speech Recognition (ASR), +2 more
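
For readers unfamiliar with mask-based pruning, here is a minimal sketch of magnitude pruning in which the mask is periodically recomputed during training. It is a generic illustration under stated assumptions, not the authors' adaptive masking procedure: the flat weight list, the `updates` stand-in for training steps, and the `recompute_every` schedule are all hypothetical.

```python
from typing import List, Sequence, Tuple


def magnitude_mask(weights: Sequence[float], sparsity: float) -> List[int]:
    """Return a 0/1 mask that zeroes the smallest-magnitude fraction of weights."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def apply_mask(weights: Sequence[float], mask: Sequence[int]) -> List[float]:
    return [w * m for w, m in zip(weights, mask)]


def adaptive_masking(
    weights: Sequence[float],
    sparsity: float,
    updates: Sequence[Sequence[float]],
    recompute_every: int,
) -> Tuple[List[float], List[int]]:
    """Toy loop: dense weights keep changing and the mask is rebuilt periodically.

    `updates` is a hypothetical stand-in for per-step training updates.
    """
    w = list(weights)
    mask = magnitude_mask(w, sparsity)
    for step, delta in enumerate(updates, start=1):
        w = [wi + di for wi, di in zip(w, delta)]
        if step % recompute_every == 0:
            # Re-derive the mask, so a previously pruned weight can come back.
            mask = magnitude_mask(w, sparsity)
    return apply_mask(w, mask), mask


pruned, mask = adaptive_masking(
    [0.1, -0.5, 0.3, -0.05],
    sparsity=0.5,
    updates=[[0.0, 0.0, 0.0, 0.0], [0.9, 0.0, 0.0, 0.0]],
    recompute_every=2,
)
print(mask)  # [1, 1, 0, 0]
```

The first weight starts out pruned (initial mask `[0, 1, 1, 0]`) but is kept after recomputation because it grew in magnitude; that ability to revise the sub-network during training is one plausible reading of "adaptive" here.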

Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition

No code implementations · 14 Sep 2023 · Yang Li, Liangzhen Lai, Yuan Shangguan, Forrest N. Iandola, Zhaoheng Ni, Ernie Chang, Yangyang Shi, Vikas Chandra

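Streaming speech recognition models process only a limited number of tokens at a time, so attention score calculation is less of a bottleneck. Instead, the bottleneck lies in the linear projection layers of multi-head attention and feedforward networks, which constitute a substantial portion of the model size and contribute significantly to computation, memory, and power usage.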

Speech Recognition
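
The claim that linear projections dominate when few tokens are processed can be checked with back-of-the-envelope arithmetic. The sketch below counts approximate multiply-accumulates (MACs) per token for one standard Transformer layer; the dimensions (512-dim model, 2048-dim feedforward) and context lengths are hypothetical, and biases, layer norm, and softmax are ignored.

```python
def macs_per_token(d_model: int, d_ff: int, context: int) -> dict:
    """Approximate multiply-accumulates per token for one Transformer layer."""
    projections = 4 * d_model * d_model  # Q, K, V and output projections
    feedforward = 2 * d_model * d_ff     # the two linear layers of the FFN
    attention = 2 * context * d_model    # Q.K^T scores plus weighted sum over V
    return {"projections": projections, "feedforward": feedforward, "attention": attention}


def linear_share(d_model: int, d_ff: int, context: int) -> float:
    """Fraction of per-token MACs spent in linear (projection + FFN) layers."""
    m = macs_per_token(d_model, d_ff, context)
    linear = m["projections"] + m["feedforward"]
    return linear / (linear + m["attention"])


# Hypothetical streaming chunk (32 frames) vs. a long-context setting (4096 tokens).
print(round(linear_share(512, 2048, 32), 3))    # 0.99
print(round(linear_share(512, 2048, 4096), 3))  # 0.429
```

With a short streaming context the linear layers account for about 99% of the per-token work (exactly 96/97 here), while at 4096 tokens their share falls to 3/7, which is why attention-score optimizations help long-context models more than streaming ones.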

TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models

No code implementations · 5 Sep 2023 · Yuan Shangguan, Haichuan Yang, Danni Li, Chunyang Wu, Yassir Fathullah, Dilin Wang, Ayushi Dalmia, Raghuraman Krishnamoorthi, Ozlem Kalinli, Junteng Jia, Jay Mahadeokar, Xin Lei, Mike Seltzer, Vikas Chandra

Results demonstrate that our TODM Supernet matches or surpasses the performance of manually tuned models, with up to 3% relative improvement in word error rate (WER), while keeping the cost of training many models at a small constant.

Automatic Speech Recognition (ASR), +2 more
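
A "3% relative" WER improvement is measured against the baseline's own error rate, not in absolute percentage points. The worked example below uses hypothetical WER values (10.0% vs. 9.7%) purely to show the difference.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction: positive means the new model is better."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers: a manually tuned model at 10.0% WER vs. a sub-model at 9.7%.
baseline, candidate = 10.0, 9.7
print(f"absolute: {baseline - candidate:.1f} points")                   # 0.3 points
print(f"relative: {relative_wer_reduction(baseline, candidate):.1%}")  # 3.0%
```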

Prompting Large Language Models with Speech Recognition Abilities

No code implementations · 21 Jul 2023 · Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

Furthermore, we perform ablation studies to investigate whether the LLM can be completely frozen during training to maintain its original capabilities, as well as the effects of scaling up the audio encoder and of increasing the audio encoder striding to generate fewer embeddings.

Abstractive Text Summarization, Automatic Speech Recognition, +3 more
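
Increasing encoder striding reduces how many audio embeddings the LLM must consume. One common way to do this is to concatenate each group of consecutive frames into a single, wider vector. The sketch below assumes frame stacking with zero-padding; the paper's encoder may implement striding differently, and the 100-frame, 4-dimensional input is hypothetical.

```python
from typing import List

Frame = List[float]


def stack_frames(frames: List[Frame], stride: int) -> List[Frame]:
    """Concatenate every `stride` consecutive frames into one longer frame.

    Assumes a non-empty input; the tail is zero-padded so no frame is dropped.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    dim = len(frames[0])
    pad = (-len(frames)) % stride
    padded = frames + [[0.0] * dim for _ in range(pad)]
    return [
        [x for frame in padded[i:i + stride] for x in frame]
        for i in range(0, len(padded), stride)
    ]


# Hypothetical input: 100 encoder frames of dimension 4.
frames = [[float(t)] * 4 for t in range(100)]
for stride in (1, 2, 4, 8):
    out = stack_frames(frames, stride)
    print(stride, len(out), len(out[0]))  # stride, number of embeddings, embedding width
```

Doubling the stride halves the sequence length (rounding up) while widening each embedding, so the information is preserved but the LLM attends over fewer positions.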

Towards Selection of Text-to-speech Data to Augment ASR Training

No code implementations · 30 May 2023 · Shuo Liu, Leda Sari, Chunyang Wu, Gil Keren, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli

This paper presents a method for selecting appropriate synthetic speech samples from a given large text-to-speech (TTS) dataset as supplementary training data for an automatic speech recognition (ASR) model.

Automatic Speech Recognition (ASR), +1 more

Multi-Head State Space Model for Speech Recognition

No code implementations · 21 May 2023 · Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, Mark J. F. Gales

State space models (SSMs) have recently shown promising results on small-scale sequence and language modelling tasks, rivalling and outperforming many attention-based approaches.

Language Modelling, Speech Recognition, +1 more
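
At their core, state space models run a linear recurrence over the input sequence. The toy sketch below uses one scalar, already-discretized SSM per head (h_t = a·h_{t-1} + b·u_t, y_t = c·h_t), with each head processing its own input channel. This is a deliberate simplification for illustration: practical SSM layers use structured state matrices and learned discretization, and the multi-head design in the paper is not reproduced here. All parameter values are hypothetical.

```python
from typing import List, Tuple


def ssm_scan(u: List[float], a: float, b: float, c: float) -> List[float]:
    """Scalar discrete linear SSM: h_t = a*h_{t-1} + b*u_t, y_t = c*h_t."""
    h = 0.0
    ys = []
    for x in u:
        h = a * h + b * x  # state carries a decaying memory of past inputs
        ys.append(c * h)
    return ys


def multi_head_ssm(
    channels: List[List[float]], params: List[Tuple[float, float, float]]
) -> List[List[float]]:
    """Hypothetical simplification: one independent scalar SSM per head/channel."""
    return [ssm_scan(u, *p) for u, p in zip(channels, params)]


# Head 0 decays its memory (a=0.5); head 1 accumulates its input (a=1.0).
print(multi_head_ssm([[1.0, 0.0, 0.0], [1.0, 1.0, 1.0]], [(0.5, 1.0, 1.0), (1.0, 1.0, 2.0)]))
```

Different heads with different decay rates `a` capture dependencies at different time scales, which is one intuition for giving each head its own parameters.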

Handling the Alignment for Wake Word Detection: A Comparison Between Alignment-Based, Alignment-Free and Hybrid Approaches

No code implementations · 17 Feb 2023 · Vinicius Ribeiro, Yiteng Huang, Yuan Shangguan, Zhaojun Yang, Li Wan, Ming Sun

Of the three approaches compared, the third, which we propose, is a hybrid solution in which the model is first trained with a small set of aligned data and then tuned with a sizeable unaligned dataset.

Learning a Dual-Mode Speech Recognition Model via Self-Pruning

No code implementations · 25 Jul 2022 · Chunxi Liu, Yuan Shangguan, Haichuan Yang, Yangyang Shi, Raghuraman Krishnamoorthi, Ozlem Kalinli

There is growing interest in unifying the streaming and full-context automatic speech recognition (ASR) networks into a single end-to-end ASR model to simplify the model training and deployment for both use cases.

Automatic Speech Recognition (ASR), +2 more
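
A common way for one set of weights to serve both streaming and full-context recognition is to switch the self-attention mask: full-context mode lets every frame see the whole utterance, while streaming mode hides future frames. The sketch below shows only that mask switch, as an assumption about how such unification is typically done. It does not cover the self-pruning in the title, and the `left_context` limit is a hypothetical option.

```python
from typing import List, Optional


def attention_mask(
    num_frames: int, streaming: bool, left_context: Optional[int] = None
) -> List[List[bool]]:
    """mask[i][j] is True if query frame i may attend to key frame j."""
    mask = []
    for i in range(num_frames):
        row = []
        for j in range(num_frames):
            if not streaming:
                allowed = True  # full-context: see the whole utterance
            else:
                allowed = j <= i  # streaming: no access to future frames
                if left_context is not None:
                    allowed = allowed and i - j <= left_context
            row.append(allowed)
        mask.append(row)
    return mask


for row in attention_mask(3, streaming=True):
    print(row)
```

Because only the mask differs between modes, the same parameters can be trained on both and deployed either way.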

Federated Domain Adaptation for ASR with Full Self-Supervision

No code implementations · 30 Mar 2022 · Junteng Jia, Jay Mahadeokar, Weiyi Zheng, Yuan Shangguan, Ozlem Kalinli, Frank Seide

Cross-device federated learning (FL) protects user privacy by collaboratively training a model on user devices, therefore eliminating the need for collecting, storing, and manually labeling user data.

Automatic Speech Recognition (ASR), Data Augmentation, +2 more
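
In cross-device federated learning, devices send model updates rather than raw data, and a server combines them. The sketch below shows the widely used federated averaging (FedAvg) aggregation step, weighting each client by its number of local examples. It is a generic illustration; the paper's self-supervised training and exact aggregation scheme may differ, and the parameter names and values are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, float]


def fed_avg(client_weights: List[Weights], num_examples: List[int]) -> Weights:
    """Average client models, weighting each by its number of local examples."""
    total = sum(num_examples)
    if total == 0:
        raise ValueError("at least one client must report examples")
    merged: Weights = {}
    for weights, n in zip(client_weights, num_examples):
        for name, value in weights.items():
            merged[name] = merged.get(name, 0.0) + value * n / total
    return merged


# Hypothetical round with two devices; only parameters leave each device, never audio.
print(fed_avg([{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 1.0}], [1, 3]))  # {'w': 2.5, 'b': 0.75}
```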

Dissecting User-Perceived Latency of On-Device E2E Speech Recognition

No code implementations · 6 Apr 2021 · Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer

As speech-enabled devices such as smartphones and smart speakers become increasingly ubiquitous, there is growing interest in building automatic speech recognition (ASR) systems that can run directly on-device. End-to-end (E2E) speech recognition models, such as recurrent neural network transducers and their variants, have recently emerged as prime candidates for this task.

Automatic Speech Recognition (ASR), +1 more

Memory-efficient Speech Recognition on Smart Devices

No code implementations · 23 Feb 2021 · Ganesh Venkatesh, Alagappan Valliappan, Jay Mahadeokar, Yuan Shangguan, Christian Fuegen, Michael L. Seltzer, Vikas Chandra

Recurrent transducer models have emerged as a promising solution for speech recognition on the current and next generation smart devices.

Speech Recognition

Alignment Restricted Streaming Recurrent Neural Network Transducer

No code implementations · 5 Nov 2020 · Jay Mahadeokar, Yuan Shangguan, Duc Le, Gil Keren, Hang Su, Thong Le, Ching-Feng Yeh, Christian Fuegen, Michael L. Seltzer

There is a growing interest in the speech community in developing Recurrent Neural Network Transducer (RNN-T) models for automatic speech recognition (ASR) applications.

Automatic Speech Recognition (ASR), +2 more

Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer

No code implementations · 26 Oct 2020 · Suyoun Kim, Yuan Shangguan, Jay Mahadeokar, Antoine Bruguier, Christian Fuegen, Michael L. Seltzer, Duc Le

Recurrent Neural Network Transducer (RNN-T), like most end-to-end speech recognition model architectures, has an implicit neural network language model (NNLM) and cannot easily leverage unpaired text data during training.

Language Modelling, Speech Recognition, +1 more
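
A standard baseline for injecting an external neural LM trained on unpaired text is shallow fusion: a log-linear combination of ASR and LM scores during decoding, optionally subtracting an estimate of the model's internal LM. The sketch below shows that generic scoring rule only; it is not the improved fusion method of this paper, and the hypotheses, probabilities, and weights are all hypothetical.

```python
import math
from typing import List, Tuple


def fused_score(log_p_asr: float, log_p_lm: float, lm_weight: float,
                log_p_ilm: float = 0.0, ilm_weight: float = 0.0) -> float:
    """Shallow-fusion score, optionally subtracting an internal-LM estimate."""
    return log_p_asr + lm_weight * log_p_lm - ilm_weight * log_p_ilm


def pick_best(cands: List[Tuple[str, float, float]], lm_weight: float) -> str:
    """cands holds (text, p_asr, p_lm) tuples; returns the highest-scoring text."""
    return max(
        cands,
        key=lambda c: fused_score(math.log(c[1]), math.log(c[2]), lm_weight),
    )[0]


# Hypothetical probabilities for two competing hypotheses.
candidates = [("wreck a nice beach", 0.6, 0.1), ("recognize speech", 0.4, 0.5)]
print(pick_best(candidates, lm_weight=0.0))  # wreck a nice beach
print(pick_best(candidates, lm_weight=0.5))  # recognize speech
```

With no LM weight the acoustically preferred hypothesis wins; adding the external LM shifts the decision toward the linguistically more plausible one.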

A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency

No code implementations · 28 Mar 2020 · Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alex Gruenstein, Ke Hu, Minho Jin, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirko Visontai, Yonghui Wu, Yu Zhang, Ding Zhao

Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art conventional models with respect to both quality, i.e., word error rate (WER), and latency, i.e., the time the hypothesis is finalized after the user stops speaking.
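
Both metrics in this sentence can be made concrete. WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words; finalization latency, as defined here, is the time between the end of speech and the moment the hypothesis is finalized. The sketch below assumes a non-empty reference and hypothetical millisecond timestamps; production scoring tools also handle text normalization, which is omitted.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


def finalization_latency_ms(end_of_speech_ms: float, finalized_ms: float) -> float:
    """Time from the user stopping speaking to the final hypothesis."""
    return finalized_ms - end_of_speech_ms


print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words
print(finalization_latency_ms(1200.0, 1450.0))  # 250.0
```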


Optimizing Speech Recognition For The Edge

No code implementations · 26 Sep 2019 · Yuan Shangguan, Jian Li, Qiao Liang, Raziel Alvarez, Ian McGraw

While most deployed speech recognition systems today still run on servers, we are in the midst of a transition towards deployments on edge devices.

Efficient Neural Network, Quantization, +2 more
