Search Results for author: Chunyang Wu

Found 21 papers, 1 papers with code

Effective internal language model training and fusion for factorized transducer model

no code implementations2 Apr 2024 Jinxi Guo, Niko Moritz, Yingyi Ma, Frank Seide, Chunyang Wu, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

However, even with the adoption of factorized transducer models, limited improvement has been observed compared to shallow fusion.

Language Modelling

AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs

no code implementations12 Nov 2023 Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Ke Li, Junteng Jia, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

In this work, we extend the instruction-tuned Llama-2 model with end-to-end general-purpose speech processing and reasoning abilities while maintaining the wide range of original LLM capabilities, without using any carefully curated paired data.

Question Answering

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

no code implementations22 Sep 2023 Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

In this work, we propose the use of an adaptive masking approach in two scenarios for pruning a multilingual ASR model efficiently, each resulting in sparse monolingual models or a sparse multilingual model (named as Dynamic ASR Pathways).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

End-to-End Speech Recognition Contextualization with Large Language Models

no code implementations19 Sep 2023 Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen

Overall, we demonstrate that by only adding a handful number of trainable parameters via adapters, we can unlock contextualized speech recognition capability for the pretrained LLM while keeping the same text-only input functionality.

Language Modelling speech-recognition +1

TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models

no code implementations5 Sep 2023 Yuan Shangguan, Haichuan Yang, Danni Li, Chunyang Wu, Yassir Fathullah, Dilin Wang, Ayushi Dalmia, Raghuraman Krishnamoorthi, Ozlem Kalinli, Junteng Jia, Jay Mahadeokar, Xin Lei, Mike Seltzer, Vikas Chandra

Results demonstrate that our TODM Supernet either matches or surpasses the performance of manually tuned models by up to a relative of 3% better in word error rate (WER), while efficiently keeping the cost of training many models at a small constant.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Prompting Large Language Models with Speech Recognition Abilities

no code implementations21 Jul 2023 Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

Furthermore, we perform ablation studies to investigate whether the LLM can be completely frozen during training to maintain its original capabilities, scaling up the audio encoder, and increasing the audio encoder striding to generate fewer embeddings.

Abstractive Text Summarization Automatic Speech Recognition +3

Towards Selection of Text-to-speech Data to Augment ASR Training

no code implementations30 May 2023 Shuo Liu, Leda Sari, Chunyang Wu, Gil Keren, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli

This paper presents a method for selecting appropriate synthetic speech samples from a given large text-to-speech (TTS) dataset as supplementary training data for an automatic speech recognition (ASR) model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Multi-Head State Space Model for Speech Recognition

no code implementations21 May 2023 Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, Mark J. F. Gales

State space models (SSMs) have recently shown promising results on small-scale sequence and language modelling tasks, rivalling and outperforming many attention-based approaches.

Language Modelling speech-recognition +1

Anchored Speech Recognition with Neural Transducers

no code implementations20 Oct 2022 Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang, Ozlem Kalinli

In this paper, we investigate anchored speech recognition to make neural transducers robust to background speech.

speech-recognition Speech Recognition

Dissecting User-Perceived Latency of On-Device E2E Speech Recognition

no code implementations6 Apr 2021 Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer

As speech-enabled devices such as smartphones and smart speakers become increasingly ubiquitous, there is growing interest in building automatic speech recognition (ASR) systems that can run directly on-device; end-to-end (E2E) speech recognition models such as recurrent neural network transducers and their variants have recently emerged as prime candidates for this task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency

no code implementations5 Apr 2021 Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer

DET gets similar accuracy as a baseline model with better latency on a large in-house data set by assigning a lightweight encoder for the beginning part of one utterance and a full-size encoder for the rest.

speech-recognition Speech Recognition

Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition

no code implementations3 Nov 2020 Ching-Feng Yeh, Yongqiang Wang, Yangyang Shi, Chunyang Wu, Frank Zhang, Julian Chan, Michael L. Seltzer

Attention-based models have been gaining popularity recently for their strong performance demonstrated in fields such as machine translation and automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Weak-Attention Suppression For Transformer Based Speech Recognition

no code implementations18 May 2020 Yangyang Shi, Yongqiang Wang, Chunyang Wu, Christian Fuegen, Frank Zhang, Duc Le, Ching-Feng Yeh, Michael L. Seltzer

Transformers, originally proposed for natural language processing (NLP) tasks, have recently achieved great success in automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

A machine learning method for the large-scale evaluation of urban visual environment

no code implementations11 Aug 2016 Lun Liu, Hui Wang, Chunyang Wu

Given the size of modern cities in the urbanising age, it is beyond the perceptual capacity of most people to develop a good knowledge about the beauty and ugliness of the city at every street corner.

BIG-bench Machine Learning Cultural Vocal Bursts Intensity Prediction

Cannot find the paper you are looking for? You can Submit a new open access paper.