no code implementations • 19 Feb 2024 • Aiwei Liu, Haoping Bai, Zhiyun Lu, Xiang Kong, Simon Wang, Jiulong Shan, Meng Cao, Lijie Wen
In this paper, we propose to evaluate response preference using the output probabilities of response pairs under contrastive prompt pairs, an approach that achieves better performance than RLAIF on LLaMA2-7B and LLaMA2-13B.
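The scoring step lends itself to a short illustration. The sketch below is a minimal, assumption-based reading of the idea (the checkpoint name, the contrastive prompt wording, and the helper functions are ours, not the paper's): each response is scored under a positively and a negatively phrased prompt, and the preferred response is the one whose probability rises more under the positive prompt.

```python
# Minimal sketch of preference scoring with contrastive prompt pairs
# (an assumption-based illustration, not the paper's exact implementation).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def response_logprob(prompt: str, response: str) -> float:
    """Sum of token log-probabilities of `response` conditioned on `prompt`."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # predicts tokens 1..N
    targets = full_ids[0, 1:]
    start = prompt_ids.shape[1] - 1  # first response token in the shifted view
    return log_probs[start:].gather(1, targets[start:, None]).sum().item()

def prefers_a(pos_prompt: str, neg_prompt: str, resp_a: str, resp_b: str) -> bool:
    # A response is preferred if the positively phrased prompt raises its
    # probability (relative to the negative prompt) more than the other's.
    score_a = response_logprob(pos_prompt, resp_a) - response_logprob(neg_prompt, resp_a)
    score_b = response_logprob(pos_prompt, resp_b) - response_logprob(neg_prompt, resp_b)
    return score_a > score_b
```

Note that tokenization at the prompt/response boundary can differ slightly between the two calls; a production scorer would align token spans more carefully.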
no code implementations • 18 Sep 2023 • Cheng-I Jeff Lai, Zhiyun Lu, Liangliang Cao, Ruoming Pang
Conventional end-to-end Automatic Speech Recognition (ASR) models primarily focus on exact transcription tasks, lacking flexibility for nuanced user interactions.
Automatic Speech Recognition (ASR) +2
1 code implementation • 8 May 2023 • Liangliang Cao, BoWen Zhang, Chen Chen, Yinfei Yang, Xianzhi Du, Wencong Zhang, Zhiyun Lu, Yantao Zheng
In this paper, we discuss two effective approaches to improve the efficiency and robustness of CLIP training: (1) augmenting the training dataset while maintaining the same number of optimization steps, and (2) filtering out samples that contain text regions in the image.
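The second approach, filtering out samples whose images contain text regions, can be sketched with an off-the-shelf OCR detector. In the sketch below, pytesseract is our stand-in for whatever detector the authors actually used, and the confidence threshold is likewise an assumption.

```python
# Hedged sketch of the text-region filter: drop training images where an
# OCR engine finds confident text. The detector and threshold are assumed.
from PIL import Image
import pytesseract

def contains_text_region(path: str, min_conf: float = 60.0) -> bool:
    data = pytesseract.image_to_data(Image.open(path),
                                     output_type=pytesseract.Output.DICT)
    for conf, word in zip(data["conf"], data["text"]):
        if float(conf) >= min_conf and word.strip():
            return True
    return False

def filter_pairs(pairs):
    # Keep only image-caption pairs whose image has no detected text.
    return [(img, cap) for img, cap in pairs if not contains_text_region(img)]
```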
no code implementations • 22 Apr 2022 • W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Rohit Prabhavalkar, Tara N. Sainath, Cyril Allauzen, Cal Peyser, Zhiyun Lu
Improving the performance of end-to-end ASR models on long utterances ranging from minutes to hours in length is an ongoing challenge in speech recognition.
no code implementations • 5 Apr 2022 • Zhiyun Lu, Yongqiang Wang, Yu Zhang, Wei Han, Zhehuai Chen, Parisa Haghani
Self-supervised learning of speech representations has achieved impressive results in improving automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • 25 Jan 2022 • Chao Zhang, Bo Li, Zhiyun Lu, Tara N. Sainath, Shuo-Yiin Chang
The recurrent neural network transducer (RNN-T) has recently become the mainstream end-to-end approach for streaming automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +1
no code implementations • 8 Oct 2021 • Zhiyun Lu, Yanwei Pan, Thibault Doutre, Parisa Haghani, Liangliang Cao, Rohit Prabhavalkar, Chao Zhang, Trevor Strohman
Our experiments show that for both losses, the WER on long-form speech reduces substantially as the training utterance length increases.
Automatic Speech Recognition (ASR) +1
no code implementations • 6 Apr 2021 • Zhiyun Lu, Wei Han, Yu Zhang, Liangliang Cao
To attack RNN-T, we find that a prepended perturbation is more effective than an additive one and can mislead the model into predicting the same short target on utterances of arbitrary length.
Automatic Speech Recognition (ASR) +1
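A minimal sketch of that prepend attack: learn a short waveform that, once prepended to any utterance, pushes the model toward a fixed short target. Here `asr_loss` is an assumed differentiable callable returning the model's loss for a (waveform, target transcript) pair; the perturbation length, step count, and clamp bound are illustrative.

```python
# Sketch of learning a universal prepended perturbation (assumptions noted above).
import torch

def learn_prepend_perturbation(asr_loss, utterances, target="hello",
                               pert_len=16000, steps=100, lr=1e-3):
    pert = torch.zeros(pert_len, requires_grad=True)  # ~1 s at 16 kHz
    opt = torch.optim.Adam([pert], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Minimize the loss toward the short target across all utterances.
        loss = torch.stack([asr_loss(torch.cat([pert, u]), target)
                            for u in utterances]).mean()
        loss.backward()
        opt.step()
        with torch.no_grad():
            pert.clamp_(-0.01, 0.01)  # keep the perturbation quiet
    return pert.detach()
```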
no code implementations • 22 Oct 2020 • Thibault Doutre, Wei Han, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang, Liangliang Cao
We propose a novel and effective learning method that leverages a non-streaming ASR model as a teacher to generate transcripts for an arbitrarily large data set, which are then used to distill knowledge into streaming ASR models.
Automatic Speech Recognition (ASR) +1
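In outline the recipe is pseudo-labeling; the sketch below shows the two stages using hypothetical `teacher.transcribe` and `student.loss` methods (the API names are ours, not the paper's).

```python
# Sketch of non-streaming -> streaming distillation via pseudo-labels.
def distill(teacher, student, unlabeled_audio, optimizer, epochs=1):
    # 1) The full-context teacher transcribes unlabeled audio offline.
    pseudo_labeled = [(x, teacher.transcribe(x)) for x in unlabeled_audio]
    # 2) The streaming student trains on those transcripts as ground truth.
    for _ in range(epochs):
        for audio, transcript in pseudo_labeled:
            loss = student.loss(audio, transcript)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```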
no code implementations • 13 Jun 2020 • Zhiyun Lu, Eugene Ie, Fei Sha
Many methods have been proposed to quantify the predictive uncertainty associated with the outputs of deep neural networks.
no code implementations • LREC 2020 • Eric Chen, Zhiyun Lu, Hao Xu, Liangliang Cao, Yu Zhang, James Fan
We present a multimodal corpus for sentiment analysis based on the existing Switchboard-1 Telephone Speech Corpus released by the Linguistic Data Consortium.
no code implementations • 21 Nov 2019 • Zhiyun Lu, Liangliang Cao, Yu Zhang, Chung-Cheng Chiu, James Fan
In this paper, we propose to use pre-trained features from end-to-end ASR models to solve speech sentiment analysis as a downstream task.
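The downstream setup can be sketched as follows: frame-level activations from a pre-trained ASR encoder are pooled into an utterance-level vector and fed to a small sentiment classifier. The feature dimension, mean pooling, and three-way label set here are assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a sentiment head on top of (assumed) pre-trained
# ASR encoder features of shape (time, feat_dim).
import torch
import torch.nn as nn

class SentimentHead(nn.Module):
    def __init__(self, feat_dim=512, n_classes=3):  # negative/neutral/positive
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_classes))

    def forward(self, asr_features):          # (time, feat_dim)
        pooled = asr_features.mean(dim=0)     # utterance-level representation
        return self.net(pooled)               # class logits

# Usage: logits = SentimentHead()(features_from_asr_encoder)
```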
no code implementations • 1 Feb 2019 • Zhiyun Lu, Chao-Kai Chiang, Fei Sha
We study a budgeted hyper-parameter tuning problem, where we optimize the tuning result under a hard resource constraint.
no code implementations • 13 Jan 2017 • Avner May, Alireza Bagheri Garakani, Zhiyun Lu, Dong Guo, Kuan Liu, Aurélien Bellet, Linxi Fan, Michael Collins, Daniel Hsu, Brian Kingsbury, Michael Picheny, Fei Sha
First, in order to reduce the number of random features required by kernel models, we propose a simple but effective method for feature selection.
no code implementations • 9 Apr 2016 • Zhiyun Lu, Vikas Sindhwani, Tara N. Sainath
Recurrent neural networks (RNNs), including long short-term memory (LSTM) RNNs, have produced state-of-the-art results on a variety of speech recognition tasks.
no code implementations • 18 Mar 2016 • Zhiyun Lu, Dong Guo, Alireza Bagheri Garakani, Kuan Liu, Avner May, Aurélien Bellet, Linxi Fan, Michael Collins, Brian Kingsbury, Michael Picheny, Fei Sha
We study large-scale kernel methods for acoustic modeling and compare them to DNNs on performance metrics related to both acoustic modeling and recognition.
no code implementations • 14 Nov 2014 • Zhiyun Lu, Avner May, Kuan Liu, Alireza Bagheri Garakani, Dong Guo, Aurélien Bellet, Linxi Fan, Michael Collins, Brian Kingsbury, Michael Picheny, Fei Sha
The computational complexity of kernel methods has often been a major barrier for applying them to large-scale learning problems.
Automatic Speech Recognition (ASR) +2
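The standard workaround this line of work builds on is random feature approximation: replace the kernel with an explicit D-dimensional feature map so training reduces to a linear model at O(nD) cost instead of O(n^2). A minimal sketch for the RBF kernel (the dimension and bandwidth below are illustrative):

```python
# Random Fourier features approximating an RBF kernel (Rahimi & Recht style).
import numpy as np

def random_fourier_features(X, D=2048, gamma=1.0, seed=0):
    """Map X of shape (n, d) to (n, D) so z(x) . z(y) ~= exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))  # RBF spectral density
    b = rng.uniform(0, 2 * np.pi, size=D)                  # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```

With Z = random_fourier_features(X), any linear classifier trained on Z approximates its kernelized counterpart, which is what makes these methods competitive with DNNs at acoustic-modeling scale.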