Search Results for author: Cheng-I Jeff Lai

Found 9 papers, 3 papers with code

Instruction-Following Speech Recognition

no code implementations 18 Sep 2023 Cheng-I Jeff Lai, Zhiyun Lu, Liangliang Cao, Ruoming Pang

Conventional end-to-end Automatic Speech Recognition (ASR) models primarily focus on exact transcription tasks, lacking flexibility for nuanced user interactions.

Automatic Speech Recognition (ASR) +2

SSAST: Self-Supervised Audio Spectrogram Transformer

2 code implementations 19 Oct 2021 Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass

Pure Transformer models tend to require more training data than CNNs, and the success of the Audio Spectrogram Transformer (AST) relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, which limits the practical usage of AST.

Audio Classification Emotion Recognition +4

Cross-Modal Discrete Representation Learning

no code implementations ACL 2022 Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass

Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector.

Cross-Modal Retrieval Quantization +4
