Search Results for author: Chao-Han Huck Yang

Found 65 papers, 32 papers with code

Controllability, Multiplexing, and Transfer Learning in Networks using Evolutionary Learning

1 code implementation • 14 Nov 2018 • Rise Ooi, Chao-Han Huck Yang, Pin-Yu Chen, Vìctor Eguìluz, Narsis Kiani, Hector Zenil, David Gomez-Cabrero, Jesper Tegnèr

Next, (2) the learned networks are technically controllable as only a small number of driver nodes are required to move the system to a new state.

Transfer Learning

Paper
Code

When Causal Intervention Meets Adversarial Examples and Image Masking for Deep Neural Networks

1 code implementation • 9 Feb 2019 • Chao-Han Huck Yang, Yi-Chieh Liu, Pin-Yu Chen, Xiaoli Ma, Yi-Chang James Tsai

To study the intervention effects on pixel-level features for causal reasoning, we introduce pixel-wise masking and adversarial perturbation.

Causal Inference Visual Reasoning

Paper
Code

Synthesizing New Retinal Symptom Images by Multiple Generative Models

1 code implementation • 11 Feb 2019 • Yi-Chieh Liu, Hao-Hsiang Yang, Chao-Han Huck Yang, Jia-Hong Huang, Meng Tian, Hiromasa Morikawa, Yi-Chang James Tsai, Jesper Tegner

Age-Related Macular Degeneration (AMD) is an asymptomatic retinal disease which may result in loss of vision.

General Classification Image Generation

Paper
Code

Variational Quantum Circuits for Deep Reinforcement Learning

1 code implementation • 30 Jun 2019 • Samuel Yen-Chi Chen, Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Xiaoli Ma, Hsi-Sheng Goan

To the best of our knowledge, this work is the first proof-of-principle demonstration of variational quantum circuits to approximate the deep $Q$-value function for decision-making and policy-selection reinforcement learning with experience replay and target network.

BIG-bench Machine Learning Decision Making +3

Paper
Code

Reinforcement Learning based Interconnection Routing for Adaptive Traffic Optimization

2 code implementations • 13 Aug 2019 • Sheng-Chun Kao, Chao-Han Huck Yang, Pin-Yu Chen, Xiaoli Ma, Tushar Krishna

In this work, we demonstrate the promise of applying reinforcement learning (RL) to optimize NoC runtime performance.

BIG-bench Machine Learning reinforcement-learning +1

Paper
Code

Interpretable Self-Attention Temporal Reasoning for Driving Behavior Understanding

no code implementations • 6 Nov 2019 • Yi-Chieh Liu, Yung-An Hsieh, Min-Hung Chen, Chao-Han Huck Yang, Jesper Tegner, Yi-Chang James Tsai

Performing driving behaviors based on causal reasoning is essential to ensure driving safety.

Paper
Add Code

Submodular Rank Aggregation on Score-based Permutations for Distributed Automatic Speech Recognition

1 code implementation • 27 Jan 2020 • Jun Qi, Chao-Han Huck Yang, Javier Tejedor

Distributed automatic speech recognition (ASR) requires to aggregate outputs of distributed deep neural network (DNN)-based models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Code

Tensor-to-Vector Regression for Multi-channel Speech Enhancement based on Tensor-Train Network

2 code implementations • 3 Feb 2020 • Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

Finally, in 8-channel conditions, a PESQ of 3. 12 is achieved using 20 million parameters for TTN, whereas a DNN with 68 million parameters can only attain a PESQ of 3. 06.

regression Speech Enhancement

Paper
Code

Evolving Neural Networks through a Reverse Encoding Tree

1 code implementation • 3 Feb 2020 • Haoling Zhang, Chao-Han Huck Yang, Hector Zenil, Narsis A. Kiani, Yue Shen, Jesper N. Tegner

Using RET, two types of approaches -- NEAT with Binary search encoding (Bi-NEAT) and NEAT with Golden-Section search encoding (GS-NEAT) -- have been designed to solve problems in benchmark continuous learning environments such as logic gates, Cartpole, and Lunar Lander, and tested against classical NEAT and FS-NEAT as baselines.

Paper
Code

Enhanced Adversarial Strategically-Timed Attacks against Deep Reinforcement Learning

no code implementations • 20 Feb 2020 • Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Yi Ouyang, I-Te Danny Hung, Chin-Hui Lee, Xiaoli Ma

Recent deep neural networks based techniques, especially those equipped with the ability of self-adaptation in the system level such as deep reinforcement learning (DRL), are shown to possess many advantages of optimizing robot learning systems (e. g., autonomous navigation and continuous robot arm control.)

Autonomous Navigation reinforcement-learning +1

Paper
Add Code

Y-net: Multi-scale feature aggregation network with wavelet structure similarity loss function for single image dehazing

1 code implementation • 31 Mar 2020 • Hao-Hsiang Yang, Chao-Han Huck Yang, Yi-Chang James Tsai

Extensive experimental results demonstrate that the proposed Y-net with the W-SSIM loss function restores high-quality clear images and outperforms state-of-the-art algorithms.

Image Dehazing Single Image Dehazing +2

Paper
Code

Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement

no code implementations • 31 Mar 2020 • Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Xiaoli Ma, Chin-Hui Lee

Recent studies have highlighted adversarial examples as ubiquitous threats to the deep neural network (DNN) based speech recognition systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation

1 code implementation • 16 Jul 2020 • Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee

On Task 1b development data set, we achieve an accuracy of 96. 7\% with a model size smaller than 500KB.

Acoustic Scene Classification Data Augmentation +3

Paper
Code

Wavelet Channel Attention Module with a Fusion Network for Single Image Deraining

no code implementations • 17 Jul 2020 • Hao-Hsiang Yang, Chao-Han Huck Yang, Yu-Chiang Frank Wang

Wavelet transform and the inverse wavelet transform are substituted for down-sampling and up-sampling so feature maps from the wavelet transform and convolutions contain different frequencies and scales.

Single Image Deraining

Paper
Add Code

Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement

2 code implementations • 25 Jul 2020 • Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

Finally, our experiments of multi-channel speech enhancement on a simulated noisy WSJ0 corpus demonstrate that our proposed hybrid CNN-TT architecture achieves better results than both DNN and CNN models in terms of better-enhanced speech qualities and smaller parameter sizes.

regression Speech Enhancement

Paper
Code

Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition

2 code implementations • 26 Oct 2020 • Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Pin-Yu Chen, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee

Testing on the Google Speech Commands Dataset, the proposed QCNN encoder attains a competitive accuracy of 95. 12% in a decentralized model, which is better than the previous architectures using centralized RNN models with convolutional features.

Ranked #1 on Keyword Spotting on Google Speech Commands (10-keyword Speech Commands dataset metric)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Code

DeepOpht: Medical Report Generation for Retinal Images via Deep Models and Visual Explanation

1 code implementation • 1 Nov 2020 • Jia-Hong Huang, Chao-Han Huck Yang, Fangyu Liu, Meng Tian, Yi-Chieh Liu, Ting-Wei Wu, I-Hung Lin, Kang Wang, Hiromasa Morikawa, Hernghua Chang, Jesper Tegner, Marcel Worring

To train and validate the effectiveness of our DNN-based module, we propose a large-scale retinal disease image dataset.

Medical Report Generation

Paper
Code

A Two-Stage Approach to Device-Robust Acoustic Scene Classification

1 code implementation • 3 Nov 2020 • Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee

To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed.

Ranked #1 on Acoustic Scene Classification on TAU Urban Acoustic Scenes 2019 (using extra training data)

Acoustic Scene Classification Classification +4

Paper
Code

Multi-task Language Modeling for Improving Speech Recognition of Rare Words

no code implementations • 23 Nov 2020 • Chao-Han Huck Yang, Linda Liu, Ankur Gandhe, Yile Gu, Anirudh Raju, Denis Filimonov, Ivan Bulyko

We show that our rescoring model trained with these additional tasks outperforms the baseline rescoring model, trained with only the language modeling task, by 1. 4% on a general test and by 2. 6% on a rare word test set in terms of word-error-rate relative (WERR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Training a Resilient Q-Network against Observational Interference

1 code implementation • 18 Feb 2021 • Chao-Han Huck Yang, I-Te Danny Hung, Yi Ouyang, Pin-Yu Chen

Deep reinforcement learning (DRL) has demonstrated impressive performance in various gaming simulators and real-world applications.

Causal Inference

Paper
Code

PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification

no code implementations • 2 Apr 2021 • Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose using an adversarial autoencoder (AAE) to replace generative adversarial network (GAN) in the private aggregation of teacher ensembles (PATE), a solution for ensuring differential privacy in speech applications.

Ranked #3 on Keyword Spotting on Google Speech Commands (10-keyword Speech Commands dataset metric)

Generative Adversarial Network Keyword Spotting +1

Paper
Add Code

Longer Version for "Deep Context-Encoding Network for Retinal Image Captioning"

no code implementations • 30 May 2021 • Jia-Hong Huang, Ting-Wei Wu, Chao-Han Huck Yang, Marcel Worring

Automatically generating medical reports for retinal images is one of the promising ways to help ophthalmologists reduce their workload and improve work efficiency.

Avg Image Captioning +1

Paper
Add Code

Voice2Series: Reprogramming Acoustic Models for Time Series Classification

3 code implementations • 17 Jun 2021 • Chao-Han Huck Yang, Yun-Yun Tsai, Pin-Yu Chen

Learning to classify time series with limited data is a practical yet challenging problem.

Ranked #1 on ECG Classification on UCR Time Series Classification Archive

Classification Data Augmentation +5

Paper
Code

A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification

no code implementations • 3 Jul 2021 • Hao Yen, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Qing Wang, Yuyang Wang, Xianjun Xia, Yuanjun Zhao, Yuzhong Wu, Yannan Wang, Jun Du, Chin-Hui Lee

We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC).

Acoustic Scene Classification Data Augmentation +5

Paper
Add Code

QTN-VQC: An End-to-End Learning framework for Quantum Neural Networks

no code implementations • 6 Oct 2021 • Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen

The advent of noisy intermediate-scale quantum (NISQ) computers raises a crucial challenge to design quantum neural networks for fully quantum learning tasks.

Paper
Add Code

Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition

1 code implementation • 8 Oct 2021 • Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao

In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system.

Spoken Command Recognition Transfer Learning

Paper
Code

A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer

1 code implementation • 16 Oct 2021 • Hu Hu, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Chin-Hui Lee

We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions.

Acoustic Scene Classification Scene Classification +1

Paper
Code

Pessimistic Model Selection for Offline Deep Reinforcement Learning

no code implementations • 29 Nov 2021 • Chao-Han Huck Yang, Zhengling Qi, Yifan Cui, Pin-Yu Chen

Deep Reinforcement Learning (DRL) has demonstrated great potentials in solving sequential decision making problems in many applications.

Decision Making Model Selection +2

Paper
Add Code

When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing

no code implementations • 17 Feb 2022 • Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Yu Tsao, Pin-Yu Chen

Our experiments on intent classification show that our proposed BERT-QTC model attains competitive experimental results in the Snips and ATIS spoken language datasets.

intent-classification Intent Classification +4

Paper
Add Code

A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning

no code implementations • 17 Feb 2022 • Hengshun Zhou, Jun Du, Chao-Han Huck Yang, Shifu Xiong, Chin-Hui Lee

Audio-only-based wake word spotting (WWS) is challenging under noisy conditions due to environmental interference in signal transmission.

Network Pruning

Paper
Add Code

Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition

no code implementations • 17 Feb 2022 • Chao-Han Huck Yang, Zeeshan Ahmed, Yile Gu, Joseph Szurley, Roger Ren, Linda Liu, Andreas Stolcke, Ivan Bulyko

In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples.

Adversarial Robustness Automatic Speech Recognition +3

Paper
Add Code

A study on joint modeling and data augmentation of multi-modalities for audio-visual scene classification

no code implementations • 7 Mar 2022 • Qing Wang, Jun Du, Siyuan Zheng, Yunqing Li, Yajian Wang, Yuzhong Wu, Hu Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee

In this paper, we propose two techniques, namely joint modeling and data augmentation, to improve system performances for audio-visual scene classification (AVSC).

Data Augmentation Scene Classification

Paper
Add Code

Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent With Illustrations of Speech Processing

1 code implementation • 11 Mar 2022 • Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, Javier Tejedor

This work focuses on designing low complexity hybrid tensor networks by considering trade-offs between the model complexity and practical performance.

Speech Enhancement Spoken Command Recognition +1

Paper
Code

Treatment Learning Causal Transformer for Noisy Image Classification

no code implementations • 29 Mar 2022 • Chao-Han Huck Yang, I-Te Danny Hung, Yi-Chieh Liu, Pin-Yu Chen

In this work, we incorporate this binary information of "existence of noise" as treatment into image classification tasks to improve prediction accuracy by jointly estimating their treatment effects.

Benchmarking Classification +3

Paper
Add Code

Theoretical Error Performance Analysis for Variational Quantum Circuit Based Functional Regression

1 code implementation • 8 Jun 2022 • Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hsiu Hsieh

In this work, we first put forth an end-to-end quantum neural network, TTN-VQC, which consists of a quantum tensor network based on a tensor-train network (TTN) for dimensionality reduction and a VQC for functional regression.

Dimensionality Reduction regression

Paper
Code

An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition

no code implementations • 11 Oct 2022 • Chao-Han Huck Yang, I-Fan Chen, Andreas Stolcke, Sabato Marco Siniscalchi, Chin-Hui Lee

We evaluate three end-to-end deep models, including LAS, hybrid CTC/attention, and RNN transducer, on the open-source LibriSpeech and TIMIT corpora.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition

no code implementations • 12 Oct 2022 • Chao-Han Huck Yang, Jun Qi, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose an ensemble learning framework with Poisson sub-sampling to effectively train a collection of teacher models to issue some differential privacy (DP) guarantee for training data.

Ensemble Learning Privacy Preserving +3

Paper
Add Code

A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition

no code implementations • 2 Nov 2022 • Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose a quantum kernel learning (QKL) framework to address the inherent data sparsity issues often encountered in training large-scare acoustic models in low-resource scenarios.

Spoken Command Recognition

Paper
Add Code

Certified Robustness of Quantum Classifiers against Adversarial Examples through Quantum Noise

no code implementations • 2 Nov 2022 • Jhih-Cing Huang, Yu-Lin Tsai, Chao-Han Huck Yang, Cheng-Fang Su, Chia-Mu Yu, Pin-Yu Chen, Sy-Yen Kuo

Recently, quantum classifiers have been found to be vulnerable to adversarial attacks, in which quantum classifiers are deceived by imperceptible noises, leading to misclassification.

Paper
Add Code

Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming

1 code implementation • 2 Nov 2022 • Yun-Ning Hung, Chao-Han Huck Yang, Pin-Yu Chen, Alexander Lerch

In this work, we introduce a novel method for leveraging pre-trained models for low-resource (music) classification based on the concept of Neural Model Reprogramming (NMR).

Classification Genre classification +3

Paper
Code

Inference and Denoise: Causal Inference-based Neural Speech Enhancement

1 code implementation • 2 Nov 2022 • Tsun-An Hsieh, Chao-Han Huck Yang, Pin-Yu Chen, Sabato Marco Siniscalchi, Yu Tsao

This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention.

Causal Inference Speech Enhancement

Paper
Code

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition

no code implementations • 19 Jan 2023 • Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman

In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can \textbf{re-purpose} well-trained English automatic speech recognition (ASR) models to recognize the other languages.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Causalainer: Causal Explainer for Automatic Video Summarization

no code implementations • 30 Apr 2023 • Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hung Chen, Marcel Worring

In this work, a Causal Explainer, dubbed Causalainer, is proposed to address this issue.

Video Summarization

Paper
Add Code

Pre-training Tensor-Train Networks Facilitates Machine Learning with Variational Quantum Circuits

no code implementations • 18 May 2023 • Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hsiu Hsieh

Variational quantum circuit (VQC) is a promising approach for implementing quantum neural networks on noisy intermediate-scale quantum (NISQ) devices.

Paper
Add Code

Parameter-Efficient Learning for Text-to-Speech Accent Adaptation

1 code implementation • 18 May 2023 • Li-Jen Yang, Chao-Han Huck Yang, Jen-Tzung Chien

This paper presents a parameter-efficient learning (PEL) to develop a low-resource accent adaptation for text-to-speech (TTS).

Paper
Code

A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-Trained General-Purpose Speech Model

1 code implementation • 18 May 2023 • Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, Narsis A. Kiani, David Gomez-Cabrero, Jesper N. Tegner

In this work, we explore Parameter-Efficient-Learning (PEL) techniques to repurpose a General-Purpose-Speech (GSM) model for Arabic dialect identification (ADI).

Dialect Identification

Paper
Code

Differentially Private Adapters for Parameter Efficient Acoustic Modeling

1 code implementation • 19 May 2023 • Chun-Wei Ho, Chao-Han Huck Yang, Sabato Marco Siniscalchi

Evaluated on the open-access Multilingual Spoken Words (MLSW) dataset, our solution reduces the number of trainable parameters by 97. 5% using the RAs with only a 4% performance drop with respect to fine-tuning the cross-lingual speech classifier while preserving DP guarantees.

Paper
Code

A Neural State-Space Model Approach to Efficient Speech Separation

1 code implementation • 26 May 2023 • Chen Chen, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng Siong Chng

In this work, we introduce S4M, a new efficient speech separation framework based on neural state-space models (SSM).

Representation Learning Speech Separation

Paper
Code

A Multi-dimensional Deep Structured State Space Approach to Speech Enhancement Using Small-footprint Models

1 code implementation • 1 Jun 2023 • Pin-Jui Ku, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose a multi-dimensional structured state space (S4) approach to speech enhancement.

Data Augmentation Speech Enhancement

Paper
Code

How to Estimate Model Transferability of Pre-Trained Speech Models?

1 code implementation • 1 Jun 2023 • Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-Yi Lee, Tara N. Sainath

In this work, we introduce a "score-based assessment" framework for estimating the transferability of pre-trained speech models (PSMs) for fine-tuning target tasks.

Paper
Code

Causal Video Summarizer for Video Exploration

no code implementations • 4 Jul 2023 • Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Andrew Brown, Marcel Worring

Multi-modal video summarization has a video input and a text-based query input.

Video Summarization

Paper
Add Code

Can Whisper perform speech-based in-context learning?

no code implementations • 13 Sep 2023 • Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang

Language-level adaptation experiments using Chinese dialects showed that when applying SICL to isolated word ASR, consistent and considerable relative WER reductions can be achieved using Whisper models of any size on two dialects, which is on average 32. 3%.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

no code implementations • 26 Sep 2023 • Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth G. Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastow, Ivan Bulyko

We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring.

Language Modelling Large Language Model +2

Paper
Add Code

Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting

no code implementations • 27 Sep 2023 • Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke

We explore the ability of large language models (LLMs) to act as speech recognition post-processors that perform rescoring and error correction.

Ranked #3 on Speech Recognition on WSJ eval92 (using extra training data)

In-Context Learning speech-recognition +1

Paper
Add Code

HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models

1 code implementation • NeurIPS 2023 • Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Sabato Macro Siniscalchi, Pin-Yu Chen, Eng Siong Chng

We make our results publicly accessible for reproducible pipelines with released pre-trained models, thus providing a new evaluation paradigm for ASR error correction with LLMs.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Code

Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition

1 code implementation • 10 Oct 2023 • Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, Rohit Kumar, Narsis A. Kiani, David Gomez-Cabrero, Jesper N. Tegner

We introduce a new cross-modal fusion technique designed for generative error correction in automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

182

Paper
Code

Generative error correction for code-switching speech recognition using large language models

no code implementations • 17 Oct 2023 • Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Hexin Liu, Sabato Marco Siniscalchi, Eng Siong Chng

In this work, we propose to leverage large language models (LLMs) and lists of hypotheses generated by an ASR to address the CS problem.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Conditional Modeling Based Automatic Video Summarization

no code implementations • 20 Nov 2023 • Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hung Chen, Marcel Worring

The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.

Video Summarization

Paper
Add Code

Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification

no code implementations • 22 Dec 2023 • Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu

In cases where some data/compute is available, we present Learnable-MAM, a data-driven approach to merging attention matrices, resulting in a further 2. 90% relative reduction in WER for ASR and 18. 42% relative reduction in AEC compared to fine-tuning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue

no code implementations • 23 Dec 2023 • Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, Chao-Han Huck Yang, Yile Gu, Shalini Ghosh, Andreas Stolcke, Hung-Yi Lee, Ivan Bulyko

Specifically, our framework serializes tasks in the order of current paralinguistic attribute prediction, response paralinguistic attribute prediction, and response text generation with autoregressive conditioning.

Attribute Language Modelling +4

Paper
Add Code

Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition

no code implementations • 19 Jan 2024 • Yu Yu, Chao-Han Huck Yang, Tuan Dinh, Sungho Ryu, Jari Kolehmainen, Roger Ren, Denis Filimonov, Prashanth G. Shivakumar, Ankur Gandhe, Ariya Rastow, Jia Xu, Ivan Bulyko, Andreas Stolcke

The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasing popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware.

Language Modelling speech-recognition +1

Paper
Add Code

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

1 code implementation • 19 Jan 2024 • Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Chao Zhang, Pin-Yu Chen, EnSiong Chng

To this end, we propose to extract a language-space noise embedding from the N-best list to represent the noise conditions of source speech, which can promote the denoising process in GER.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Paper
Code

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

no code implementations • 8 Feb 2024 • Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, EnSiong Chng, Chao-Han Huck Yang

Recent studies have successfully shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output.

Audio-Visual Speech Recognition Automatic Speech Recognition +3

Paper
Add Code

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

1 code implementation • 10 Feb 2024 • Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, Zhehuai Chen, Eng Siong Chng

Leveraging the rich linguistic knowledge and strong reasoning abilities of LLMs, our new paradigm can integrate the rich information in N-best candidates to generate a higher-quality translation result.

Machine Translation Translation

144

Paper
Code

Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities

no code implementations • 23 Apr 2024 • Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang

Large language models (LLMs) can adapt to new tasks through in-context learning (ICL) based on a few examples presented in dialogue history without any model parameter update.

In-Context Learning

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.