no code implementations • 27 Sep 2023 • Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng
We make our results publicly accessible for reproducible pipelines with released pre-trained models, thus providing a new evaluation paradigm for ASR error correction with LLMs.
Automatic Speech Recognition (ASR)
no code implementations • 27 Sep 2023 • Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke
We explore the ability of large language models (LLMs) to act as ASR post-processors that perform rescoring and error correction.
no code implementations • 26 Sep 2023 • Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth G. Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastow, Ivan Bulyko
We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring.
no code implementations • 13 Sep 2023 • Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang
Language-level adaptation experiments using Chinese dialects showed that applying SICL to isolated-word ASR yields consistent and considerable relative WER reductions with Whisper models of any size on two dialects, averaging 32.3%.
Automatic Speech Recognition (ASR)
no code implementations • 4 Jul 2023 • Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Andrew Brown, Marcel Worring
Multi-modal video summarization has a video input and a text-based query input.
1 code implementation • 1 Jun 2023 • Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shou-Yiin Chang, Rohit Prabhavalkar, Hung-Yi Lee, Tara N. Sainath
In this work, we introduce a "score-based assessment" framework for estimating the transferability of pre-trained speech models (PSMs) for fine-tuning target tasks.
1 code implementation • 1 Jun 2023 • Pin-Jui Ku, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
We propose a multi-dimensional structured state space (S4) approach to speech enhancement.
no code implementations • 26 May 2023 • Chen Chen, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng Siong Chng
In this work, we introduce S4M, a new efficient speech separation framework based on neural state-space models (SSM).
1 code implementation • 19 May 2023 • Chun-Wei Ho, Chao-Han Huck Yang, Sabato Marco Siniscalchi
Evaluated on the open-access Multilingual Spoken Words (MLSW) dataset, our solution reduces the number of trainable parameters by 97.5% using the RAs with only a 4% performance drop with respect to fine-tuning the cross-lingual speech classifier while preserving DP guarantees.
no code implementations • 18 May 2023 • Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hsiu Hsieh
Variational quantum circuit (VQC) is a promising approach for implementing quantum neural networks on noisy intermediate-scale quantum (NISQ) devices.
1 code implementation • 18 May 2023 • Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, Narsis A. Kiani, David Gomez-Cabrero, Jesper N. Tegner
In this work, we explore Parameter-Efficient-Learning (PEL) techniques to repurpose a General-Purpose-Speech (GSM) model for Arabic dialect identification (ADI).
1 code implementation • 18 May 2023 • Li-Jen Yang, Chao-Han Huck Yang, Jen-Tzung Chien
This paper presents a parameter-efficient learning (PEL) approach to low-resource accent adaptation for text-to-speech (TTS).
no code implementations • 30 Apr 2023 • Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hung Chen, Marcel Worring
In this work, a Causal Explainer, dubbed Causalainer, is proposed to address this issue.
no code implementations • 19 Jan 2023 • Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman
In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English automatic speech recognition (ASR) models to recognize other languages.
Automatic Speech Recognition (ASR)
no code implementations • 2 Nov 2022 • Jhih-Cing Huang, Yu-Lin Tsai, Chao-Han Huck Yang, Cheng-Fang Su, Chia-Mu Yu, Pin-Yu Chen, Sy-Yen Kuo
Recently, quantum classifiers have been found to be vulnerable to adversarial attacks, in which quantum classifiers are deceived by imperceptible noises, leading to misclassification.
no code implementations • 2 Nov 2022 • Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee
We propose a quantum kernel learning (QKL) framework to address the inherent data sparsity issues often encountered in training large-scale acoustic models in low-resource scenarios.
1 code implementation • 2 Nov 2022 • Tsun-An Hsieh, Chao-Han Huck Yang, Pin-Yu Chen, Sabato Marco Siniscalchi, Yu Tsao
This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention.
1 code implementation • 2 Nov 2022 • Yun-Ning Hung, Chao-Han Huck Yang, Pin-Yu Chen, Alexander Lerch
In this work, we introduce a novel method for leveraging pre-trained models for low-resource (music) classification based on the concept of Neural Model Reprogramming (NMR).
no code implementations • 12 Oct 2022 • Chao-Han Huck Yang, Jun Qi, Sabato Marco Siniscalchi, Chin-Hui Lee
We propose an ensemble learning framework with Poisson sub-sampling to effectively train a collection of teacher models that provide a differential privacy (DP) guarantee for training data.
no code implementations • 11 Oct 2022 • Chao-Han Huck Yang, I-Fan Chen, Andreas Stolcke, Sabato Marco Siniscalchi, Chin-Hui Lee
We evaluate three end-to-end deep models, including LAS, hybrid CTC/attention, and RNN transducer, on the open-source LibriSpeech and TIMIT corpora.
Automatic Speech Recognition (ASR)
1 code implementation • 8 Jun 2022 • Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hsiu Hsieh
In this work, we first put forth an end-to-end quantum neural network, TTN-VQC, which consists of a quantum tensor network based on a tensor-train network (TTN) for dimensionality reduction and a VQC for functional regression.
no code implementations • 29 Mar 2022 • Chao-Han Huck Yang, I-Te Danny Hung, Yi-Chieh Liu, Pin-Yu Chen
In this work, we incorporate this binary information of "existence of noise" as treatment into image classification tasks to improve prediction accuracy by jointly estimating their treatment effects.
1 code implementation • 11 Mar 2022 • Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, Javier Tejedor
This work focuses on designing low complexity hybrid tensor networks by considering trade-offs between the model complexity and practical performance.
no code implementations • 7 Mar 2022 • Qing Wang, Jun Du, Siyuan Zheng, Yunqing Li, Yajian Wang, Yuzhong Wu, Hu Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee
In this paper, we propose two techniques, namely joint modeling and data augmentation, to improve system performances for audio-visual scene classification (AVSC).
no code implementations • 17 Feb 2022 • Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Yu Tsao, Pin-Yu Chen
Our experiments on intent classification show that our proposed BERT-QTC model attains competitive experimental results in the Snips and ATIS spoken language datasets.
no code implementations • 17 Feb 2022 • Chao-Han Huck Yang, Zeeshan Ahmed, Yile Gu, Joseph Szurley, Roger Ren, Linda Liu, Andreas Stolcke, Ivan Bulyko
In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples.
no code implementations • 17 Feb 2022 • Hengshun Zhou, Jun Du, Chao-Han Huck Yang, Shifu Xiong, Chin-Hui Lee
Audio-only-based wake word spotting (WWS) is challenging under noisy conditions due to environmental interference in signal transmission.
no code implementations • 29 Nov 2021 • Chao-Han Huck Yang, Zhengling Qi, Yifan Cui, Pin-Yu Chen
Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications.
1 code implementation • 16 Oct 2021 • Hu Hu, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Chin-Hui Lee
We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions.
1 code implementation • 8 Oct 2021 • Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao
In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system.
no code implementations • 6 Oct 2021 • Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen
The advent of noisy intermediate-scale quantum (NISQ) computers poses a crucial challenge in designing quantum neural networks for fully quantum learning tasks.
no code implementations • 3 Jul 2021 • Hao Yen, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Qing Wang, Yuyang Wang, Xianjun Xia, Yuanjun Zhao, Yuzhong Wu, Yannan Wang, Jun Du, Chin-Hui Lee
We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC).
3 code implementations • 17 Jun 2021 • Chao-Han Huck Yang, Yun-Yun Tsai, Pin-Yu Chen
Learning to classify time series with limited data is a practical yet challenging problem.
no code implementations • 30 May 2021 • Jia-Hong Huang, Ting-Wei Wu, Chao-Han Huck Yang, Marcel Worring
Automatically generating medical reports for retinal images is one of the promising ways to help ophthalmologists reduce their workload and improve work efficiency.
no code implementations • 2 Apr 2021 • Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
We propose using an adversarial autoencoder (AAE) to replace generative adversarial network (GAN) in the private aggregation of teacher ensembles (PATE), a solution for ensuring differential privacy in speech applications.
Ranked #3 on Keyword Spotting on Google Speech Commands (10-keyword Speech Commands dataset metric)
1 code implementation • 18 Feb 2021 • Chao-Han Huck Yang, I-Te Danny Hung, Yi Ouyang, Pin-Yu Chen
Deep reinforcement learning (DRL) has demonstrated impressive performance in various gaming simulators and real-world applications.
no code implementations • 23 Nov 2020 • Chao-Han Huck Yang, Linda Liu, Ankur Gandhe, Yile Gu, Anirudh Raju, Denis Filimonov, Ivan Bulyko
We show that our rescoring model trained with these additional tasks outperforms the baseline rescoring model, trained with only the language modeling task, by 1.4% on a general test and by 2.6% on a rare word test set in terms of word-error-rate relative (WERR).
Automatic Speech Recognition (ASR)
1 code implementation • 3 Nov 2020 • Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee
To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed.
Ranked #1 on Acoustic Scene Classification on TAU Urban Acoustic Scenes 2019 (using extra training data)
1 code implementation • 1 Nov 2020 • Jia-Hong Huang, Chao-Han Huck Yang, Fangyu Liu, Meng Tian, Yi-Chieh Liu, Ting-Wei Wu, I-Hung Lin, Kang Wang, Hiromasa Morikawa, Hernghua Chang, Jesper Tegner, Marcel Worring
To train and validate the effectiveness of our DNN-based module, we propose a large-scale retinal disease image dataset.
2 code implementations • 26 Oct 2020 • Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Pin-Yu Chen, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee
Testing on the Google Speech Commands Dataset, the proposed QCNN encoder attains a competitive accuracy of 95.12% in a decentralized model, which is better than the previous architectures using centralized RNN models with convolutional features.
Ranked #1 on Keyword Spotting on Google Speech Commands (10-keyword Speech Commands dataset metric)
Automatic Speech Recognition (ASR)
2 code implementations • 25 Jul 2020 • Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
Finally, our experiments of multi-channel speech enhancement on a simulated noisy WSJ0 corpus demonstrate that our proposed hybrid CNN-TT architecture achieves better results than both DNN and CNN models in terms of better-enhanced speech qualities and smaller parameter sizes.
no code implementations • 17 Jul 2020 • Hao-Hsiang Yang, Chao-Han Huck Yang, Yu-Chiang Frank Wang
Wavelet transform and the inverse wavelet transform are substituted for down-sampling and up-sampling so feature maps from the wavelet transform and convolutions contain different frequencies and scales.
1 code implementation • 16 Jul 2020 • Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee
On the Task 1b development data set, we achieve an accuracy of 96.7% with a model size smaller than 500KB.
1 code implementation • 31 Mar 2020 • Hao-Hsiang Yang, Chao-Han Huck Yang, Yi-Chang James Tsai
Extensive experimental results demonstrate that the proposed Y-net with the W-SSIM loss function restores high-quality clear images and outperforms state-of-the-art algorithms.
no code implementations • 31 Mar 2020 • Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Xiaoli Ma, Chin-Hui Lee
Recent studies have highlighted adversarial examples as ubiquitous threats to deep neural network (DNN)-based speech recognition systems.
Automatic Speech Recognition (ASR)
no code implementations • 20 Feb 2020 • Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Yi Ouyang, I-Te Danny Hung, Chin-Hui Lee, Xiaoli Ma
Recent deep neural network-based techniques, especially those capable of self-adaptation at the system level such as deep reinforcement learning (DRL), have been shown to offer many advantages in optimizing robot learning systems (e.g., autonomous navigation and continuous robot arm control).
1 code implementation • 3 Feb 2020 • Haoling Zhang, Chao-Han Huck Yang, Hector Zenil, Narsis A. Kiani, Yue Shen, Jesper N. Tegner
Using RET, two types of approaches -- NEAT with Binary search encoding (Bi-NEAT) and NEAT with Golden-Section search encoding (GS-NEAT) -- have been designed to solve problems in benchmark continuous learning environments such as logic gates, Cartpole, and Lunar Lander, and tested against classical NEAT and FS-NEAT as baselines.
2 code implementations • 3 Feb 2020 • Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
Finally, in 8-channel conditions, a PESQ of 3.12 is achieved using 20 million parameters for TTN, whereas a DNN with 68 million parameters can only attain a PESQ of 3.06.
1 code implementation • 27 Jan 2020 • Jun Qi, Chao-Han Huck Yang, Javier Tejedor
Distributed automatic speech recognition (ASR) requires aggregating the outputs of distributed deep neural network (DNN)-based models.
Automatic Speech Recognition (ASR)
no code implementations • 6 Nov 2019 • Yi-Chieh Liu, Yung-An Hsieh, Min-Hung Chen, Chao-Han Huck Yang, Jesper Tegner, Yi-Chang James Tsai
Performing driving behaviors based on causal reasoning is essential to ensure driving safety.
2 code implementations • 13 Aug 2019 • Sheng-Chun Kao, Chao-Han Huck Yang, Pin-Yu Chen, Xiaoli Ma, Tushar Krishna
In this work, we demonstrate the promise of applying reinforcement learning (RL) to optimize NoC runtime performance.
1 code implementation • 30 Jun 2019 • Samuel Yen-Chi Chen, Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Xiaoli Ma, Hsi-Sheng Goan
To the best of our knowledge, this work is the first proof-of-principle demonstration of variational quantum circuits to approximate the deep $Q$-value function for decision-making and policy-selection reinforcement learning with experience replay and target network.
1 code implementation • 11 Feb 2019 • Yi-Chieh Liu, Hao-Hsiang Yang, Chao-Han Huck Yang, Jia-Hong Huang, Meng Tian, Hiromasa Morikawa, Yi-Chang James Tsai, Jesper Tegner
Age-Related Macular Degeneration (AMD) is an asymptomatic retinal disease which may result in loss of vision.
1 code implementation • 9 Feb 2019 • Chao-Han Huck Yang, Yi-Chieh Liu, Pin-Yu Chen, Xiaoli Ma, Yi-Chang James Tsai
To study the intervention effects on pixel-level features for causal reasoning, we introduce pixel-wise masking and adversarial perturbation.
1 code implementation • 14 Nov 2018 • Rise Ooi, Chao-Han Huck Yang, Pin-Yu Chen, Víctor Eguíluz, Narsis Kiani, Hector Zenil, David Gomez-Cabrero, Jesper Tegnér
Next, (2) the learned networks are technically controllable as only a small number of driver nodes are required to move the system to a new state.