1 code implementation • 14 Nov 2018 • Rise Ooi, Chao-Han Huck Yang, Pin-Yu Chen, Vìctor Eguìluz, Narsis Kiani, Hector Zenil, David Gomez-Cabrero, Jesper Tegnèr
Next, (2) the learned networks are technically controllable as only a small number of driver nodes are required to move the system to a new state.
1 code implementation • 9 Feb 2019 • Chao-Han Huck Yang, Yi-Chieh Liu, Pin-Yu Chen, Xiaoli Ma, Yi-Chang James Tsai
To study the intervention effects on pixel-level features for causal reasoning, we introduce pixel-wise masking and adversarial perturbation.
1 code implementation • 11 Feb 2019 • Yi-Chieh Liu, Hao-Hsiang Yang, Chao-Han Huck Yang, Jia-Hong Huang, Meng Tian, Hiromasa Morikawa, Yi-Chang James Tsai, Jesper Tegner
Age-Related Macular Degeneration (AMD) is an asymptomatic retinal disease which may result in loss of vision.
1 code implementation • 30 Jun 2019 • Samuel Yen-Chi Chen, Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Xiaoli Ma, Hsi-Sheng Goan
To the best of our knowledge, this work is the first proof-of-principle demonstration of variational quantum circuits to approximate the deep $Q$-value function for decision-making and policy-selection reinforcement learning with experience replay and target network.
2 code implementations • 13 Aug 2019 • Sheng-Chun Kao, Chao-Han Huck Yang, Pin-Yu Chen, Xiaoli Ma, Tushar Krishna
In this work, we demonstrate the promise of applying reinforcement learning (RL) to optimize NoC runtime performance.
no code implementations • 6 Nov 2019 • Yi-Chieh Liu, Yung-An Hsieh, Min-Hung Chen, Chao-Han Huck Yang, Jesper Tegner, Yi-Chang James Tsai
Performing driving behaviors based on causal reasoning is essential to ensure driving safety.
1 code implementation • 27 Jan 2020 • Jun Qi, Chao-Han Huck Yang, Javier Tejedor
Distributed automatic speech recognition (ASR) requires aggregating the outputs of distributed deep neural network (DNN)-based models.
2 code implementations • 3 Feb 2020 • Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
Finally, in 8-channel conditions, a PESQ of 3.12 is achieved using 20 million parameters for TTN, whereas a DNN with 68 million parameters can only attain a PESQ of 3.06.
1 code implementation • 3 Feb 2020 • Haoling Zhang, Chao-Han Huck Yang, Hector Zenil, Narsis A. Kiani, Yue Shen, Jesper N. Tegner
Using RET, two types of approaches -- NEAT with Binary search encoding (Bi-NEAT) and NEAT with Golden-Section search encoding (GS-NEAT) -- have been designed to solve problems in benchmark continuous learning environments such as logic gates, Cartpole, and Lunar Lander, and tested against classical NEAT and FS-NEAT as baselines.
no code implementations • 20 Feb 2020 • Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Yi Ouyang, I-Te Danny Hung, Chin-Hui Lee, Xiaoli Ma
Recent deep neural network-based techniques, especially those equipped with system-level self-adaptation such as deep reinforcement learning (DRL), have been shown to offer many advantages in optimizing robot learning systems (e.g., autonomous navigation and continuous robot arm control).
1 code implementation • 31 Mar 2020 • Hao-Hsiang Yang, Chao-Han Huck Yang, Yi-Chang James Tsai
Extensive experimental results demonstrate that the proposed Y-net with the W-SSIM loss function restores high-quality clear images and outperforms state-of-the-art algorithms.
no code implementations • 31 Mar 2020 • Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Xiaoli Ma, Chin-Hui Lee
Recent studies have highlighted adversarial examples as ubiquitous threats to deep neural network (DNN)-based speech recognition systems.
1 code implementation • 16 Jul 2020 • Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee
On the Task 1b development data set, we achieve an accuracy of 96.7% with a model size smaller than 500KB.
no code implementations • 17 Jul 2020 • Hao-Hsiang Yang, Chao-Han Huck Yang, Yu-Chiang Frank Wang
Wavelet transform and the inverse wavelet transform are substituted for down-sampling and up-sampling so feature maps from the wavelet transform and convolutions contain different frequencies and scales.
2 code implementations • 25 Jul 2020 • Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
Finally, our experiments on multi-channel speech enhancement with a simulated noisy WSJ0 corpus demonstrate that our proposed hybrid CNN-TT architecture outperforms both DNN and CNN models, delivering better enhanced-speech quality with fewer parameters.
2 code implementations • 26 Oct 2020 • Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Pin-Yu Chen, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee
Testing on the Google Speech Commands Dataset, the proposed QCNN encoder attains a competitive accuracy of 95.12% in a decentralized model, which is better than the previous architectures using centralized RNN models with convolutional features.
Ranked #1 on Keyword Spotting on Google Speech Commands (10-keyword Speech Commands dataset metric)
1 code implementation • 1 Nov 2020 • Jia-Hong Huang, Chao-Han Huck Yang, Fangyu Liu, Meng Tian, Yi-Chieh Liu, Ting-Wei Wu, I-Hung Lin, Kang Wang, Hiromasa Morikawa, Hernghua Chang, Jesper Tegner, Marcel Worring
To train and validate the effectiveness of our DNN-based module, we propose a large-scale retinal disease image dataset.
1 code implementation • 3 Nov 2020 • Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee
To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed.
Ranked #1 on Acoustic Scene Classification on TAU Urban Acoustic Scenes 2019 (using extra training data)
no code implementations • 23 Nov 2020 • Chao-Han Huck Yang, Linda Liu, Ankur Gandhe, Yile Gu, Anirudh Raju, Denis Filimonov, Ivan Bulyko
We show that our rescoring model trained with these additional tasks outperforms the baseline rescoring model, trained with only the language modeling task, by 1.4% on a general test set and by 2.6% on a rare word test set in terms of relative word error rate reduction (WERR).
1 code implementation • 18 Feb 2021 • Chao-Han Huck Yang, I-Te Danny Hung, Yi Ouyang, Pin-Yu Chen
Deep reinforcement learning (DRL) has demonstrated impressive performance in various gaming simulators and real-world applications.
no code implementations • 2 Apr 2021 • Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
We propose using an adversarial autoencoder (AAE) to replace the generative adversarial network (GAN) in the private aggregation of teacher ensembles (PATE), a solution for ensuring differential privacy in speech applications.
Ranked #3 on Keyword Spotting on Google Speech Commands (10-keyword Speech Commands dataset metric)
no code implementations • 30 May 2021 • Jia-Hong Huang, Ting-Wei Wu, Chao-Han Huck Yang, Marcel Worring
Automatically generating medical reports for retinal images is one of the promising ways to help ophthalmologists reduce their workload and improve work efficiency.
3 code implementations • 17 Jun 2021 • Chao-Han Huck Yang, Yun-Yun Tsai, Pin-Yu Chen
Learning to classify time series with limited data is a practical yet challenging problem.
no code implementations • 3 Jul 2021 • Hao Yen, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Qing Wang, Yuyang Wang, Xianjun Xia, Yuanjun Zhao, Yuzhong Wu, Yannan Wang, Jun Du, Chin-Hui Lee
We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC).
no code implementations • 6 Oct 2021 • Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen
The advent of noisy intermediate-scale quantum (NISQ) computers raises the crucial challenge of designing quantum neural networks for fully quantum learning tasks.
1 code implementation • 8 Oct 2021 • Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao
In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system.
1 code implementation • 16 Oct 2021 • Hu Hu, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Chin-Hui Lee
We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions.
no code implementations • 29 Nov 2021 • Chao-Han Huck Yang, Zhengling Qi, Yifan Cui, Pin-Yu Chen
Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications.
no code implementations • 17 Feb 2022 • Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Yu Tsao, Pin-Yu Chen
Our experiments on intent classification show that our proposed BERT-QTC model attains competitive experimental results in the Snips and ATIS spoken language datasets.
no code implementations • 17 Feb 2022 • Hengshun Zhou, Jun Du, Chao-Han Huck Yang, Shifu Xiong, Chin-Hui Lee
Audio-only-based wake word spotting (WWS) is challenging under noisy conditions due to environmental interference in signal transmission.
no code implementations • 17 Feb 2022 • Chao-Han Huck Yang, Zeeshan Ahmed, Yile Gu, Joseph Szurley, Roger Ren, Linda Liu, Andreas Stolcke, Ivan Bulyko
In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples.
no code implementations • 7 Mar 2022 • Qing Wang, Jun Du, Siyuan Zheng, Yunqing Li, Yajian Wang, Yuzhong Wu, Hu Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee
In this paper, we propose two techniques, namely joint modeling and data augmentation, to improve system performances for audio-visual scene classification (AVSC).
1 code implementation • 11 Mar 2022 • Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, Javier Tejedor
This work focuses on designing low complexity hybrid tensor networks by considering trade-offs between the model complexity and practical performance.
no code implementations • 29 Mar 2022 • Chao-Han Huck Yang, I-Te Danny Hung, Yi-Chieh Liu, Pin-Yu Chen
In this work, we incorporate this binary information of "existence of noise" as treatment into image classification tasks to improve prediction accuracy by jointly estimating their treatment effects.
1 code implementation • 8 Jun 2022 • Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hsiu Hsieh
In this work, we first put forth an end-to-end quantum neural network, TTN-VQC, which consists of a quantum tensor network based on a tensor-train network (TTN) for dimensionality reduction and a VQC for functional regression.
no code implementations • 11 Oct 2022 • Chao-Han Huck Yang, I-Fan Chen, Andreas Stolcke, Sabato Marco Siniscalchi, Chin-Hui Lee
We evaluate three end-to-end deep models, including LAS, hybrid CTC/attention, and RNN transducer, on the open-source LibriSpeech and TIMIT corpora.
no code implementations • 12 Oct 2022 • Chao-Han Huck Yang, Jun Qi, Sabato Marco Siniscalchi, Chin-Hui Lee
We propose an ensemble learning framework with Poisson sub-sampling to effectively train a collection of teacher models that provides a differential privacy (DP) guarantee for the training data.
no code implementations • 2 Nov 2022 • Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee
We propose a quantum kernel learning (QKL) framework to address the inherent data sparsity issues often encountered in training large-scale acoustic models in low-resource scenarios.
no code implementations • 2 Nov 2022 • Jhih-Cing Huang, Yu-Lin Tsai, Chao-Han Huck Yang, Cheng-Fang Su, Chia-Mu Yu, Pin-Yu Chen, Sy-Yen Kuo
Recently, quantum classifiers have been found to be vulnerable to adversarial attacks, in which quantum classifiers are deceived by imperceptible noises, leading to misclassification.
1 code implementation • 2 Nov 2022 • Yun-Ning Hung, Chao-Han Huck Yang, Pin-Yu Chen, Alexander Lerch
In this work, we introduce a novel method for leveraging pre-trained models for low-resource (music) classification based on the concept of Neural Model Reprogramming (NMR).
1 code implementation • 2 Nov 2022 • Tsun-An Hsieh, Chao-Han Huck Yang, Pin-Yu Chen, Sabato Marco Siniscalchi, Yu Tsao
This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention.
no code implementations • 19 Jan 2023 • Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman
In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English automatic speech recognition (ASR) models to recognize other languages.
no code implementations • 30 Apr 2023 • Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hung Chen, Marcel Worring
In this work, a Causal Explainer, dubbed Causalainer, is proposed to address this issue.
no code implementations • 18 May 2023 • Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hsiu Hsieh
Variational quantum circuit (VQC) is a promising approach for implementing quantum neural networks on noisy intermediate-scale quantum (NISQ) devices.
1 code implementation • 18 May 2023 • Li-Jen Yang, Chao-Han Huck Yang, Jen-Tzung Chien
This paper presents a parameter-efficient learning (PEL) approach to low-resource accent adaptation for text-to-speech (TTS).
1 code implementation • 18 May 2023 • Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, Narsis A. Kiani, David Gomez-Cabrero, Jesper N. Tegner
In this work, we explore Parameter-Efficient-Learning (PEL) techniques to repurpose a General-Purpose-Speech (GSM) model for Arabic dialect identification (ADI).
1 code implementation • 19 May 2023 • Chun-Wei Ho, Chao-Han Huck Yang, Sabato Marco Siniscalchi
Evaluated on the open-access Multilingual Spoken Words (MLSW) dataset, our solution reduces the number of trainable parameters by 97.5% using the RAs with only a 4% performance drop with respect to fine-tuning the cross-lingual speech classifier while preserving DP guarantees.
1 code implementation • 26 May 2023 • Chen Chen, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng Siong Chng
In this work, we introduce S4M, a new efficient speech separation framework based on neural state-space models (SSM).
1 code implementation • 1 Jun 2023 • Pin-Jui Ku, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
We propose a multi-dimensional structured state space (S4) approach to speech enhancement.
1 code implementation • 1 Jun 2023 • Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-Yi Lee, Tara N. Sainath
In this work, we introduce a "score-based assessment" framework for estimating the transferability of pre-trained speech models (PSMs) for fine-tuning target tasks.
no code implementations • 4 Jul 2023 • Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Andrew Brown, Marcel Worring
Multi-modal video summarization has a video input and a text-based query input.
no code implementations • 13 Sep 2023 • Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang
Language-level adaptation experiments using Chinese dialects showed that applying SICL to isolated word ASR yields consistent and considerable relative WER reductions with Whisper models of any size on two dialects, averaging 32.3%.
no code implementations • 26 Sep 2023 • Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth G. Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastow, Ivan Bulyko
We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring.
no code implementations • 27 Sep 2023 • Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke
We explore the ability of large language models (LLMs) to act as speech recognition post-processors that perform rescoring and error correction.
Ranked #3 on Speech Recognition on WSJ eval92 (using extra training data)
1 code implementation • NeurIPS 2023 • Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng
We make our results publicly accessible for reproducible pipelines with released pre-trained models, thus providing a new evaluation paradigm for ASR error correction with LLMs.
1 code implementation • 10 Oct 2023 • Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, Rohit Kumar, Narsis A. Kiani, David Gomez-Cabrero, Jesper N. Tegner
We introduce a new cross-modal fusion technique designed for generative error correction in automatic speech recognition (ASR).
no code implementations • 17 Oct 2023 • Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Hexin Liu, Sabato Marco Siniscalchi, Eng Siong Chng
In this work, we propose to leverage large language models (LLMs) and lists of hypotheses generated by an ASR to address the CS problem.
no code implementations • 20 Nov 2023 • Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hung Chen, Marcel Worring
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
no code implementations • 22 Dec 2023 • Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu
In cases where some data/compute is available, we present Learnable-MAM, a data-driven approach to merging attention matrices, resulting in a further 2.90% relative reduction in WER for ASR and 18.42% relative reduction in AEC compared to fine-tuning.
no code implementations • 23 Dec 2023 • Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, Chao-Han Huck Yang, Yile Gu, Shalini Ghosh, Andreas Stolcke, Hung-Yi Lee, Ivan Bulyko
Specifically, our framework serializes tasks in the order of current paralinguistic attribute prediction, response paralinguistic attribute prediction, and response text generation with autoregressive conditioning.
no code implementations • 19 Jan 2024 • Yu Yu, Chao-Han Huck Yang, Tuan Dinh, Sungho Ryu, Jari Kolehmainen, Roger Ren, Denis Filimonov, Prashanth G. Shivakumar, Ankur Gandhe, Ariya Rastow, Jia Xu, Ivan Bulyko, Andreas Stolcke
The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware.
1 code implementation • 19 Jan 2024 • Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Chao Zhang, Pin-Yu Chen, Eng Siong Chng
To this end, we propose to extract a language-space noise embedding from the N-best list to represent the noise conditions of source speech, which can promote the denoising process in GER.
no code implementations • 8 Feb 2024 • Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng, Chao-Han Huck Yang
Recent studies have shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output.
1 code implementation • 10 Feb 2024 • Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, Zhehuai Chen, Eng Siong Chng
Leveraging the rich linguistic knowledge and strong reasoning abilities of LLMs, our new paradigm can integrate the rich information in N-best candidates to generate a higher-quality translation result.
no code implementations • 23 Apr 2024 • Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang
Large language models (LLMs) can adapt to new tasks through in-context learning (ICL) based on a few examples presented in dialogue history without any model parameter update.