Search Results for author: Chin-Hui Lee

Found 44 papers, 11 papers with code

A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

1 code implementation • 7 Mar 2024 • Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee

In this paper, we investigate this contrasting phenomenon from the perspective of modality bias and reveal that an excessive modality bias on the audio caused by dropout is the underlying reason.

Audio-Visual Speech Recognition Knowledge Distillation +2

Paper
Code

Bayesian adaptive learning to latent variables via Variational Bayes and Maximum a Posteriori

no code implementations • 24 Jan 2024 • Hu Hu, Sabato Marco Siniscalchi, Chin-Hui Lee

In this work, we aim to establish a Bayesian adaptive learning framework by focusing on estimating latent variables in deep neural network (DNN) models.

Acoustic Scene Classification Scene Classification +1

Paper
Add Code

Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture

1 code implementation • 17 Sep 2023 • Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang, Yanyan Yue, Shuangqing Qian, Shilong Wu, Jun Du, Chin-Hui Lee

We propose a novel neural speaker diarization system using memory-aware multi-speaker embedding with sequence-to-sequence architecture (NSD-MS2S), which integrates the strengths of memory-aware multi-speaker embedding (MA-MSE) and sequence-to-sequence (Seq2Seq) architecture, leading to improvement in both efficiency and performance.

speaker-diarization Speaker Diarization

Paper
Code

Continuous Modeling of the Denoising Process for Speech Enhancement Based on Deep Learning

no code implementations • 17 Sep 2023 • Zilu Guo, Jun Du, Chin-Hui Lee

The starting state is noisy speech and the ending state is clean speech.

Automatic Speech Recognition Denoising +3

Paper
Add Code

Boosting End-to-End Multilingual Phoneme Recognition through Exploiting Universal Speech Attributes Constraints

no code implementations • 16 Sep 2023 • Hao Yen, Sabato Marco Siniscalchi, Chin-Hui Lee

The proposed joint multilingual model is evaluated through phoneme recognition.

Attribute Automatic Speech Recognition +2

Paper
Add Code

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

no code implementations • 15 Sep 2023 • Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into enhancing the ac-curacy of back-end speech recognition systems through AVTSE in challenging and real acoustic environments.

Audio-Visual Speech Recognition speech-recognition +2

Paper
Add Code

The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge

no code implementations • 28 Aug 2023 • Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee

This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios.

speaker-diarization Speaker Diarization +2

Paper
Add Code

Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder

1 code implementation • 14 Aug 2023 • Yusheng Dai, Hang Chen, Jun Du, Xiaofei Ding, Ning Ding, Feijun Jiang, Chin-Hui Lee

In this paper, we propose two novel techniques to improve audio-visual speech recognition (AVSR) under a pre-training and fine-tuning training framework.

Audio-Visual Speech Recognition Automatic Speech Recognition +2

Paper
Code

Semi-supervised multi-channel speaker diarization with cross-channel attention

no code implementations • 17 Jul 2023 • Shilong Wu, Jun Du, Maokui He, Shutong Niu, Hang Chen, Haitao Tang, Chin-Hui Lee

Most neural speaker diarization systems rely on sufficient manual training data labels, which are hard to collect under real-world scenarios.

speaker-diarization Speaker Diarization

Paper
Add Code

Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement

1 code implementation • 14 Jun 2023 • Zilu Guo, Jun Du, Chin-Hui Lee, Yu Gao, Wenbin Zhang

The goal of this study is to implement diffusion models for speech enhancement (SE).

Speech Enhancement

Paper
Code

A Multi-dimensional Deep Structured State Space Approach to Speech Enhancement Using Small-footprint Models

1 code implementation • 1 Jun 2023 • Pin-Jui Ku, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose a multi-dimensional structured state space (S4) approach to speech enhancement.

Data Augmentation Speech Enhancement

Paper
Code

A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition

no code implementations • 2 Nov 2022 • Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose a quantum kernel learning (QKL) framework to address the inherent data sparsity issues often encountered in training large-scare acoustic models in low-resource scenarios.

Spoken Command Recognition

Paper
Add Code

Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function

no code implementations • 26 Oct 2022 • Qing Wang, Hang Chen, Ya Jiang, Zhe Wang, Yuyang Wang, Jun Du, Chin-Hui Lee

In this paper, we propose a deep learning based multi-speaker direction of arrival (DOA) estimation with audio and visual signals by using permutation-free loss function.

Paper
Add Code

An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition

no code implementations • 12 Oct 2022 • Chao-Han Huck Yang, Jun Qi, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose an ensemble learning framework with Poisson sub-sampling to effectively train a collection of teacher models to issue some differential privacy (DP) guarantee for training data.

Ensemble Learning Privacy Preserving +3

Paper
Add Code

An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition

no code implementations • 11 Oct 2022 • Chao-Han Huck Yang, I-Fan Chen, Andreas Stolcke, Sabato Marco Siniscalchi, Chin-Hui Lee

We evaluate three end-to-end deep models, including LAS, hybrid CTC/attention, and RNN transducer, on the open-source LibriSpeech and TIMIT corpora.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

A study on joint modeling and data augmentation of multi-modalities for audio-visual scene classification

no code implementations • 7 Mar 2022 • Qing Wang, Jun Du, Siyuan Zheng, Yunqing Li, Yajian Wang, Yuzhong Wu, Hu Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee

In this paper, we propose two techniques, namely joint modeling and data augmentation, to improve system performances for audio-visual scene classification (AVSC).

Data Augmentation Scene Classification

Paper
Add Code

A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning

no code implementations • 17 Feb 2022 • Hengshun Zhou, Jun Du, Chao-Han Huck Yang, Shifu Xiong, Chin-Hui Lee

Audio-only-based wake word spotting (WWS) is challenging under noisy conditions due to environmental interference in signal transmission.

Network Pruning

Paper
Add Code

The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

no code implementations • 10 Feb 2022 • Maokui He, Xiang Lv, Weilin Zhou, JingJing Yin, Xiaoqi Zhang, Yuxuan Wang, Shutong Niu, Yuhang Cao, Heng Lu, Jun Du, Chin-Hui Lee

We propose two improvements to target-speaker voice activity detection (TS-VAD), the core component in our proposed speaker diarization system that was submitted to the 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) challenge.

Action Detection Activity Detection +2

Paper
Add Code

A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer

1 code implementation • 16 Oct 2021 • Hu Hu, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Chin-Hui Lee

We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions.

Acoustic Scene Classification Scene Classification +1

Paper
Code

Separation Guided Speaker Diarization in Realistic Mismatched Conditions

no code implementations • 6 Jul 2021 • Shu-Tong Niu, Jun Du, Lei Sun, Chin-Hui Lee

We propose a separation guided speaker diarization (SGSD) approach by fully utilizing a complementarity of speech separation and speaker clustering.

Clustering speaker-diarization +2

Paper
Add Code

A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification

no code implementations • 3 Jul 2021 • Hao Yen, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Qing Wang, Yuyang Wang, Xianjun Xia, Yuanjun Zhao, Yuzhong Wu, Yannan Wang, Jun Du, Chin-Hui Lee

We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC).

Acoustic Scene Classification Data Augmentation +5

Paper
Add Code

PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification

no code implementations • 2 Apr 2021 • Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose using an adversarial autoencoder (AAE) to replace generative adversarial network (GAN) in the private aggregation of teacher ensembles (PATE), a solution for ensuring differential privacy in speech applications.

Ranked #3 on Keyword Spotting on Google Speech Commands (10-keyword Speech Commands dataset metric)

Generative Adversarial Network Keyword Spotting +1

Paper
Add Code

USTC-NELSLIP System Description for DIHARD-III Challenge

no code implementations • 19 Mar 2021 • Yuxuan Wang, Maokui He, Shutong Niu, Lei Sun, Tian Gao, Xin Fang, Jia Pan, Jun Du, Chin-Hui Lee

This system description describes our submission system to the Third DIHARD Speech Diarization Challenge.

Action Detection Activity Detection +3

Paper
Add Code

Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention

no code implementations • 28 Dec 2020 • Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Chin-Hui Lee, Bao-Cai Yin

In this paper, we propose a novel deep learning architecture to improving word-level lip-reading.

Lip Reading

Paper
Add Code

A Two-Stage Approach to Device-Robust Acoustic Scene Classification

1 code implementation • 3 Nov 2020 • Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee

To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed.

Ranked #1 on Acoustic Scene Classification on TAU Urban Acoustic Scenes 2019 (using extra training data)

Acoustic Scene Classification Classification +4

Paper
Code

Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition

2 code implementations • 26 Oct 2020 • Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Pin-Yu Chen, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee

Testing on the Google Speech Commands Dataset, the proposed QCNN encoder attains a competitive accuracy of 95. 12% in a decentralized model, which is better than the previous architectures using centralized RNN models with convolutional features.

Ranked #1 on Keyword Spotting on Google Speech Commands (10-keyword Speech Commands dataset metric)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Code

A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement

no code implementations • 25 Oct 2020 • Yu-Xuan Wang, Jun Du, Li Chai, Chin-Hui Lee, Jia Pan

We propose a novel noise-aware memory-attention network (NAMAN) for regression-based speech enhancement, aiming at improving quality of enhanced speech in unseen noise conditions.

regression Speech Enhancement

Paper
Add Code

Correlating Subword Articulation with Lip Shapes for Embedding Aware Audio-Visual Speech Enhancement

no code implementations • 21 Sep 2020 • Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Bao-Cai Yin, Chin-Hui Lee

We first extract visual embedding from lip frames using a pre-trained phone or articulation place recognizer for visual-only EASE (VEASE).

Speech Enhancement

Paper
Add Code

On Mean Absolute Error for Deep Neural Network Based Vector-to-Vector Regression

no code implementations • 12 Aug 2020 • Jun Qi, Jun Du, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee

In this paper, we exploit the properties of mean absolute error (MAE) as a loss function for the deep neural network (DNN) based vector-to-vector regression.

regression Speech Enhancement

Paper
Add Code

Analyzing Upper Bounds on Mean Absolute Errors for Deep Neural Network Based Vector-to-Vector Regression

no code implementations • 4 Aug 2020 • Jun Qi, Jun Du, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee

In this paper, we show that, in vector-to-vector regression utilizing deep neural networks (DNNs), a generalized loss of mean absolute error (MAE) between the predicted and expected feature vectors is upper bounded by the sum of an approximation error, an estimation error, and an optimization error.

Learning Theory regression +2

Paper
Add Code

Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification

no code implementations • 31 Jul 2020 • Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee

In this paper, we propose a domain adaptation framework to address the device mismatch issue in acoustic scene classification leveraging upon neural label embedding (NLE) and relational teacher student learning (RTSL).

Acoustic Scene Classification Classification +3

Paper
Add Code

An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances

no code implementations • 31 Jul 2020 • Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Xue Bai, Jun Du, Chin-Hui Lee

In contrast to building scene models with whole utterances, the ASM-removed sub-utterances, i. e., acoustic utterances without stop acoustic segments, are then used as inputs to the AlexNet-L back-end for final classification.

Acoustic Scene Classification Classification +5

Paper
Add Code

Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement

2 code implementations • 25 Jul 2020 • Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

Finally, our experiments of multi-channel speech enhancement on a simulated noisy WSJ0 corpus demonstrate that our proposed hybrid CNN-TT architecture achieves better results than both DNN and CNN models in terms of better-enhanced speech qualities and smaller parameter sizes.

regression Speech Enhancement

Paper
Code

Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation

1 code implementation • 16 Jul 2020 • Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee

On Task 1b development data set, we achieve an accuracy of 96. 7\% with a model size smaller than 500KB.

Acoustic Scene Classification Data Augmentation +3

Paper
Code

L-Vector: Neural Label Embedding for Domain Adaptation

no code implementations • 25 Apr 2020 • Zhong Meng, Hu Hu, Jinyu Li, Changliang Liu, Yan Huang, Yifan Gong, Chin-Hui Lee

We propose a novel neural label embedding (NLE) scheme for the domain adaptation of a deep neural network (DNN) acoustic model with unpaired data samples from source and target domains.

Domain Adaptation

Paper
Add Code

Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement

no code implementations • 31 Mar 2020 • Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Xiaoli Ma, Chin-Hui Lee

Recent studies have highlighted adversarial examples as ubiquitous threats to the deep neural network (DNN) based speech recognition systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Enhanced Adversarial Strategically-Timed Attacks against Deep Reinforcement Learning

no code implementations • 20 Feb 2020 • Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Yi Ouyang, I-Te Danny Hung, Chin-Hui Lee, Xiaoli Ma

Recent deep neural networks based techniques, especially those equipped with the ability of self-adaptation in the system level such as deep reinforcement learning (DRL), are shown to possess many advantages of optimizing robot learning systems (e. g., autonomous navigation and continuous robot arm control.)

Autonomous Navigation reinforcement-learning +1

Paper
Add Code

Tensor-to-Vector Regression for Multi-channel Speech Enhancement based on Tensor-Train Network

2 code implementations • 3 Feb 2020 • Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

Finally, in 8-channel conditions, a PESQ of 3. 12 is achieved using 20 million parameters for TTN, whereas a DNN with 68 million parameters can only attain a PESQ of 3. 06.

regression Speech Enhancement

Paper
Code

Riemannian Stochastic Gradient Descent for Tensor-Train Recurrent Neural Networks

no code implementations • ICLR 2019 • Jun Qi, Chin-Hui Lee, Javier Tejedor

The Tensor-Train factorization (TTF) is an efficient way to compress large weight matrices of fully-connected layers and recurrent layers in recurrent neural networks (RNNs).

Machine Translation Translation

Paper
Add Code

Convolutional-Recurrent Neural Networks for Speech Enhancement

no code implementations • 2 May 2018 • Han Zhao, Shuayb Zarar, Ivan Tashev, Chin-Hui Lee

By incorporating prior knowledge of speech signals into the design of model structures, we build a model that is more data-efficient and achieves better generalization on both seen and unseen noise.

Speech Enhancement

Paper
Add Code

Multi-Objective Learning and Mask-Based Post-Processing for Deep Neural Network Based Speech Enhancement

no code implementations • 21 Mar 2017 • Yong Xu, Jun Du, Zhen Huang, Li-Rong Dai, Chin-Hui Lee

We propose a multi-objective framework to learn both secondary targets not directly related to the intended task of speech enhancement (SE) and the primary target of the clean log-power spectra (LPS) features to be used directly for constructing the enhanced speech signals.

Sound

Paper
Add Code

Tweet Normalization with Syllables

no code implementations • IJCNLP 2015 • Ke Xu, Yunqing Xia, Chin-Hui Lee

Ranked #2 on Lexical Normalization on LexNorm

Lexical Normalization Machine Translation

Paper
Add Code

A Probabilistic Framework for Representing Dialog Systems and Entropy-Based Dialog Management through Dynamic Stochastic State Evolution

no code implementations • 27 Apr 2015 • Ji Wu, Miao Li, Chin-Hui Lee

A Song-On-Demand task, with a total of 38117 songs and 12 attributes corresponding to each song, is used to test the performance of the proposed approach.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Maximum a Posteriori Adaptation of Network Parameters in Deep Models

no code implementations • 6 Mar 2015 • Zhen Huang, Sabato Marco Siniscalchi, I-Fan Chen, Jiadong Wu, Chin-Hui Lee

We present a Bayesian approach to adapting parameters of a well-trained context-dependent, deep-neural-network, hidden Markov model (CD-DNN-HMM) to improve automatic speech recognition performance.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.