1 code implementation • 23 Jun 2024 • Moreno La Quatra, Maria Francesca Turco, Torbjørn Svendsen, Giampiero Salvi, Juan Rafael Orozco-Arroyave, Sabato Marco Siniscalchi
This work is concerned with devising a robust Parkinson's disease (PD) detector from speech under real-world operating conditions using (i) foundation models, and (ii) speech enhancement (SE) methods.
1 code implementation • 22 Jun 2024 • Moreno La Quatra, Alkis Koudounas, Elena Baralis, Sabato Marco Siniscalchi
We leverage self-supervised learning models to tackle this task and analyze differences and similarities between Italy's regional languages.
no code implementations • 4 Jun 2024 • Hao Yen, Pin-Jui Ku, Sabato Marco Siniscalchi, Chin-Hui Lee
We propose a novel language-universal approach to end-to-end automatic spoken keyword recognition (SKR) leveraging upon (i) a self-supervised pre-trained model, and (ii) a set of universal speech attributes (manner and place of articulation).
1 code implementation • 10 May 2024 • Rong Chao, Wen-Huang Cheng, Moreno La Quatra, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Szu-Wei Fu, Yu Tsao
This work aims to study a scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task.
Ranked #2 on Speech Enhancement on VoiceBank + DEMAND
1 code implementation • 2 May 2024 • Moreno La Quatra, Alkis Koudounas, Lorenzo Vaiani, Elena Baralis, Luca Cagliero, Paolo Garza, Sabato Marco Siniscalchi
Limited diversity in standardized benchmarks for evaluating audio representation learning (ARL) methods may hinder systematic comparison of current methods' capabilities.
1 code implementation • 8 Feb 2024 • Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng, Chao-Han Huck Yang
Recent studies have shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output.
Ranked #4 on Speech Recognition on WSJ eval92 (using extra training data)
no code implementations • 24 Jan 2024 • Hu Hu, Sabato Marco Siniscalchi, Chin-Hui Lee
In this work, we aim to establish a Bayesian adaptive learning framework by focusing on estimating latent variables in deep neural network (DNN) models.
no code implementations • 17 Oct 2023 • Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Hexin Liu, Sabato Marco Siniscalchi, Eng Siong Chng
In this work, we propose to leverage large language models (LLMs) and lists of hypotheses generated by an ASR system to address the code-switching (CS) problem.
no code implementations • 16 Sep 2023 • Hao Yen, Sabato Marco Siniscalchi, Chin-Hui Lee
The proposed joint multilingual model is evaluated through phoneme recognition.
no code implementations • 15 Sep 2023 • Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao
This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into enhancing the accuracy of back-end speech recognition systems through AVTSE in challenging and real acoustic environments.
no code implementations • 13 Jul 2023 • Mohammad Adiban, Kalin Stefanov, Sabato Marco Siniscalchi, Giampiero Salvi
We address the video prediction task by putting forth a novel model that combines (i) our recently proposed hierarchical residual vector quantized variational autoencoder (HR-VQVAE), and (ii) a novel spatiotemporal PixelCNN (ST-PixelCNN).
no code implementations • 6 Jul 2023 • Nicole Dalia Cilia, Claudio De Stefano, Francesco Fontanella, Sabato Marco Siniscalchi
Using kinematic properties of handwriting to support the diagnosis of neurodegenerative disease is a real challenge: non-invasive detection techniques combined with machine learning approaches promise significant advances in this research field.
1 code implementation • 1 Jun 2023 • Pin-Jui Ku, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
We propose a multi-dimensional structured state space (S4) approach to speech enhancement.
1 code implementation • 19 May 2023 • Chun-Wei Ho, Chao-Han Huck Yang, Sabato Marco Siniscalchi
Evaluated on the open-access Multilingual Spoken Words (MLSW) dataset, our solution reduces the number of trainable parameters by 97.5% using the RAs, with only a 4% performance drop relative to fine-tuning the cross-lingual speech classifier, while preserving DP guarantees.
1 code implementation • 2 Nov 2022 • Tsun-An Hsieh, Chao-Han Huck Yang, Pin-Yu Chen, Sabato Marco Siniscalchi, Yu Tsao
This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention.
no code implementations • 2 Nov 2022 • Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee
We propose a quantum kernel learning (QKL) framework to address the inherent data sparsity issues often encountered in training large-scale acoustic models in low-resource scenarios.
no code implementations • 12 Oct 2022 • Chao-Han Huck Yang, Jun Qi, Sabato Marco Siniscalchi, Chin-Hui Lee
We propose an ensemble learning framework with Poisson sub-sampling to effectively train a collection of teacher models to issue some differential privacy (DP) guarantee for training data.
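The mechanical core of Poisson sub-sampling is simple: each training example is included in a given teacher's subset independently with probability q. A minimal numpy sketch of that sampling step (the function name and the value of q are illustrative, not taken from the paper):

```python
import numpy as np

def poisson_subsample(n_examples, q, rng):
    """Return indices of a Poisson sub-sample: each of the n_examples
    is kept independently with probability q."""
    mask = rng.random(n_examples) < q
    return np.flatnonzero(mask)

rng = np.random.default_rng(0)
idx = poisson_subsample(10_000, q=0.1, rng=rng)
# The subset size is random, concentrated around q * n = 1000.
```

Because inclusion decisions are independent, each teacher sees a different random subset, which is what makes the per-teacher privacy amplification analysis tractable.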
no code implementations • 11 Oct 2022 • Chao-Han Huck Yang, I-Fan Chen, Andreas Stolcke, Sabato Marco Siniscalchi, Chin-Hui Lee
We evaluate three end-to-end deep models, including LAS, hybrid CTC/attention, and RNN transducer, on the open-source LibriSpeech and TIMIT corpora.
1 code implementation • 9 Aug 2022 • Mohammad Adiban, Kalin Stefanov, Sabato Marco Siniscalchi, Giampiero Salvi
We propose a multi-layer variational autoencoder method, which we call HR-VQVAE, that learns hierarchical discrete representations of the data.
no code implementations • 7 Mar 2022 • Qing Wang, Jun Du, Siyuan Zheng, Yunqing Li, Yajian Wang, Yuzhong Wu, Hu Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee
In this paper, we propose two techniques, namely joint modeling and data augmentation, to improve system performances for audio-visual scene classification (AVSC).
1 code implementation • 16 Oct 2021 • Hu Hu, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Chin-Hui Lee
We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions.
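Variational Bayesian transfer schemes of this kind typically add a KL term to the training loss that keeps target-domain latent distributions close to source-domain priors. A generic sketch of such a term for diagonal Gaussians (this is the standard closed-form KL, not the paper's exact formulation; all names and values are illustrative):

```python
import numpy as np

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """Closed-form KL(q || p) between diagonal Gaussians -- the kind of
    regularizer a variational Bayesian transfer scheme adds to pull the
    target-domain posterior q toward a source-domain prior p."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

# Two dimensions: identical in the first, mean-shifted by 1 in the second.
mu_q, var_q = np.array([0.0, 1.0]), np.array([1.0, 1.0])
mu_p, var_p = np.array([0.0, 0.0]), np.array([1.0, 1.0])
kl = kl_diag_gaussians(mu_q, var_q, mu_p, var_p)  # 0.5, from the shifted dim
```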
1 code implementation • 8 Oct 2021 • Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao
In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system.
no code implementations • 27 Aug 2021 • Zhen Huang, Xiaodan Zhuang, Daben Liu, Xiaoqiang Xiao, Yuchen Zhang, Sabato Marco Siniscalchi
To achieve such an ambitious goal, new mechanisms for foreign pronunciation generation and language model (LM) enrichment have been devised.
no code implementations • 3 Jul 2021 • Hao Yen, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Qing Wang, Yuyang Wang, Xianjun Xia, Yuanjun Zhao, Yuzhong Wu, Yannan Wang, Jun Du, Chin-Hui Lee
We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC).
no code implementations • 2 Apr 2021 • Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
We propose using an adversarial autoencoder (AAE) to replace generative adversarial network (GAN) in the private aggregation of teacher ensembles (PATE), a solution for ensuring differential privacy in speech applications.
Ranked #3 on Keyword Spotting on Google Speech Commands (10-keyword Speech Commands dataset metric)
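The PATE aggregation this entry builds on is commonly described as a noisy vote count over teacher predictions. A minimal sketch of that aggregation step (vote values and noise scale are illustrative; the paper's actual contribution, replacing the GAN with an AAE, is not shown here):

```python
import numpy as np

def noisy_aggregate(teacher_votes, n_labels, scale, rng):
    """PATE-style noisy-max: count teacher votes per label, add
    Laplace noise of the given scale, and return the argmax label."""
    counts = np.bincount(teacher_votes, minlength=n_labels).astype(float)
    counts += rng.laplace(scale=scale, size=n_labels)
    return int(np.argmax(counts))

rng = np.random.default_rng(0)
votes = np.array([3, 3, 3, 3, 1, 0, 3, 3])  # 8 hypothetical teachers
label = noisy_aggregate(votes, n_labels=4, scale=0.5, rng=rng)
```

The Laplace noise is what provides the differential privacy guarantee: with a strong teacher majority, the returned label is almost always the plurality vote, yet no single teacher's vote can noticeably change the output distribution.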
1 code implementation • 3 Nov 2020 • Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee
To improve device robustness, a key feature of any competitive data-driven acoustic scene classification (ASC) system, we propose a novel two-stage system based on fully convolutional neural networks (CNNs).
Ranked #1 on Acoustic Scene Classification on TAU Urban Acoustic Scenes 2019 (using extra training data)
2 code implementations • 26 Oct 2020 • Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Pin-Yu Chen, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee
Testing on the Google Speech Commands Dataset, the proposed QCNN encoder attains a competitive accuracy of 95.12% in a decentralized model, which is better than the previous architectures using centralized RNN models with convolutional features.
Ranked #1 on Keyword Spotting on Google Speech Commands (10-keyword Speech Commands dataset metric)
no code implementations • 12 Aug 2020 • Jun Qi, Jun Du, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee
In this paper, we exploit the properties of mean absolute error (MAE) as a loss function for the deep neural network (DNN) based vector-to-vector regression.
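The MAE loss under study is simple to state; a minimal numpy sketch (array values are illustrative):

```python
import numpy as np

def mae_loss(pred, target):
    """Mean absolute error between predicted and target feature vectors,
    the loss studied for DNN-based vector-to-vector regression."""
    return np.mean(np.abs(pred - target))

pred = np.array([0.2, -0.5, 1.0])
target = np.array([0.0, -0.5, 0.5])
mae_loss(pred, target)  # (0.2 + 0.0 + 0.5) / 3 ≈ 0.2333
```

Unlike the squared error, MAE penalizes residuals linearly, which is one of the properties that makes it attractive when regression targets contain outliers.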
no code implementations • 4 Aug 2020 • Jun Qi, Jun Du, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee
In this paper, we show that, in vector-to-vector regression utilizing deep neural networks (DNNs), a generalized loss of mean absolute error (MAE) between the predicted and expected feature vectors is upper bounded by the sum of an approximation error, an estimation error, and an optimization error.
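Schematically, the stated bound can be written as follows (the symbols here are illustrative shorthand, not the paper's exact notation):

```latex
\mathcal{L}_{\mathrm{MAE}}(f_{\mathrm{DNN}})
  \;\le\;
  \epsilon_{\mathrm{approx}} + \epsilon_{\mathrm{est}} + \epsilon_{\mathrm{opt}},
```

where the three terms on the right are the approximation, estimation, and optimization errors, respectively.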
no code implementations • 31 Jul 2020 • Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee
In this paper, we propose a domain adaptation framework to address the device mismatch issue in acoustic scene classification leveraging upon neural label embedding (NLE) and relational teacher student learning (RTSL).
no code implementations • 31 Jul 2020 • Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Xue Bai, Jun Du, Chin-Hui Lee
In contrast to building scene models with whole utterances, the ASM-removed sub-utterances, i.e., acoustic utterances without stop acoustic segments, are then used as inputs to the AlexNet-L back-end for final classification.
2 code implementations • 25 Jul 2020 • Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
Finally, our multi-channel speech enhancement experiments on a simulated noisy WSJ0 corpus demonstrate that the proposed hybrid CNN-TT architecture outperforms both DNN and CNN models, yielding better enhanced speech quality with smaller parameter sizes.
1 code implementation • 16 Jul 2020 • Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee
On the Task 1b development data set, we achieve an accuracy of 96.7% with a model size smaller than 500 KB.
2 code implementations • 3 Feb 2020 • Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
Finally, in 8-channel conditions, a PESQ of 3.12 is achieved using 20 million parameters for TTN, whereas a DNN with 68 million parameters can only attain a PESQ of 3.06.
no code implementations • 6 Mar 2015 • Zhen Huang, Sabato Marco Siniscalchi, I-Fan Chen, Jiadong Wu, Chin-Hui Lee
We present a Bayesian approach to adapting parameters of a well-trained context-dependent, deep-neural-network, hidden Markov model (CD-DNN-HMM) to improve automatic speech recognition performance.