Search Results for author: Sabato Marco Siniscalchi

Found 15 papers, 6 papers with code

A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer

no code implementations • 16 Oct 2021 • Hu Hu, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Chin-Hui Lee

We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions.

Acoustic Scene Classification, Scene Classification, +1

A Study of Low-Resource Speech Commands Recognition based on Adversarial Reprogramming

1 code implementation • 8 Oct 2021 • Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao

In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system.

Transfer Learning

Exploring Retraining-Free Speech Recognition for Intra-sentential Code-Switching

no code implementations • 27 Aug 2021 • Zhen Huang, Xiaodan Zhuang, Daben Liu, Xiaoqiang Xiao, Yuchen Zhang, Sabato Marco Siniscalchi

To achieve such an ambitious goal, new mechanisms for foreign pronunciation generation and language model (LM) enrichment have been devised.

Speech Recognition

A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification

no code implementations • 3 Jul 2021 • Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Qing Wang, Yuyang Wang, Xianjun Xia, Yuanjun Zhao, Yuzhong Wu, Yannan Wang, Jun Du, Chin-Hui Lee

We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC).

Acoustic Scene Classification, Data Augmentation, +5

PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification

no code implementations • 2 Apr 2021 • Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose using an adversarial autoencoder (AAE) to replace the generative adversarial network (GAN) in the private aggregation of teacher ensembles (PATE), a solution for ensuring differential privacy in speech applications.

Ranked #3 on Keyword Spotting on Google Speech Commands (10-keyword Speech Commands dataset metric)

Keyword Spotting

A Two-Stage Approach to Device-Robust Acoustic Scene Classification

1 code implementation • 3 Nov 2020 • Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee

To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed.

Acoustic Scene Classification, Data Augmentation, +2

Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition

2 code implementations • 26 Oct 2020 • Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Pin-Yu Chen, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee

Testing on the Google Speech Commands Dataset, the proposed QCNN encoder attains a competitive accuracy of 95.12% in a decentralized model, which is better than the previous architectures using centralized RNN models with convolutional features.

Ranked #1 on Keyword Spotting on Google Speech Commands (10-keyword Speech Commands dataset metric)

Federated Learning, Keyword Spotting, +1

On Mean Absolute Error for Deep Neural Network Based Vector-to-Vector Regression

no code implementations • 12 Aug 2020 • Jun Qi, Jun Du, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee

In this paper, we exploit the properties of mean absolute error (MAE) as a loss function for the deep neural network (DNN) based vector-to-vector regression.

Speech Enhancement

Analyzing Upper Bounds on Mean Absolute Errors for Deep Neural Network Based Vector-to-Vector Regression

no code implementations • 4 Aug 2020 • Jun Qi, Jun Du, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee

In this paper, we show that, in vector-to-vector regression utilizing deep neural networks (DNNs), a generalized loss of mean absolute error (MAE) between the predicted and expected feature vectors is upper bounded by the sum of an approximation error, an estimation error, and an optimization error.

Learning Theory, Speech Enhancement

Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification

no code implementations • 31 Jul 2020 • Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee

In this paper, we propose a domain adaptation framework to address the device mismatch issue in acoustic scene classification, leveraging neural label embedding (NLE) and relational teacher student learning (RTSL).

Acoustic Scene Classification, Domain Adaptation, +2

An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances

no code implementations • 31 Jul 2020 • Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Xue Bai, Jun Du, Chin-Hui Lee

In contrast to building scene models with whole utterances, the ASM-removed sub-utterances, i.e., acoustic utterances without stop acoustic segments, are then used as inputs to the AlexNet-L back-end for final classification.

Acoustic Scene Classification, Data Augmentation, +3

Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement

2 code implementations • 25 Jul 2020 • Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

Finally, our experiments of multi-channel speech enhancement on a simulated noisy WSJ0 corpus demonstrate that our proposed hybrid CNN-TT architecture achieves better results than both DNN and CNN models in terms of better-enhanced speech qualities and smaller parameter sizes.

Speech Enhancement, Speech Quality

Tensor-to-Vector Regression for Multi-channel Speech Enhancement based on Tensor-Train Network

2 code implementations • 3 Feb 2020 • Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

Finally, in 8-channel conditions, a PESQ of 3.12 is achieved using 20 million parameters for TTN, whereas a DNN with 68 million parameters can only attain a PESQ of 3.06.

Speech Enhancement

Maximum a Posteriori Adaptation of Network Parameters in Deep Models

no code implementations • 6 Mar 2015 • Zhen Huang, Sabato Marco Siniscalchi, I-Fan Chen, Jiadong Wu, Chin-Hui Lee

We present a Bayesian approach to adapting parameters of a well-trained context-dependent, deep-neural-network, hidden Markov model (CD-DNN-HMM) to improve automatic speech recognition performance.

Speech Recognition
