Search Results for author: Sabato Marco Siniscalchi

Found 35 papers, 16 papers with code

Exploiting Foundation Models and Speech Enhancement for Parkinson's Disease Detection from Speech in Real-World Operative Conditions

1 code implementation 23 Jun 2024 Moreno La Quatra, Maria Francesca Turco, Torbjørn Svendsen, Giampiero Salvi, Juan Rafael Orozco-Arroyave, Sabato Marco Siniscalchi

This work is concerned with devising a robust Parkinson's disease (PD) detector from speech in real-world operating conditions using (i) foundational models, and (ii) speech enhancement (SE) methods.

Speech Enhancement

Speech Analysis of Language Varieties in Italy

1 code implementation 22 Jun 2024 Moreno La Quatra, Alkis Koudounas, Elena Baralis, Sabato Marco Siniscalchi

We leverage self-supervised learning models to tackle this task and analyze differences and similarities between Italy's regional languages.

Contrastive Learning Self-Supervised Learning

Language-Universal Speech Attributes Modeling for Zero-Shot Multilingual Spoken Keyword Recognition

no code implementations 4 Jun 2024 Hao Yen, Pin-Jui Ku, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose a novel language-universal approach to end-to-end automatic spoken keyword recognition (SKR) leveraging (i) a self-supervised pre-trained model, and (ii) a set of universal speech attributes (manner and place of articulation).

Attribute

Benchmarking Representations for Speech, Music, and Acoustic Events

1 code implementation 2 May 2024 Moreno La Quatra, Alkis Koudounas, Lorenzo Vaiani, Elena Baralis, Luca Cagliero, Paolo Garza, Sabato Marco Siniscalchi

Limited diversity in standardized benchmarks for evaluating audio representation learning (ARL) methods may hinder systematic comparison of current methods' capabilities.

Audio Classification Benchmarking +2

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

1 code implementation 8 Feb 2024 Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, EnSiong Chng, Chao-Han Huck Yang

Recent studies have shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output.

Ranked #4 on Speech Recognition on WSJ eval92 (using extra training data)

Audio-Visual Speech Recognition Automatic Speech Recognition +3

Bayesian adaptive learning to latent variables via Variational Bayes and Maximum a Posteriori

no code implementations 24 Jan 2024 Hu Hu, Sabato Marco Siniscalchi, Chin-Hui Lee

In this work, we aim to establish a Bayesian adaptive learning framework by focusing on estimating latent variables in deep neural network (DNN) models.

Acoustic Scene Classification Scene Classification +1

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

no code implementations 15 Sep 2023 Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into enhancing the accuracy of back-end speech recognition systems through AVTSE in challenging and real acoustic environments.

Audio-Visual Speech Recognition speech-recognition +2

S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction

no code implementations 13 Jul 2023 Mohammad Adiban, Kalin Stefanov, Sabato Marco Siniscalchi, Giampiero Salvi

We address the video prediction task by putting forth a novel model that combines (i) our recently proposed hierarchical residual vector quantized variational autoencoder (HR-VQVAE), and (ii) a novel spatiotemporal PixelCNN (ST-PixelCNN).

Video Prediction

How word semantics and phonology affect handwriting of Alzheimer's patients: a machine learning based analysis

no code implementations 6 Jul 2023 Nicole Dalia Cilia, Claudio De Stefano, Francesco Fontanella, Sabato Marco Siniscalchi

Using kinematic properties of handwriting to support the diagnosis of neurodegenerative disease is a real challenge: non-invasive detection techniques combined with machine learning approaches promise big steps forward in this research field.

feature selection

Differentially Private Adapters for Parameter Efficient Acoustic Modeling

1 code implementation 19 May 2023 Chun-Wei Ho, Chao-Han Huck Yang, Sabato Marco Siniscalchi

Evaluated on the open-access Multilingual Spoken Words (MLSW) dataset, our solution reduces the number of trainable parameters by 97.5% using the RAs with only a 4% performance drop with respect to fine-tuning the cross-lingual speech classifier while preserving DP guarantees.

Inference and Denoise: Causal Inference-based Neural Speech Enhancement

1 code implementation 2 Nov 2022 Tsun-An Hsieh, Chao-Han Huck Yang, Pin-Yu Chen, Sabato Marco Siniscalchi, Yu Tsao

This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention.

Causal Inference Speech Enhancement

A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition

no code implementations 2 Nov 2022 Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose a quantum kernel learning (QKL) framework to address the inherent data sparsity issues often encountered in training large-scale acoustic models in low-resource scenarios.

Spoken Command Recognition

An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition

no code implementations 12 Oct 2022 Chao-Han Huck Yang, Jun Qi, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose an ensemble learning framework with Poisson sub-sampling to effectively train a collection of teacher models that provides a differential privacy (DP) guarantee for the training data.

Ensemble Learning Privacy Preserving +3

A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer

1 code implementation 16 Oct 2021 Hu Hu, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Chin-Hui Lee

We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions.

Acoustic Scene Classification Scene Classification +1

Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition

1 code implementation 8 Oct 2021 Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao

In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system.

Spoken Command Recognition Transfer Learning

Exploring Retraining-Free Speech Recognition for Intra-sentential Code-Switching

no code implementations 27 Aug 2021 Zhen Huang, Xiaodan Zhuang, Daben Liu, Xiaoqiang Xiao, Yuchen Zhang, Sabato Marco Siniscalchi

To achieve such an ambitious goal, new mechanisms for foreign pronunciation generation and language model (LM) enrichment have been devised.

Language Modelling speech-recognition +1

PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification

no code implementations 2 Apr 2021 Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose using an adversarial autoencoder (AAE) to replace the generative adversarial network (GAN) in the private aggregation of teacher ensembles (PATE), a solution for ensuring differential privacy in speech applications.

Ranked #3 on Keyword Spotting on Google Speech Commands (10-keyword Speech Commands dataset metric)

Generative Adversarial Network Keyword Spotting +1

A Two-Stage Approach to Device-Robust Acoustic Scene Classification

1 code implementation 3 Nov 2020 Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee

To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed.

Acoustic Scene Classification Classification +4

Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition

2 code implementations 26 Oct 2020 Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Pin-Yu Chen, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee

Testing on the Google Speech Commands Dataset, the proposed QCNN encoder attains a competitive accuracy of 95.12% in a decentralized model, which is better than the previous architectures using centralized RNN models with convolutional features.

Ranked #1 on Keyword Spotting on Google Speech Commands (10-keyword Speech Commands dataset metric)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

On Mean Absolute Error for Deep Neural Network Based Vector-to-Vector Regression

no code implementations 12 Aug 2020 Jun Qi, Jun Du, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee

In this paper, we exploit the properties of mean absolute error (MAE) as a loss function for the deep neural network (DNN) based vector-to-vector regression.

regression Speech Enhancement

Analyzing Upper Bounds on Mean Absolute Errors for Deep Neural Network Based Vector-to-Vector Regression

no code implementations 4 Aug 2020 Jun Qi, Jun Du, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee

In this paper, we show that, in vector-to-vector regression utilizing deep neural networks (DNNs), a generalized loss of mean absolute error (MAE) between the predicted and expected feature vectors is upper bounded by the sum of an approximation error, an estimation error, and an optimization error.

Learning Theory regression +2

Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification

no code implementations 31 Jul 2020 Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee

In this paper, we propose a domain adaptation framework to address the device mismatch issue in acoustic scene classification, leveraging neural label embedding (NLE) and relational teacher student learning (RTSL).

Acoustic Scene Classification Classification +3

An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances

no code implementations 31 Jul 2020 Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Xue Bai, Jun Du, Chin-Hui Lee

In contrast to building scene models with whole utterances, the ASM-removed sub-utterances, i.e., acoustic utterances without stop acoustic segments, are then used as inputs to the AlexNet-L back-end for final classification.

Acoustic Scene Classification Classification +5

Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement

2 code implementations 25 Jul 2020 Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

Finally, our experiments of multi-channel speech enhancement on a simulated noisy WSJ0 corpus demonstrate that our proposed hybrid CNN-TT architecture achieves better results than both DNN and CNN models in terms of better-enhanced speech qualities and smaller parameter sizes.

regression Speech Enhancement

Tensor-to-Vector Regression for Multi-channel Speech Enhancement based on Tensor-Train Network

2 code implementations 3 Feb 2020 Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

Finally, in 8-channel conditions, a PESQ of 3.12 is achieved using 20 million parameters for TTN, whereas a DNN with 68 million parameters can only attain a PESQ of 3.06.

regression Speech Enhancement

Maximum a Posteriori Adaptation of Network Parameters in Deep Models

no code implementations 6 Mar 2015 Zhen Huang, Sabato Marco Siniscalchi, I-Fan Chen, Jiadong Wu, Chin-Hui Lee

We present a Bayesian approach to adapting parameters of a well-trained context-dependent, deep-neural-network, hidden Markov model (CD-DNN-HMM) to improve automatic speech recognition performance.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
