Search Results for author: Xiao-Lei Zhang

Found 38 papers, 6 papers with code

Eliminating Quantization Errors in Classification-Based Sound Source Localization

1 code implementation • 21 Nov 2023 • Linfeng Feng, Xiao-Lei Zhang, Xuelong Li

To address this, we propose an Unbiased Label Distribution (ULD) to eliminate quantization error in training targets.

Diffusion-Based Adversarial Purification for Speaker Verification

no code implementations • 22 Oct 2023 • Yibo Bai, Xiao-Lei Zhang

Deep-learning-based automatic speaker verification (ASV) is vulnerable to adversarial attacks, a new type of attack that injects imperceptible perturbations into audio signals so that the ASV system produces wrong decisions.

Denoising Speaker Verification

Soft Label Coding for End-to-end Sound Source Localization With Ad-hoc Microphone Arrays

no code implementations • 15 Apr 2023 • Linfeng Feng, Yijun Gong, Xiao-Lei Zhang

The core idea is to incorporate the geometric connection between the classes into the label coding process. The first one is named static soft label coding (SSLC), which modifies the one-hot codes into soft codes based on the distances between the local areas.

Quantization
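
The idea of turning one-hot codes into distance-based soft codes can be sketched as follows. This is a minimal illustration, not the paper's formulation: the exponential kernel, the `temperature` parameter, and the names `soft_label` and `centers` are assumptions made for this example.

```python
import math

def soft_label(true_idx, centers, temperature=1.0):
    """Return a soft code for class `true_idx` from distances between area centers.

    Assumption: weights decay exponentially with Euclidean distance; the
    excerpt above does not specify the actual kernel.
    """
    tx, ty = centers[true_idx]
    # Unnormalized weight of each class shrinks as its area lies farther away.
    weights = [math.exp(-math.hypot(cx - tx, cy - ty) / temperature)
               for cx, cy in centers]
    total = sum(weights)
    # Normalize so the soft code sums to one, like a one-hot code does.
    return [w / total for w in weights]

# Three hypothetical local areas on a line, one unit apart.
centers = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
code = soft_label(0, centers)
```

Under these assumptions, the true class keeps the largest weight and nearer areas receive more mass than distant ones, which is the geometric relationship a one-hot code discards.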

Optimizing Quantum Federated Learning Based on Federated Quantum Natural Gradient Descent

no code implementations • 27 Feb 2023 • Jun Qi, Xiao-Lei Zhang, Javier Tejedor

In this work, we propose an efficient optimization algorithm, namely federated quantum natural gradient descent (FQNGD), and further, apply it to a QFL framework that is composed of a variational quantum circuit (VQC)-based quantum neural networks (QNN).

Federated Learning

Interpretable Spectrum Transformation Attacks to Speaker Recognition

no code implementations • 21 Feb 2023 • Jiadi Yao, Hong Luo, Xiao-Lei Zhang

Unlike existing approaches that operate on voices in the time domain, the proposed framework operates on voices in the time-frequency domain, which improves the interpretability, transferability, and imperceptibility of the attack.

Speaker Recognition

LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification

no code implementations • 2 Nov 2022 • Xing Chen, Jie Wang, Xiao-Lei Zhang, Wei-Qiang Zhang, Kunde Yang

It utilizes score variation as an indicator to detect adversarial examples, where the score variation is the absolute discrepancy between the ASV scores of an original audio recording and its transformed audio synthesized from its masked complex spectrogram.

Speaker Verification
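
The score-variation indicator described above can be sketched in a few lines. Everything concrete here is a stand-in chosen for illustration: `cosine` plays the role of an ASV scorer, `mask_small` replaces the learnable spectrogram mask with a simple time-domain mask, and the threshold is hypothetical.

```python
import math

def cosine(a, b):
    """Toy stand-in for an ASV score: cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mask_small(audio, floor=0.1):
    """Toy stand-in for the masking transform: zero out low-magnitude samples."""
    return [x if abs(x) >= floor else 0.0 for x in audio]

def score_variation(asv_score, transform, audio, enrollment):
    """Absolute difference between the scores of the original and transformed audio."""
    original = asv_score(audio, enrollment)
    transformed = asv_score(transform(audio), enrollment)
    return abs(original - transformed)

def is_adversarial(variation, threshold):
    """Flag the input when the score moves more than the (hypothetical) threshold."""
    return variation > threshold

enrollment = [0.5, -0.4, 0.3, 0.0]
test_audio = [0.5, -0.4, 0.3, 0.05]
variation = score_variation(cosine, mask_small, test_audio, enrollment)
flagged = is_adversarial(variation, threshold=0.5)
```

The intuition under these assumptions is that a genuine recording keeps roughly the same score after masking, while an adversarial one depends on perturbations that the mask removes.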

WeKws: A production first small-footprint end-to-end Keyword Spotting Toolkit

1 code implementation • 30 Oct 2022 • Jie Wang, Menglong Xu, Jingyong Hou, BinBin Zhang, Xiao-Lei Zhang, Lei Xie, Fuping Pan

Keyword spotting (KWS) enables speech-based user interaction and is gradually becoming an indispensable component of smart devices.

Keyword Spotting

Symmetric Saliency-based Adversarial Attack To Speaker Identification

no code implementations • 30 Oct 2022 • Jiadi Yao, Xing Chen, Xiao-Lei Zhang, Wei-Qiang Zhang, Kunde Yang

To our knowledge, existing adversarial attack approaches to speaker identification either incur high computational cost or are not very effective.

Adversarial Attack Speaker Identification

Deep Learning Based Stage-wise Two-dimensional Speaker Localization with Large Ad-hoc Microphone Arrays

no code implementations • 19 Oct 2022 • Shupei Liu, Linfeng Feng, Yijun Gong, Chengdong Liang, Chen Zhang, Xiao-Lei Zhang, Xuelong Li

To further boost the estimation accuracy, we introduce a node selection algorithm that strategically filters the most reliable nodes.

End-to-end Two-dimensional Sound Source Localization With Ad-hoc Microphone Arrays

no code implementations • 16 Oct 2022 • Yijun Gong, Shupei Liu, Xiao-Lei Zhang

Accordingly, the sound source localization problem is formulated as a classification task of recognizing the one-hot code of the speaker given the one-hot codes of the microphone nodes and their speech recordings.

Improving Pseudo Labels With Intra-Class Similarity for Unsupervised Domain Adaptation

1 code implementation • 25 Jul 2022 • Jie Wang, Xiao-Lei Zhang

In this paper, we propose a novel approach to improve the accuracy of the pseudo labels in the target domain.

Unsupervised Domain Adaptation

Conformer-based End-to-end Speech Recognition With Rotary Position Embedding

no code implementations • 13 Jul 2021 • Shengqiang Li, Menglong Xu, Xiao-Lei Zhang

To make use of the time order of the input sequence, many works inject some information about the relative or absolute position of the element into the input sequence.

Position speech-recognition +1

AUC Optimization for Robust Small-footprint Keyword Spotting with Limited Training Data

no code implementations • 13 Jul 2021 • Menglong Xu, Shengqiang Li, Chengdong Liang, Xiao-Lei Zhang

Deep neural networks provide effective solutions to small-footprint keyword spotting (KWS).

Small-Footprint Keyword Spotting

fMBN-E: Efficient Unsupervised Network Structure Ensemble and Selection for Clustering

no code implementations • 5 Jul 2021 • Xiao-Lei Zhang

Empirically, compared with a number of advanced deep clustering methods and as many as 20 representative unsupervised ensemble learning and selection methods, the proposed methods reach state-of-the-art performance without manual hyperparameter tuning.

Clustering Deep Clustering +4

Scaling sparsemax based channel selection for speech recognition with ad-hoc microphone arrays

no code implementations • 29 Mar 2021 • Junqi Chen, Xiao-Lei Zhang

Because Sparsemax harshly drives the weights of many channels to zero, we propose Scaling Sparsemax, which penalizes the channels more mildly by setting only the weights of very noisy channels to zero.

speech-recognition Speech Recognition
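
For context, the baseline Sparsemax that the excerpt refers to can be sketched as below. This shows only standard Sparsemax (the Euclidean projection of the scores onto the probability simplex), not the proposed Scaling Sparsemax, whose scaling rule is not given in this listing; the function name and the sample scores are chosen for illustration.

```python
def sparsemax(scores):
    """Standard sparsemax: project `scores` onto the probability simplex."""
    ordered = sorted(scores, reverse=True)
    cumulative = 0.0
    support_size = 0
    support_sum = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        # The k-th largest score stays in the support while this holds.
        if 1.0 + k * value > cumulative:
            support_size = k
            support_sum = cumulative
    # Threshold subtracted from every score before clipping at zero.
    tau = (support_sum - 1.0) / support_size
    return [max(s - tau, 0.0) for s in scores]

# Hypothetical per-channel scores; the weakest channel gets exactly zero weight.
weights = sparsemax([1.0, 0.8, 0.1])
```

Unlike softmax, the output contains exact zeros, which is why the excerpt describes Sparsemax as zeroing many channels at once.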

Transformer-based end-to-end speech recognition with residual Gaussian-based self-attention

no code implementations • 29 Mar 2021 • Chengdong Liang, Menglong Xu, Xiao-Lei Zhang

Although the performance of the proposed resGSA-Transformer is only slightly better than that of the RPSA-Transformer, it does not require manual tuning of the window length.

speech-recognition Speech Recognition

Libri-adhoc40: A dataset collected from synchronized ad-hoc microphone arrays

1 code implementation • 28 Mar 2021 • Shanzheng Guan, Shupei Liu, Junqi Chen, Wenbo Zhu, Shengqiang Li, Xu Tan, Ziye Yang, Menglong Xu, Yijiang Chen, Jianyu Wang, Xiao-Lei Zhang

We trained several multi-device speech recognition systems on both the Libri-adhoc40 dataset and a simulated dataset.

speech-recognition Speech Recognition

Deep NMF Topic Modeling

no code implementations • 24 Feb 2021 • Jianyu Wang, Xiao-Lei Zhang

In this paper, we propose a deep NMF (DNMF) topic modeling framework to alleviate the aforementioned problems.

Speaker Recognition Based on Deep Learning: An Overview

no code implementations • 2 Dec 2020 • Zhongxin Bai, Xiao-Lei Zhang

Speaker recognition is a task of identifying persons from their voices.

Domain Adaptation speaker-diarization +4

Deep Ad-hoc Beamforming Based on Speaker Extraction for Target-Dependent Speech Separation

no code implementations • 1 Dec 2020 • Ziye Yang, Shanzheng Guan, Xiao-Lei Zhang

In this paper, we propose deep ad-hoc beamforming based on speaker extraction, which is to our knowledge the first work for target-dependent speech separation based on ad-hoc microphone arrays and deep learning.

Speech Enhancement Speech Separation

A comparison of handcrafted, parameterized, and learnable features for speech separation

no code implementations • 29 Nov 2020 • Wenbo Zhu, Mou Wang, Xiao-Lei Zhang, Susanto Rahardja

Among them, learnable features, which are trained jointly with separation networks in an end-to-end fashion, have become a new trend in modern speech separation research, e.g., the convolutional time-domain audio separation network (Conv-TasNet), while handcrafted and parameterized features have also been shown to be competitive in very recent studies.

Sound

Transformer-based End-to-End Speech Recognition with Local Dense Synthesizer Attention

1 code implementation • 23 Oct 2020 • Menglong Xu, Shengqiang Li, Xiao-Lei Zhang

To reduce the computational complexity and improve the performance, we further propose local DSA (LDSA) to restrict the attention scope of DSA to a local range around the current central frame for speech recognition.

speech-recognition Speech Recognition

Speech enhancement aided end-to-end multi-task learning for voice activity detection

no code implementations • 23 Oct 2020 • Xu Tan, Xiao-Lei Zhang

Recent studies show that speech enhancement is helpful to VAD, but the performance improvement is limited.

Action Detection Activity Detection +3

Partial AUC optimization based deep speaker embeddings with class-center learning for text-independent speaker verification

no code implementations • 19 Nov 2019 • Zhongxin Bai, Xiao-Lei Zhang, Jingdong Chen

We also propose a class-center based training trial construction method to improve the training efficiency, which is critical for the proposed loss function to be comparable to the identification loss in performance.

Text-Independent Speaker Verification

Deep topic modeling by multilayer bootstrap network and lasso

no code implementations • 24 Oct 2019 • Jianyu Wang, Xiao-Lei Zhang

Specifically, we first apply multilayer bootstrap network (MBN), which is an unsupervised deep model, to reduce the dimension of documents, and then use the low-dimensional data representations or their clustering results as the target of supervised Lasso for topic word discovery.

Clustering Dimensionality Reduction +1

Multi-channel Speech Separation Using Deep Embedding Model with Multilayer Bootstrap Networks

no code implementations • 24 Oct 2019 • Ziye Yang, Xiao-Lei Zhang

To deal with the problem, we propose a variant of DPCL, named DPCL++, by applying a recent unsupervised deep learning method, multilayer bootstrap networks (MBN), to further reduce the noise and small variations of the embedding vectors in an unsupervised way in the test stage, which facilitates k-means in producing a good result.

Clustering Deep Clustering +1

Deep Ad-hoc Beamforming

1 code implementation • 3 Nov 2018 • Xiao-Lei Zhang

Its core idea is to reweight the estimated speech signals with a sparsity constraint when conducting adaptive beamforming, where the weights produced by a neural network are an estimation of the propagation cost from the speech source to the ad-hoc microphone array, e.g., signal-to-noise ratios, and the sparsity constraint is to filter out the microphones that are too far away from both the speech source and the majority of the ad-hoc microphone array.

Sound Audio and Speech Processing

Learning the kernel matrix by resampling

no code implementations • 1 Aug 2017 • Xiao-Lei Zhang

In this abstract paper, we introduce a new kernel learning method by a nonparametric density estimator.

Clustering

Multilayer bootstrap network for unsupervised speaker recognition

no code implementations • 21 Sep 2015 • Xiao-Lei Zhang

We apply multilayer bootstrap network (MBN), a recently proposed unsupervised learning method, to unsupervised speaker recognition.

Clustering Speaker Recognition

Preprint ARPPS Augmented Reality Pipeline Prospect System

no code implementations • 18 Aug 2015 • Xiao-Lei Zhang, Yong Han, DongSheng Hao, Zhihan Lv

This is the preprint version of our paper at ICONIP.

Unsupervised model compression for multilayer bootstrap networks

no code implementations • 22 Mar 2015 • Xiao-Lei Zhang

Our result suggests that the new technique combines the effectiveness of MBN on unsupervised learning with the effectiveness and efficiency of DNN on supervised learning, yielding a compressive MBN that is both effective and efficient on unsupervised learning.

Dimensionality Reduction Model Compression

Deep Distributed Random Samplings for Supervised Learning: An Alternative to Random Forests?

no code implementations • 3 Dec 2014 • Xiao-Lei Zhang

In this paper, we further extend it to supervised learning incrementally.

Dimensionality Reduction

Multilayer bootstrap networks

no code implementations • 5 Aug 2014 • Xiao-Lei Zhang

Geometrically, the nonparametric density estimator at each layer projects the input data space to a uniformly-distributed discrete feature space, where the similarity of two data points in the discrete feature space is measured by the number of the nearest centroids they share in common.

Clustering Dimensionality Reduction
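
The similarity measure described above (counting shared nearest centroids) can be sketched for a single layer as follows. This is an illustrative sketch under assumptions: centroids are drawn by random sampling of the input points, and the names `encode`, `shared_centroids`, `n_clusterings`, and `k` are chosen for this example rather than taken from the paper.

```python
import random

def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))

def encode(data, n_clusterings, k, seed=0):
    """Map each point to its nearest-centroid index in each random clustering."""
    rng = random.Random(seed)
    codes = [[] for _ in data]
    for _ in range(n_clusterings):
        # Assumption: each clustering uses k data points sampled as centroids.
        centroids = rng.sample(data, k)
        for code, point in zip(codes, data):
            code.append(nearest(point, centroids))
    return codes

def shared_centroids(code_a, code_b):
    """Similarity of two points: number of clusterings where they share a centroid."""
    return sum(a == b for a, b in zip(code_a, code_b))

data = [(0.0, 0.0), (0.0, 0.1), (5.0, 5.0), (5.0, 5.1), (0.0, 0.0)]
codes = encode(data, n_clusterings=10, k=2)
similarity_same = shared_centroids(codes[0], codes[4])  # identical points
similarity_far = shared_centroids(codes[0], codes[2])   # distant points
```

Identical points always share every nearest centroid, so their similarity equals the number of clusterings; distant points share fewer or none.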

Learning Deep Representations By Distributed Random Samplings

no code implementations • 16 Dec 2013 • Xiao-Lei Zhang

In this paper, we propose an extremely simple deep model for the unsupervised nonlinear dimensionality reduction -- deep distributed random samplings, which performs like a stack of unsupervised bootstrap aggregating.

Clustering Dimensionality Reduction

Learning Deep Representation Without Parameter Inference for Nonlinear Dimensionality Reduction

no code implementations • 22 Aug 2013 • Xiao-Lei Zhang

Restricted Boltzmann machines, sparse coding, regularized auto-encoders, and convolutional neural networks are pioneering building blocks of deep learning.

Clustering Dimensionality Reduction +1