Search Results for author: Pengyuan Zhang

Found 39 papers, 7 papers with code

TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition

no code implementations 19 Apr 2024 Chengxin Chen, Pengyuan Zhang

One persistent challenge in Speech Emotion Recognition (SER) is the ubiquitous environmental noise, which frequently results in diminished SER performance in practical use.

Speech Emotion Recognition Speech Enhancement

Modality-Collaborative Transformer with Hybrid Feature Reconstruction for Robust Emotion Recognition

1 code implementation 26 Dec 2023 Chengxin Chen, Pengyuan Zhang

As a vital aspect of affective computing, Multimodal Emotion Recognition has been an active research area in the multimedia community.

Multimodal Emotion Recognition

DSNet: Disentangled Siamese Network with Neutral Calibration for Speech Emotion Recognition

no code implementations 25 Dec 2023 Chengxin Chen, Pengyuan Zhang

One persistent challenge in deep-learning-based speech emotion recognition (SER) is the unconscious encoding of emotion-irrelevant factors (e.g., speaker or phonetic variability), which limits the generalization of SER in practical use.

Disentanglement Speech Emotion Recognition

Synthetic Speech Detection Based on Temporal Consistency and Distribution of Speaker Features

no code implementations 29 Sep 2023 Yuxiang Zhang, Zhuo Li, Jingze Lu, Wenchao Wang, Pengyuan Zhang

Based on these analyses, an SSD method based on temporal consistency and distribution of speaker features is proposed.

Synthetic Speech Detection

The Impact of Silence on Speech Anti-Spoofing

no code implementations 21 Sep 2023 Yuxiang Zhang, Zhuo Li, Jingze Lu, Hua Hua, Wenchao Wang, Pengyuan Zhang

First, the reasons for the impact are explored, including the proportion of silence duration and the content of silence.

Action Detection Activity Detection +1

Improving Short Utterance Anti-Spoofing with AASIST2

no code implementations 15 Sep 2023 Yuxiang Zhang, Jingze Lu, Zengqiang Shang, Wenchao Wang, Pengyuan Zhang

The modified Res2Net blocks can extract multi-scale features and improve the detection performance for speech of different durations, thus improving the short utterance evaluation performance.

Graph Attention Speaker Verification

Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

no code implementations 12 Aug 2023 Han Zhu, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, Yonghong Yan

Firstly, a generalized CTC loss function is introduced to handle noisy pseudo-labels by accepting alternative tokens in the positions of incorrect tokens.

Automatic Speech Recognition speech-recognition +1

Progressive Sub-Graph Clustering Algorithm for Semi-Supervised Domain Adaptation Speaker Verification

no code implementations 22 May 2023 Zhuo Li, Jingze Lu, Zhenduo Zhao, Wenchao Wang, Pengyuan Zhang

Utilizing the large-scale unlabeled data from the target domain via pseudo-label clustering algorithms is an important approach for addressing domain adaptation problems in speaker verification tasks.

Clustering Domain Adaptation +4

Speech Corpora Divergence Based Unsupervised Data Selection for ASR

no code implementations 26 Feb 2023 Changfeng Gao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

This study proposes an unsupervised target-aware data selection method based on speech corpora divergence (SCD), which can measure the similarity between two speech corpora.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score Fusion

no code implementations 13 Oct 2022 Yuxiang Zhang, Jingze Lu, Xingming Wang, Zhuo Li, Runqiu Xiao, Wenchao Wang, Ming Li, Pengyuan Zhang

The overfitting of the model to the training set leads to extreme values of the scores and low correlation of the score distributions, which makes score fusion difficult.

Data Augmentation DeepFake Detection +1

Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge

no code implementations 12 Oct 2022 Shuhao Deng, Chengfei Li, Jinfeng Bai, Qingqing Zhang, Wei-Qiang Zhang, Runyan Yang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

Code-switching automatic speech recognition has become one of the most challenging and most valuable scenarios of automatic speech recognition, owing to the code-switching phenomenon between languages and its frequent occurrence in daily life.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

SASV Based on Pre-trained ASV System and Integrated Scoring Module

no code implementations 1 Jul 2022 Yuxiang Zhang, Zhuo Li, Wenchao Wang, Pengyuan Zhang

Based on the assumption that there is a correlation between anti-spoofing and speaker verification, a Total-Divide-Total integrated Spoofing-Aware Speaker Verification (SASV) system based on a pre-trained automatic speaker verification (ASV) system and an integrated scoring module is proposed and submitted to the SASV 2022 Challenge.

Speaker Verification

Boosting Cross-Domain Speech Recognition with Self-Supervision

1 code implementation 20 Jun 2022 Han Zhu, Gaofeng Cheng, Jindong Wang, Wenxin Hou, Pengyuan Zhang, Yonghong Yan

The cross-domain performance of automatic speech recognition (ASR) could be severely hampered due to the mismatch between training and testing distributions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Audio-Visual Scene Classification Using A Transfer Learning Based Joint Optimization Strategy

no code implementations 25 Apr 2022 Chengxin Chen, Meng Wang, Pengyuan Zhang

Recently, audio-visual scene classification (AVSC) has attracted increasing attention from multidisciplinary communities.

Scene Classification Transfer Learning

Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset

no code implementations 31 Mar 2022 Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie, Yonghong Yan

As a Mandarin speech dataset designed for dialog scenarios with high quality and rich annotations, MagicData-RAMC enriches the data diversity in the Mandarin speech community and allows extensive research on a series of speech-related tasks, including automatic speech recognition, speaker diarization, topic detection, keyword search, text-to-speech, etc.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition

no code implementations 31 Mar 2022 Chengxin Chen, Pengyuan Zhang

To further exploit the embeddings from different layers of the ASR encoder, we propose a novel CTA-RNN architecture to capture the emotional salient parts of embeddings in both the channel and temporal directions.

Cross-corpus Speech Emotion Recognition

Improving CTC-based speech recognition via knowledge transferring from pre-trained language models

1 code implementation 22 Feb 2022 Keqi Deng, Songjun Cao, Yike Zhang, Long Ma, Gaofeng Cheng, Ji Xu, Pengyuan Zhang

Recently, end-to-end automatic speech recognition models based on connectionist temporal classification (CTC) have achieved impressive results, especially when fine-tuned from wav2vec2.0 models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Multi-Variant Consistency based Self-supervised Learning for Robust Automatic Speech Recognition

no code implementations 23 Dec 2021 Changfeng Gao, Gaofeng Cheng, Pengyuan Zhang

Nevertheless, most of the previous SSL methods ignore the influence of the background noise or reverberation, which is crucial to deploying ASR systems in real-world speech applications.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR

no code implementations 9 Oct 2021 Han Zhu, Li Wang, Jindong Wang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

In this work, in order to build a better pre-trained model for low-resource ASR, we propose a pre-training approach called wav2vec-S, where we use task-specific semi-supervised pre-training to refine the self-supervised pre-trained model for the ASR task, thus more effectively utilizing the capacity of the pre-trained model to generate task-specific representations for ASR.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Improved Conformer-based End-to-End Speech Recognition Using Neural Architecture Search

no code implementations 12 Apr 2021 Yukun Liu, Ta Li, Pengyuan Zhang, Yonghong Yan

Recently, neural architecture search (NAS) has been successfully used in image classification, natural language processing, and automatic speech recognition (ASR) tasks to find state-of-the-art (SOTA) architectures that outperform human-designed ones.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output

1 code implementation 5 Feb 2021 Hangting Chen, Yang Yi, Dang Feng, Pengyuan Zhang

The proposed framework facilitates iterative signal refinement with the guide of beamforming and seeks to reach the upper bound of the MVDR-based methods.

blind source separation Speech Separation

Multi-Accent Adaptation based on Gate Mechanism

no code implementations 5 Nov 2020 Han Zhu, Li Wang, Pengyuan Zhang, Yonghong Yan

To jointly train the acoustic model and the accent classifier, we propose the multi-task learning with gate mechanism (MTL-G).

Multi-Task Learning speech-recognition +1

Domain Adaptation Using Class Similarity for Robust Speech Recognition

1 code implementation 5 Nov 2020 Han Zhu, Jiangjiang Zhao, Yuling Ren, Li Wang, Pengyuan Zhang

Then, for each class, probabilities of this class are used to compute a mean vector, which we refer to as mean soft labels.

Domain Adaptation Robust Speech Recognition +1

Power pooling: An adaptive pooling function for weakly labelled sound event detection

no code implementations 20 Oct 2020 Yuzhuo Liu, Hangting Chen, Yun Wang, Pengyuan Zhang

While this paper focuses on sound event detection applications, the proposed method can be applied to MIL tasks in other domains.

Event Detection Multiple Instance Learning +1

Exploring the time-domain deep attractor network with two-stream architectures in a reverberant environment

no code implementations 1 Jul 2020 Hangting Chen, Pengyuan Zhang

Deep attractor networks (DANs) perform speech separation with discriminative embeddings and speaker attractors.

Speech Separation

CN-CELEB: a challenging Chinese speaker recognition dataset

2 code implementations 31 Oct 2019 Yue Fan, Jiawen Kang, Lantian Li, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang, Ziya Zhou, Yunqi Cai, Dong Wang

These datasets tend to deliver over-optimistic performance and do not meet the needs of research on speaker recognition in unconstrained conditions.

Speaker Recognition

Weighted Feature Fusion Based Emotional Recognition for Variable-length Speech using DNN

no code implementations 1 Jan 2019 Sifan Wu, Fei Li, Pengyuan Zhang

Emotion recognition, a key technology in multimedia communication, plays an increasingly important role in human-computer interaction systems.

Emotion Recognition

Noise Robust IOA/CAS Speech Separation and Recognition System For The Third 'CHIME' Challenge

no code implementations 21 Sep 2015 Xiaofei Wang, Chao Wu, Pengyuan Zhang, Ziteng Wang, Yong Liu, Xu Li, Qiang Fu, Yonghong Yan

This paper presents the contribution to the third 'CHiME' speech separation and recognition challenge including both front-end signal processing and back-end speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
