1 code implementation • 27 Jan 2025 • Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, Zhizheng Wu
By leveraging Emilia-Pipe, we construct Emilia, the first multilingual speech generation dataset derived from in-the-wild speech data.
1 code implementation • 1 Jan 2025 • Haitian Lu, Gaofeng Cheng, Liuping Luo, Leying Zhang, Yanmin Qian, Pengyuan Zhang
Recently, "textless" speech language models (SLMs) based on speech units have made huge progress in generating naturalistic speech, including non-verbal vocalizations.
no code implementations • 15 Dec 2024 • Han Zhu, Gaofeng Cheng, Qingwei Zhao, Pengyuan Zhang
To make domain adaptation more applicable, we address the problem of zero-shot domain adaptation (ZSDA), where target domain data is unavailable in the target language.
1 code implementation • 7 Jul 2024 • Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, Zhizheng Wu
To facilitate the scale-up of Emilia, we also present Emilia-Pipe, the first open-source preprocessing pipeline designed to efficiently transform raw, in-the-wild speech data into high-quality training data with speech annotations.
no code implementations • 7 Jun 2024 • Ze Li, Yuke Lin, Tian Yao, Hongbin Suo, Pengyuan Zhang, Yanzhen Ren, Zexin Cai, Hiromitsu Nishizaki, Ming Li
We expect SSTC to be a platform for advancing the development of the SSV task and provide further insights into the performance and limitations of current SV systems against VC attacks.
no code implementations • 19 Apr 2024 • Chengxin Chen, Pengyuan Zhang
One persistent challenge in Speech Emotion Recognition (SER) is the ubiquitous environmental noise, which frequently results in deteriorating SER performance in practice.
1 code implementation • 26 Dec 2023 • Chengxin Chen, Pengyuan Zhang
As a vital aspect of affective computing, Multimodal Emotion Recognition has been an active research area in the multimedia community.
no code implementations • 25 Dec 2023 • Chengxin Chen, Pengyuan Zhang
One persistent challenge in deep learning based speech emotion recognition (SER) is the unconscious encoding of emotion-irrelevant factors (e.g., speaker or phonetic variability), which limits the generalization of SER in practical use.
no code implementations • 18 Oct 2023 • Jingze Lu, Yuxiang Zhang, Wenchao Wang, Zengqiang Shang, Pengyuan Zhang
Current spoofing speech detection systems need more convincing evidence.
no code implementations • 29 Sep 2023 • Yuxiang Zhang, Zhuo Li, Jingze Lu, Wenchao Wang, Pengyuan Zhang
Based on these analyses, an SSD method based on temporal consistency and distribution of speaker features is proposed.
no code implementations • 21 Sep 2023 • Yuxiang Zhang, Zhuo Li, Jingze Lu, Hua Hua, Wenchao Wang, Pengyuan Zhang
First, the reasons for the impact are explored, including the proportion of silence duration and the content of silence.
no code implementations • 15 Sep 2023 • Yuxiang Zhang, Jingze Lu, Zengqiang Shang, Wenchao Wang, Pengyuan Zhang
The modified Res2Net blocks can extract multi-scale features and improve the detection performance for speech of different durations, thus improving the short utterance evaluation performance.
no code implementations • 15 Sep 2023 • Jingze Lu, Yuxiang Zhang, Wenchao Wang, Zengqiang Shang, Pengyuan Zhang
The detection of spoofing speech generated by unseen algorithms remains an unresolved challenge.
no code implementations • 12 Aug 2023 • Han Zhu, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, Yonghong Yan
Firstly, a generalized CTC loss function is introduced to handle noisy pseudo-labels by accepting alternative tokens in the positions of incorrect tokens.
no code implementations • 5 Jul 2023 • Haoran Miao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan
However, how to deploy hybrid CTC/attention systems for online speech recognition is still a non-trivial problem.
Automatic Speech Recognition
no code implementations • 22 May 2023 • Zhuo Li, Jingze Lu, Zhenduo Zhao, Wenchao Wang, Pengyuan Zhang
Utilizing the large-scale unlabeled data from the target domain via pseudo-label clustering algorithms is an important approach for addressing domain adaptation problems in speaker verification tasks.
no code implementations • 26 Feb 2023 • Changfeng Gao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan
This study proposes an unsupervised target-aware data selection method based on speech corpora divergence (SCD), which can measure the similarity between two speech corpora.
Automatic Speech Recognition
no code implementations • 18 Feb 2023 • Shengchang Xiao, Xueshuai Zhang, Pengyuan Zhang
Recently, convolutional neural networks (CNNs) have been widely used in sound event detection (SED).
no code implementations • 13 Oct 2022 • Yuxiang Zhang, Jingze Lu, Xingming Wang, Zhuo Li, Runqiu Xiao, Wenchao Wang, Ming Li, Pengyuan Zhang
The overfitting of the model to the training set leads to extreme values of the scores and low correlation of the score distributions, which makes score fusion difficult.
no code implementations • 12 Oct 2022 • Shuhao Deng, Chengfei Li, Jinfeng Bai, Qingqing Zhang, Wei-Qiang Zhang, Runyan Yang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan
Code-switching automatic speech recognition has become one of the most challenging and most valuable scenarios of automatic speech recognition, because speakers switch between multiple languages and because code-switching occurs frequently in daily life.
Automatic Speech Recognition
1 code implementation • 17 Aug 2022 • Gaofeng Cheng, Yifan Chen, Runyan Yang, Qingxuan Li, Zehui Yang, Lingxuan Ye, Pengyuan Zhang, Qingqing Zhang, Lei Xie, Yanmin Qian, Kong Aik Lee, Yonghong Yan
In the metric aspect, we design the new conversational DER (CDER) evaluation metric, which calculates the SD accuracy at the utterance level.
no code implementations • 1 Jul 2022 • Yuxiang Zhang, Zhuo Li, Wenchao Wang, Pengyuan Zhang
Based on the assumption that there is a correlation between anti-spoofing and speaker verification, a Total-Divide-Total integrated Spoofing-Aware Speaker Verification (SASV) system based on a pre-trained automatic speaker verification (ASV) system and an integrated scoring module is proposed and submitted to the SASV 2022 Challenge.
no code implementations • 28 Jun 2022 • Yifan Chen, Yifan Guo, Qingxuan Li, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan
For online speaker diarization, samples arrive incrementally, and the overall distribution of the samples is invisible.
1 code implementation • 20 Jun 2022 • Han Zhu, Gaofeng Cheng, Jindong Wang, Wenxin Hou, Pengyuan Zhang, Yonghong Yan
The cross-domain performance of automatic speech recognition (ASR) could be severely hampered due to the mismatch between training and testing distributions.
Automatic Speech Recognition
no code implementations • 18 Jun 2022 • Han Zhu, Jindong Wang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan
Secondly, to reduce the communication and computation costs, we propose decoupled federated learning (DecoupleFL).
Automatic Speech Recognition
no code implementations • 25 Apr 2022 • Chengxin Chen, Meng Wang, Pengyuan Zhang
Recently, audio-visual scene classification (AVSC) has attracted increasing attention from multidisciplinary communities.
no code implementations • 31 Mar 2022 • Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie, Yonghong Yan
As a Mandarin speech dataset designed for dialog scenarios with high quality and rich annotations, MagicData-RAMC enriches the data diversity in the Mandarin speech community and allows extensive research on a series of speech-related tasks, including automatic speech recognition, speaker diarization, topic detection, keyword search, text-to-speech, etc.
Automatic Speech Recognition
no code implementations • 31 Mar 2022 • Chengxin Chen, Pengyuan Zhang
To further exploit the embeddings from different layers of the ASR encoder, we propose a novel CTA-RNN architecture to capture the emotional salient parts of embeddings in both the channel and temporal directions.
1 code implementation • 22 Feb 2022 • Keqi Deng, Songjun Cao, Yike Zhang, Long Ma, Gaofeng Cheng, Ji Xu, Pengyuan Zhang
Recently, end-to-end automatic speech recognition models based on connectionist temporal classification (CTC) have achieved impressive results, especially when fine-tuned from wav2vec2.0 models.
Automatic Speech Recognition
no code implementations • 25 Jan 2022 • Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang
The proposed NAR model significantly surpasses previous NAR systems on the AISHELL-1 benchmark and shows a potential for English tasks.
Automatic Speech Recognition
no code implementations • 23 Dec 2021 • Changfeng Gao, Gaofeng Cheng, Pengyuan Zhang
Nevertheless, most of the previous SSL methods ignore the influence of the background noise or reverberation, which is crucial to deploying ASR systems in real-world speech applications.
Automatic Speech Recognition
no code implementations • 9 Oct 2021 • Han Zhu, Li Wang, Jindong Wang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan
In this work, to build a better pre-trained model for low-resource ASR, we propose a pre-training approach called wav2vec-S, in which task-specific semi-supervised pre-training refines the self-supervised pre-trained model for the ASR task, thus more effectively utilizing the capacity of the pre-trained model to generate task-specific representations for ASR.
Automatic Speech Recognition
no code implementations • 12 Apr 2021 • Yukun Liu, Ta Li, Pengyuan Zhang, Yonghong Yan
Recently, neural architecture search (NAS) has been successfully used in image classification, natural language processing, and automatic speech recognition (ASR) tasks to find state-of-the-art (SOTA) architectures that outperform human-designed architectures.
Automatic Speech Recognition
1 code implementation • 5 Feb 2021 • Hangting Chen, Yang Yi, Dang Feng, Pengyuan Zhang
The proposed framework facilitates iterative signal refinement under the guidance of beamforming and seeks to reach the upper bound of the MVDR-based methods.
no code implementations • 5 Nov 2020 • Han Zhu, Li Wang, Pengyuan Zhang, Yonghong Yan
To jointly train the acoustic model and the accent classifier, we propose multi-task learning with a gate mechanism (MTL-G).
1 code implementation • 5 Nov 2020 • Han Zhu, Jiangjiang Zhao, Yuling Ren, Li Wang, Pengyuan Zhang
Then, for each class, probabilities of this class are used to compute a mean vector, which we refer to as mean soft labels.
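As a rough illustration of the averaging step described above, here is a minimal sketch. It assumes that "probabilities of this class" means the model's output probability vectors for the samples labelled with that class; the function name `mean_soft_labels` and the toy inputs are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def mean_soft_labels(probs, labels):
    """Average the probability vectors of all samples that share a label.

    probs:  list of per-sample probability vectors (lists of floats).
    labels: list of integer class ids, one per sample.
    Returns a dict mapping class id -> mean probability vector.
    Assumption: each vector is a model's softmax output for one sample.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    sums = {}
    counts = defaultdict(int)
    for vector, label in zip(probs, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        # Accumulate the element-wise sum for this class.
        for i, value in enumerate(vector):
            sums[label][i] += value
        counts[label] += 1
    # Divide each accumulated sum by the number of samples in its class.
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


# Toy example: two samples of class 0 and one sample of class 1.
probs = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
labels = [0, 0, 1]
soft = mean_soft_labels(probs, labels)
```

In this toy case the mean vector for class 0 is the element-wise average of the first two rows, and class 1 keeps its single vector unchanged.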
no code implementations • 20 Oct 2020 • Yuzhuo Liu, Hangting Chen, Yun Wang, Pengyuan Zhang
While this paper focuses on sound event detection applications, the proposed method can be applied to MIL tasks in other domains.
no code implementations • 1 Jul 2020 • Hangting Chen, Pengyuan Zhang
Deep attractor networks (DANs) perform speech separation with discriminative embeddings and speaker attractors.
no code implementations • 15 Jan 2020 • Haoran Miao, Gaofeng Cheng, Changfeng Gao, Pengyuan Zhang, Yonghong Yan
To support online recognition, we integrate the state reuse chunk-SAE and the MTA-based SAD into the online CTC/attention architecture.
Automatic Speech Recognition
no code implementations • 25 Dec 2019 • Lu Huang, Gaofeng Cheng, Pengyuan Zhang, Yi Yang, Shumin Xu, Jiasong Sun
The experimental results show that uPIT outperforms cPIT when LC-BLSTM is used during inference.
2 code implementations • 31 Oct 2019 • Yue Fan, Jiawen Kang, Lantian Li, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang, Ziya Zhou, Yunqi Cai, Dong Wang
These datasets tend to deliver over-optimistic performance and do not meet the needs of research on speaker recognition in unconstrained conditions.
no code implementations • 15 Jul 2019 • Hangting Chen, Zuozhen Liu, Zongming Liu, Pengyuan Zhang, Yonghong Yan
This technical report describes the IOA team's submission for Task 1A of the DCASE 2019 challenge.
no code implementations • 1 Jan 2019 • Sifan Wu, Fei Li, Pengyuan Zhang
Emotion recognition, a key technology in multimedia communication, plays an increasingly important role in human-computer interaction systems.
no code implementations • 21 Sep 2015 • Xiaofei Wang, Chao Wu, Pengyuan Zhang, Ziteng Wang, Yong Liu, Xu Li, Qiang Fu, Yonghong Yan
This paper presents the contribution to the third 'CHiME' speech separation and recognition challenge including both front-end signal processing and back-end speech recognition.
Automatic Speech Recognition