In this paper, we propose a target speech extraction network that utilizes spatial information to enhance the performance of a neural beamformer.
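The excerpt does not spell out which spatial cues are used; purely as a hedged illustration, the sketch below computes inter-channel phase differences (IPDs), a common spatial feature fed to neural beamformers. The function name and the choice of IPD are assumptions here, not the paper's method.

```python
import torch

def ipd_features(stft_ch1: torch.Tensor, stft_ch2: torch.Tensor) -> torch.Tensor:
    """Inter-channel phase difference (IPD) between two complex STFTs.

    stft_ch1, stft_ch2: complex tensors of shape (freq, frames).
    Returns (2 * freq, frames): cos(IPD) and sin(IPD) stacked, avoiding
    the 2*pi wrap-around of the raw phase difference.
    """
    ipd = torch.angle(stft_ch1) - torch.angle(stft_ch2)
    return torch.cat([torch.cos(ipd), torch.sin(ipd)], dim=0)

# Example: two-channel mixture -> STFT -> IPD features for the beamformer input
x = torch.randn(2, 16000)                               # 1 s of 2-channel audio at 16 kHz
window = torch.hann_window(512)
specs = torch.stft(x, n_fft=512, hop_length=128,
                   window=window, return_complex=True)  # (2, 257, frames)
feats = ipd_features(specs[0], specs[1])                # (514, frames)
```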
Electroencephalography (EEG) plays a vital role in detecting how the brain responds to different stimuli.
To address those challenges, we explore representation learning for KWS via self-supervised contrastive learning and self-training with a pretrained model.
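As a hedged sketch of what contrastive pre-training for KWS can look like, the InfoNCE-style loss below pulls two augmented views of the same utterance together and pushes apart other utterances in the batch; the encoder, augmentations, and temperature are placeholder assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE contrastive loss between two views of the same batch.

    z1, z2: (batch, dim) embeddings of two augmentations of the same
    utterances; row i of z1 and row i of z2 form the positive pair,
    all other rows serve as in-batch negatives.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                  # (batch, batch) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

# Usage: embed two augmented views of each utterance with any speech encoder, then:
z1, z2 = torch.randn(32, 256), torch.randn(32, 256)    # stand-ins for encoder outputs
loss = info_nce_loss(z1, z2)
```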
Existing weakly supervised sound event detection (WSSED) work has not explored both types of co-occurrence simultaneously: some sound events often co-occur, and their occurrences are usually accompanied by specific background sounds. With only clip-level supervision, these factors inevitably become entangled, causing misclassification and biased localization results.
Bearing this in mind, we develop a two-branch deep network (a KWS branch and an SV branch) with the same network structure and propose a novel decoupling feature learning method that improves KWS and SV simultaneously by learning speaker-invariant keyword representations and keyword-invariant speaker representations, respectively.
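Gradient reversal is one common way to make a representation invariant to a nuisance attribute; the hypothetical sketch below applies it on both branches (a speaker adversary on the KWS branch and a keyword adversary on the SV branch). This is an assumed decoupling strategy, not necessarily the paper's loss, and all layer sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad

class TwoBranchNet(nn.Module):
    """Hypothetical two-branch model: a KWS head and an SV head, each paired
    with an adversarial classifier on the *other* attribute so the KWS branch
    learns speaker-invariant features and the SV branch keyword-invariant ones."""
    def __init__(self, feat_dim=40, hid=128, n_keywords=10, n_speakers=100):
        super().__init__()
        self.kws_enc = nn.GRU(feat_dim, hid, batch_first=True)
        self.sv_enc = nn.GRU(feat_dim, hid, batch_first=True)
        self.kws_head = nn.Linear(hid, n_keywords)
        self.sv_head = nn.Linear(hid, n_speakers)
        self.adv_spk = nn.Linear(hid, n_speakers)   # speaker adversary on the KWS branch
        self.adv_kw = nn.Linear(hid, n_keywords)    # keyword adversary on the SV branch

    def forward(self, x):                           # x: (batch, frames, feat_dim)
        hk = self.kws_enc(x)[0][:, -1]              # last-frame keyword embedding
        hs = self.sv_enc(x)[0][:, -1]               # last-frame speaker embedding
        return (self.kws_head(hk), self.adv_spk(GradReverse.apply(hk)),
                self.sv_head(hs), self.adv_kw(GradReverse.apply(hs)))
```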
2 code implementations • 13 Jun 2021 • Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan
This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training.
Ranked #1 on Speech Recognition on GigaSpeech
This paper introduces a new open-source speech corpus named "speechocean762" designed for pronunciation assessment, consisting of 5,000 English utterances from 250 non-native speakers, half of whom are children.
Ranked #7 on Phone-level pronunciation scoring on speechocean762
Smart audio devices are gated by an always-on, lightweight keyword spotting program to reduce power consumption.
In this paper, we propose an attention-based end-to-end neural approach for small-footprint keyword spotting (KWS), which aims to simplify the pipeline for building a production-quality KWS system.
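A minimal sketch of the kind of attention pooling such an end-to-end KWS model typically uses: an encoder produces frame-level features, learned attention weights pool them into one utterance vector, and a linear layer scores keywords. The encoder choice and layer sizes are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AttentionKWS(nn.Module):
    """Small-footprint KWS with soft attention pooling over encoder frames
    (a sketch in the spirit of the approach, not its exact architecture)."""
    def __init__(self, feat_dim=40, hid=64, n_classes=2):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hid, batch_first=True)
        self.attn = nn.Linear(hid, 1)                # per-frame attention score
        self.classifier = nn.Linear(hid, n_classes)

    def forward(self, x):                            # x: (batch, frames, feat_dim)
        h, _ = self.encoder(x)                       # (batch, frames, hid)
        w = torch.softmax(self.attn(h), dim=1)       # attention weights over frames
        pooled = (w * h).sum(dim=1)                  # weighted average: (batch, hid)
        return self.classifier(pooled)               # keyword posteriors

logits = AttentionKWS()(torch.randn(8, 100, 40))     # (8, 2)
```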
Speaker adaptation aims to estimate a speaker-specific acoustic model from a speaker-independent one, to minimize the mismatch between training and testing conditions arising from speaker variability.
First, we study the effectiveness of different dereverberation networks (the generator in the GAN) and find that an LSTM yields a significant improvement over feed-forward DNNs and CNNs on our dataset.
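For concreteness, here is a hedged sketch of an LSTM generator for spectral dereverberation in a GAN setup: reverberant log-magnitude frames in, enhanced frames out. Layer count, sizes, and the feature choice are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LSTMDereverbGenerator(nn.Module):
    """LSTM generator for spectral dereverberation: reverberant
    log-magnitude frames in, enhanced frames out."""
    def __init__(self, n_freq=257, hid=512):
        super().__init__()
        self.lstm = nn.LSTM(n_freq, hid, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hid, n_freq)

    def forward(self, x):                            # x: (batch, frames, n_freq)
        h, _ = self.lstm(x)
        return self.proj(h)                          # enhanced spectrogram frames

# In a GAN, this generator would be trained against a discriminator on
# (reverberant, enhanced) pairs; here we only check tensor shapes.
y = LSTMDereverbGenerator()(torch.randn(4, 200, 257))  # (4, 200, 257)
```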
Previous attempts have shown that applying attention-based encoder-decoder models to Mandarin speech recognition is quite difficult, owing to the logographic orthography of Mandarin, the large vocabulary, and the conditional dependency of the attention model.