1 code implementation • 7 Nov 2021 • Qinghua Liu, Yating Huang, Yunzhe Hao, Jiaming Xu, Bo Xu
Multi-modal cues, including spatial information, facial expressions, and voiceprints, have been introduced into speech separation and speaker extraction tasks as complementary information to improve performance.
1 code implementation • 13 Jun 2021 • Yunzhe Hao, Jiaming Xu, Peng Zhang, Bo Xu
In the speaker extraction problem, additional information about the target speaker, including the voiceprint, lip movements, facial expressions, and spatial location, has been found to aid in tracking and extracting that speaker.
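The idea of conditioning extraction on target-speaker information can be sketched as follows. This is a hypothetical illustration, not the paper's architecture: a voiceprint embedding produces a per-frequency soft mask that is applied to the mixture spectrogram (the function and parameter names, and the linear-plus-sigmoid masking scheme, are all assumptions).

```python
import numpy as np

def extract_target(mixture_spec, voiceprint, W_mask):
    # Hypothetical sketch: a voiceprint embedding conditions a soft mask
    # applied to the mixture spectrogram.
    # mixture_spec: (freq, time); voiceprint: (emb,); W_mask: (freq, emb)
    logits = W_mask @ voiceprint            # per-frequency score from the voiceprint
    mask = 1.0 / (1.0 + np.exp(-logits))    # sigmoid -> mask values in (0, 1)
    return mask[:, None] * mixture_spec     # broadcast the mask over time frames

rng = np.random.default_rng(0)
mix = rng.random((4, 10))                   # toy magnitude spectrogram
vp = rng.random(8)                          # toy voiceprint embedding
W = rng.standard_normal((4, 8))
est = extract_target(mix, vp, W)
```

In a real system the mask would come from a learned network conditioned on the embedding; the sketch only shows the conditioning pattern.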
no code implementations • 29 Nov 2020 • Peng Zhang, Jiaming Xu, Jing Shi, Yunzhe Hao, Bo Xu
In our model, a face detector determines the number of speakers in the scene, and visual information is used to avoid the permutation problem.
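One way visual information can sidestep the permutation problem is to index output streams by detected faces rather than by arbitrary output order. The following is a minimal sketch under assumed inputs (per-stream audio embeddings and per-face visual embeddings in a shared space, greedy similarity matching); the actual matching mechanism in the paper may differ.

```python
import numpy as np

def assign_streams_by_face(sep_embeds, face_embeds):
    # Hypothetical sketch: order separated streams by their best-matching
    # face track, so outputs are speaker-indexed and the label-permutation
    # ambiguity never arises. The similarity criterion is an assumption.
    sims = sep_embeds @ face_embeds.T       # pairwise stream-to-face similarity
    order, used = [], set()
    for i in range(len(face_embeds)):       # one output stream per detected face
        ranked = np.argsort(-sims[:, i])    # best-matching stream first
        j = next(j for j in ranked if j not in used)
        used.add(j)
        order.append(int(j))
    return order
```

With the number of detected faces fixing the number of streams, the model never has to guess how many speakers to separate.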
1 code implementation • 17 Dec 2018 • Yunzhe Hao, Xuhui Huang, Meng Dong, Bo Xu
By combining the sym-STDP rule with bio-plausible synaptic scaling and the intrinsic plasticity of a dynamic threshold, our SNN model implemented supervised learning (SL) effectively and achieved good performance on a benchmark recognition task (the MNIST dataset).
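The two plasticity ingredients named above can be sketched roughly as follows. This is a generic illustration, not the paper's exact rule: the symmetric STDP window potentiates a synapse when pre- and post-synaptic spikes are close in time regardless of their order, and synaptic scaling multiplicatively renormalizes a neuron's incoming weights. All parameter values and the exponential window shape are assumptions.

```python
import numpy as np

def sym_stdp_update(w, dt, a=0.1, tau=20.0, w_max=1.0):
    # Sketch of a symmetric STDP rule (parameters assumed): the weight
    # change depends only on |dt|, the pre/post spike-time difference,
    # so the learning window is symmetric around dt = 0.
    dw = a * np.exp(-np.abs(dt) / tau)
    return np.clip(w + dw, 0.0, w_max)

def synaptic_scaling(w, target_mean=0.5):
    # Bio-plausible synaptic scaling: multiplicatively rescale a neuron's
    # incoming weights so their mean returns to a homeostatic target.
    return w * (target_mean / w.mean())
```

In the combined scheme, sym-STDP drives learning while scaling and a dynamic firing threshold keep activity in a stable operating range.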