no code implementations • 14 Dec 2023 • Yuhang Yang, Wei Zhai, Hongchen Luo, Yang Cao, Zheng-Jun Zha
These methods underexploit certain correlations between the interaction counterparts (human and object) and struggle to address the uncertainty in interactions.
no code implementations • 29 Nov 2023 • Yuhang Yang, Yizhou Peng, Xionghu Zhong, Hao Huang, Eng Siong Chng
The Mixed Error Rate results show that as little as $1\sim10$ hours of adaptation data may suffice to reach saturation in performance gain on SEAME, while the ASRU task continued to show performance gains with more adaptation data ($>$100 hours).
1 code implementation • ICCV 2023 • Yuhang Yang, Wei Zhai, Hongchen Luo, Yang Cao, Jiebo Luo, Zheng-Jun Zha
Comprehensive experiments on PIAD demonstrate the reliability of the proposed task and the superiority of our method.
1 code implementation • 1 Nov 2022 • Yuhang Yang, HaiHua Xu, Hao Huang, Eng Siong Chng, Sheng Li
To let a state-of-the-art end-to-end ASR model benefit from data efficiency, as well as from much more unpaired text data via multi-modal training, two problems must be addressed: 1) the synchronicity of feature sampling rates between speech and language (i.e., text) data; 2) the homogeneity of the representations learned by the two encoders.
1 code implementation • 8 Apr 2022 • Qianying Liu, Zhuo Gong, Zhengdong Yang, Yuhang Yang, Sheng Li, Chenchen Ding, Nobuaki Minematsu, Hao Huang, Fei Cheng, Chenhui Chu, Sadao Kurohashi
Low-resource speech recognition has long suffered from insufficient training data.
no code implementations • 9 Jun 2021 • Zilin Ding, Yuhang Yang, Xuan Cheng, Xiaomin Wang, Ming Liu
In this paper, we find that features in CNNs can also be used for self-supervision.
no code implementations • 9 Jun 2021 • Yuhang Yang, Zilin Ding, Xuan Cheng, Xiaomin Wang, Ming Liu
In this paper, we show that feature transformations within CNNs can also be regarded as supervisory signals for constructing a self-supervised task, which we call the \emph{internal pretext task}.