no code implementations • 12 Dec 2023 • Haiming Zhang, Zhihao Yuan, Chaoda Zheng, Xu Yan, Baoyuan Wang, Guanbin Li, Song Wu, Shuguang Cui, Zhen Li
Our proposed GSmoothFace model mainly consists of the Audio to Expression Prediction (A2EP) module and the Target Adaptive Face Translation (TAFT) module.
no code implementations • 26 Nov 2023 • Zhihao Yuan, Jinke Ren, Chun-Mei Feng, Hengshuang Zhao, Shuguang Cui, Zhen Li
Building on this, we design a visual program that consists of three types of modules, i.e., view-independent, view-dependent, and functional modules.
no code implementations • 5 Jul 2022 • Zhihao Yuan, Xu Yan, Zhuo Li, Xuhao Li, Yao Guo, Shuguang Cui, Zhen Li
Recent progress in 3D scene understanding has explored visual grounding (3DVG) to localize a target object through a language description.
1 code implementation • CVPR 2022 • Zhihao Yuan, Xu Yan, Yinghong Liao, Yao Guo, Guanbin Li, Zhen Li, Shuguang Cui
Thus, a more faithful caption can be generated only using point clouds during the inference.
1 code implementation • 22 Dec 2021 • Xu Yan, Zhihao Yuan, Yuhao Du, Yinghong Liao, Yao Guo, Zhen Li, Shuguang Cui
To tackle this problem, we propose the CLEVR3D, a large-scale VQA-3D dataset consisting of 171K questions from 8,771 3D scenes.
1 code implementation • ICCV 2021 • Zhihao Yuan, Xu Yan, Yinghong Liao, Ruimao Zhang, Sheng Wang, Zhen Li, Shuguang Cui
Compared with the visual grounding on 2D images, the natural-language-guided 3D object localization on point clouds is more challenging.