Search Results for author: Zhihao Yuan

Found 6 papers, 3 papers with code

GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance

no code implementations • 12 Dec 2023 • Haiming Zhang, Zhihao Yuan, Chaoda Zheng, Xu Yan, Baoyuan Wang, Guanbin Li, Song Wu, Shuguang Cui, Zhen Li

Our proposed GSmoothFace model mainly consists of the Audio to Expression Prediction (A2EP) module and the Target Adaptive Face Translation (TAFT) module.

Face Model • Talking Face Generation

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

no code implementations • 26 Nov 2023 • Zhihao Yuan, Jinke Ren, Chun-Mei Feng, Hengshuang Zhao, Shuguang Cui, Zhen Li

Building on this, we design a visual program that consists of three types of modules, i.e., view-independent, view-dependent, and functional modules.

Object • Visual Grounding

Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases

no code implementations • 5 Jul 2022 • Zhihao Yuan, Xu Yan, Zhuo Li, Xuhao Li, Yao Guo, Shuguang Cui, Zhen Li

Recent progress in 3D scene understanding has explored visual grounding (3DVG) to localize a target object through a language description.

Object • Representation Learning +3

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

1 code implementation • CVPR 2022 • Zhihao Yuan, Xu Yan, Yinghong Liao, Yao Guo, Guanbin Li, Zhen Li, Shuguang Cui

Thus, a more faithful caption can be generated using only point clouds during inference.

3D dense captioning • Dense Captioning +2

Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation

1 code implementation • 22 Dec 2021 • Xu Yan, Zhihao Yuan, Yuhao Du, Yinghong Liao, Yao Guo, Zhen Li, Shuguang Cui

To tackle this problem, we propose CLEVR3D, a large-scale VQA-3D dataset consisting of 171K questions from 8,771 3D scenes.

Common Sense Reasoning • Question Answering +3

InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring

1 code implementation • ICCV 2021 • Zhihao Yuan, Xu Yan, Yinghong Liao, Ruimao Zhang, Sheng Wang, Zhen Li, Shuguang Cui

Compared with visual grounding on 2D images, natural-language-guided 3D object localization on point clouds is more challenging.

Attribute • Object Localization +2
