no code implementations • 16 Apr 2024 • Sicheng Xu, Guojun Chen, Yu-Xiao Guo, Jiaolong Yang, Chong Li, Zhenyu Zang, Yizhong Zhang, Xin Tong, Baining Guo
We introduce VASA, a framework for generating lifelike talking faces with appealing visual affective skills (VAS) given a single static image and a speech audio clip.
no code implementations • 22 Feb 2024 • Xin-Yang Zheng, Hao Pan, Yu-Xiao Guo, Xin Tong, Yang Liu
By finetuning pretrained large image diffusion models with 3D data, multiview diffusion (MVD) methods first generate multiple views of a 3D object based on an image or text prompt and then reconstruct 3D shapes with multiview 3D reconstruction.
1 code implementation • 22 Feb 2024 • Yu-Qi Yang, Yu-Xiao Guo, Yang Liu
Data diversity and abundance are essential for improving the performance and generalization of models in natural language processing and 2D vision.
2 code implementations • 14 Apr 2023 • Yu-Qi Yang, Yu-Xiao Guo, Jian-Yu Xiong, Yang Liu, Hao Pan, Peng-Shuai Wang, Xin Tong, Baining Guo
We pretrained a large Swin3D model on a synthetic Structured3D dataset, which is an order of magnitude larger than the ScanNet dataset.
Ranked #2 on 3D Object Detection on S3DIS (using extra training data)
1 code implementation • ICCV 2021 • Ming-Jia Yang, Yu-Xiao Guo, Bin Zhou, Xin Tong
Different from existing methods that represent an indoor scene with the type, location, and other properties of objects in the room and learn the scene layout from a collection of complete 3D indoor scenes, our method models each indoor scene as a 3D semantic scene volume and learns a volumetric generative adversarial network (GAN) from a collection of 2.5D partial observations of 3D scenes.
no code implementations • 14 Jun 2018 • Yu-Xiao Guo, Xin Tong
We introduce a View-Volume convolutional neural network (VVNet) for inferring the occupancy and semantic labels of a volumetric 3D scene from a single depth image.
Ranked #20 on 3D Semantic Scene Completion on NYUv2
1 code implementation • 5 Dec 2017 • Peng-Shuai Wang, Yang Liu, Yu-Xiao Guo, Chun-Yu Sun, Xin Tong
We present O-CNN, an Octree-based Convolutional Neural Network (CNN) for 3D shape analysis.
Ranked #4 on 3D Object Classification on ModelNet40