Search Results for author: Yu-Xiao Guo

Found 7 papers, 4 papers with code

VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

no code implementations • 16 Apr 2024 • Sicheng Xu, Guojun Chen, Yu-Xiao Guo, Jiaolong Yang, Chong Li, Zhenyu Zang, Yizhong Zhang, Xin Tong, Baining Guo

We introduce VASA, a framework for generating lifelike talking faces with appealing visual affective skills (VAS) given a single static image and a speech audio clip.

MVD$^2$: Efficient Multiview 3D Reconstruction for Multiview Diffusion

no code implementations • 22 Feb 2024 • Xin-Yang Zheng, Hao Pan, Yu-Xiao Guo, Xin Tong, Yang Liu

By finetuning pretrained large image diffusion models with 3D data, the MVD methods first generate multiple views of a 3D object based on an image or text prompt and then reconstruct 3D shapes with multiview 3D reconstruction.

3D Generation, 3D Reconstruction

Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding

1 code implementation • 22 Feb 2024 • Yu-Qi Yang, Yu-Xiao Guo, Yang Liu

Data diversity and abundance are essential for improving the performance and generalization of models in natural language processing and 2D vision.

Scene Understanding

Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding

2 code implementations • 14 Apr 2023 • Yu-Qi Yang, Yu-Xiao Guo, Jian-Yu Xiong, Yang Liu, Hao Pan, Peng-Shuai Wang, Xin Tong, Baining Guo

We pretrained a large Swin3D model on a synthetic Structured3D dataset, which is an order of magnitude larger than the ScanNet dataset.

Ranked #2 on 3D Object Detection on S3DIS (using extra training data)

3D Object Detection, Scene Understanding, +1

Indoor Scene Generation from a Collection of Semantic-Segmented Depth Images

1 code implementation • ICCV 2021 • Ming-Jia Yang, Yu-Xiao Guo, Bin Zhou, Xin Tong

Different from existing methods that represent an indoor scene with the type, location, and other properties of objects in the room and learn the scene layout from a collection of complete 3D indoor scenes, our method models each indoor scene as a 3D semantic scene volume and learns a volumetric generative adversarial network (GAN) from a collection of 2.5D partial observations of 3D scenes.

Generative Adversarial Network, Scene Generation

View-volume Network for Semantic Scene Completion from a Single Depth Image

no code implementations • 14 Jun 2018 • Yu-Xiao Guo, Xin Tong

We introduce a View-Volume convolutional neural network (VVNet) for inferring the occupancy and semantic labels of a volumetric 3D scene from a single depth image.

3D Semantic Scene Completion
