Search Results for author: Fangzhou Hong

Found 35 papers, 21 papers with code

GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data

no code implementations27 Nov 2024 Wentao Wang, Hang Ye, Fangzhou Hong, Xue Yang, Jianfu Zhang, Yizhou Wang, Ziwei Liu, Liang Pan

With the help of the pretrained human prior models, the Geometry Initialization-&-Sculpting pipeline is leveraged to recover high-quality 3D human geometry from a single image.

3D Human Reconstruction Image to 3D

GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation

no code implementations12 Nov 2024 Yushi Lan, Shangchen Zhou, Zhaoyang Lyu, Fangzhou Hong, Shuai Yang, Bo Dai, Xingang Pan, Chen Change Loy

While 3D content generation has advanced significantly, existing methods still face challenges with input formats, latent space design, and output representations.

3D Generation Disentanglement

EgoLM: Multi-Modal Language Model of Egocentric Motions

no code implementations26 Sep 2024 Fangzhou Hong, Vladimir Guzov, Hyo Jin Kim, Yuting Ye, Richard Newcombe, Ziwei Liu, Lingni Ma

Multi-modal sensor inputs are encoded and projected to the joint latent space of language models, and used to prompt motion generation or text generation for egomotion tracking or understanding, respectively.

Language Modeling Language Modelling +2
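The encode-and-project scheme described in the EgoLM snippet can be sketched roughly as follows. All dimensions, modality names, and the linear projections are hypothetical stand-ins for illustration, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 64-dim sensor features, 128-dim LM embedding space.
SENSOR_DIM, LM_DIM = 64, 128

# Learned linear projections (random here) map each modality's features
# into the language model's token-embedding space.
W_imu = rng.standard_normal((SENSOR_DIM, LM_DIM)) * 0.02
W_cam = rng.standard_normal((SENSOR_DIM, LM_DIM)) * 0.02

def project_sensors(imu_feats, cam_feats):
    """Encode per-frame sensor features as pseudo-tokens for the LM."""
    imu_tokens = imu_feats @ W_imu   # (T, LM_DIM)
    cam_tokens = cam_feats @ W_cam   # (T, LM_DIM)
    return np.concatenate([imu_tokens, cam_tokens], axis=0)

# Two seconds of per-modality features at 10 Hz.
imu = rng.standard_normal((20, SENSOR_DIM))
cam = rng.standard_normal((20, SENSOR_DIM))
prompt = project_sensors(imu, cam)

# Task-specific text tokens are appended after the sensor prompt, so the
# same LM can be prompted for motion generation or text generation.
text_embeddings = rng.standard_normal((5, LM_DIM))
lm_input = np.concatenate([prompt, text_embeddings], axis=0)
print(lm_input.shape)  # (45, 128)
```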

HMD$^2$: Environment-aware Motion Generation from Single Egocentric Head-Mounted Device

no code implementations20 Sep 2024 Vladimir Guzov, Yifeng Jiang, Fangzhou Hong, Gerard Pons-Moll, Richard Newcombe, C. Karen Liu, Yuting Ye, Lingni Ma

This paper investigates the online generation of realistic full-body human motion using a single head-mounted device with an outward-facing color camera and the ability to perform visual SLAM.

Motion Generation

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

1 code implementation19 Sep 2024 Zhaoxi Chen, Jiaxiang Tang, Yuhao Dong, Ziang Cao, Fangzhou Hong, Yushi Lan, Tengfei Wang, Haozhe Xie, Tong Wu, Shunsuke Saito, Liang Pan, Dahua Lin, Ziwei Liu

The increasing demand for high-quality 3D assets across various industries necessitates efficient and automated 3D content creation.

Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild

no code implementations14 Jun 2024 Lingni Ma, Yuting Ye, Fangzhou Hong, Vladimir Guzov, Yifeng Jiang, Rowan Postyeni, Luis Pesqueira, Alexander Gamino, Vijay Baiyya, Hyo Jin Kim, Kevin Bailey, David Soriano Fosas, C. Karen Liu, Ziwei Liu, Jakob Engel, Renzo De Nardi, Richard Newcombe

To the best of our knowledge, the Nymeria dataset is the world's largest collection of human motion in the wild, the first of its kind to provide synchronized and localized multi-device multimodal egocentric data, and the world's largest motion-language dataset.

Action Recognition Gaze Estimation +1

GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation

1 code implementation10 Jun 2024 Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu

Recently, 3D Gaussian Splatting (3D-GS) has emerged as a highly efficient alternative for object-level 3D generation.

3D Generation Scene Generation

4D Panoptic Scene Graph Generation

3 code implementations NeurIPS 2023 Jingkang Yang, Jun Cen, Wenxuan Peng, Shuai Liu, Fangzhou Hong, Xiangtai Li, Kaiyang Zhou, Qifeng Chen, Ziwei Liu

To facilitate research in this new area, we build a richly annotated PSG-4D dataset consisting of 3K RGB-D videos with a total of 1M frames, each of which is labeled with 4D panoptic segmentation masks as well as fine-grained, dynamic scene graphs.

4D Panoptic Segmentation Graph Generation +5

DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation

1 code implementation13 May 2024 Ziang Cao, Fangzhou Hong, Tong Wu, Liang Pan, Ziwei Liu

Therefore, we introduce a diffusion-based feed-forward framework to address these challenges with a single model.

3D Generation Decoder +2

FashionEngine: Interactive 3D Human Generation and Editing via Multimodal Controls

no code implementations2 Apr 2024 Tao Hu, Fangzhou Hong, Zhaoxi Chen, Ziwei Liu

FashionEngine automates 3D human production with three key components: 1) A pre-trained 3D human diffusion model that learns to model 3D humans in a semantic UV latent space from 2D image training data, providing strong priors for diverse generation and editing tasks.

Virtual Try-on

StructLDM: Structured Latent Diffusion for 3D Human Generation

no code implementations1 Apr 2024 Tao Hu, Fangzhou Hong, Ziwei Liu

A structured 3D-aware auto-decoder factorizes the global latent space into several semantic body parts, parameterized by a set of conditional structured local NeRFs anchored to the body template; it embeds the properties learned from the 2D training data and can be decoded to render view-consistent humans under different poses and clothing styles.

Virtual Try-on

SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering

no code implementations CVPR 2024 Tao Hu, Fangzhou Hong, Ziwei Liu

Physical motion decoding is designed to encourage physical motion learning by decoding the motion triplane features at timestep t to predict both spatial and temporal derivatives at the next timestep t+1 during training.

Generalizable Novel View Synthesis Novel View Synthesis
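The derivative-supervision idea in the SurMo snippet can be illustrated with a toy training target. The decoder, point counts, and frame rate below are hypothetical; only the finite-difference supervision pattern is being shown:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical surface samples: N points tracked over three timesteps at 30 fps.
N, dt = 100, 1.0 / 30.0
pos = rng.standard_normal((3, N, 3))  # (timestep, point, xyz)

def temporal_derivative_target(pos, t, dt):
    """Finite-difference velocity target between timesteps t and t+1."""
    return (pos[t + 1] - pos[t]) / dt

# A real decoder would predict this from motion-triplane features; a noisy
# stand-in prediction is enough to show the training loss.
target = temporal_derivative_target(pos, 0, dt)
pred_velocity = target + 0.01 * rng.standard_normal((N, 3))

# MSE against the finite-difference target supervises motion learning.
loss = np.mean((pred_velocity - target) ** 2)
print(loss)
```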

Large Motion Model for Unified Multi-Modal Motion Generation

no code implementations1 Apr 2024 Mingyuan Zhang, Daisheng Jin, Chenyang Gu, Fangzhou Hong, Zhongang Cai, Jingfang Huang, Chongzhi Zhang, Xinying Guo, Lei Yang, Ying He, Ziwei Liu

In this work, we present Large Motion Model (LMM), a motion-centric, multi-modal framework that unifies mainstream motion generation tasks into a generalist model.

Motion Generation

3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

1 code implementation4 Mar 2024 Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Shuai Yang, Tengfei Wang, Liang Pan, Dahua Lin, Ziwei Liu

Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping.

3D Generation Text to 3D +1

Large-Vocabulary 3D Diffusion Model with Transformer

1 code implementation14 Sep 2023 Ziang Cao, Fangzhou Hong, Tong Wu, Liang Pan, Ziwei Liu

To this end, we propose a novel triplane-based 3D-aware Diffusion model with TransFormer, DiffTF, which addresses these challenges from three aspects.

3D Generation Diversity

DeformToon3D: Deformable 3D Toonification from Neural Radiance Fields

1 code implementation8 Sep 2023 Junzhe Zhang, Yushi Lan, Shuai Yang, Fangzhou Hong, Quan Wang, Chai Kiat Yeo, Ziwei Liu, Chen Change Loy

In this paper, we address the challenging problem of 3D toonification, which involves transferring the style of an artistic domain onto a target 3D face with stylized geometry and texture.

Decoder

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

1 code implementation CVPR 2024 Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu

3D city generation is a desirable yet challenging task, since humans are more sensitive to structural distortions in urban environments.

Ranked #2 on Scene Generation on GoogleEarth (KID metric)

Scene Generation

PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds

no code implementations28 Aug 2023 Zhongang Cai, Liang Pan, Chen Wei, Wanqi Yin, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang, Ziwei Liu

To tackle these challenges, we propose a principled framework, PointHPS, for accurate 3D HPS from point clouds captured in real-world settings, which iteratively refines point features through a cascaded architecture.

3D human pose and shape estimation

HumanLiff: Layer-wise 3D Human Generation with Diffusion Model

no code implementations18 Aug 2023 Shoukang Hu, Fangzhou Hong, Tao Hu, Liang Pan, Haiyi Mei, Weiye Xiao, Lei Yang, Ziwei Liu

In this work, we propose HumanLiff, the first layer-wise 3D human generative model with a unified diffusion process.

3D Generation Neural Rendering

SHERF: Generalizable Human NeRF from a Single Image

1 code implementation ICCV 2023 Shoukang Hu, Fangzhou Hong, Liang Pan, Haiyi Mei, Lei Yang, Ziwei Liu

To this end, we propose a bank of 3D-aware hierarchical features, including global, point-level, and pixel-aligned features, to facilitate informative encoding.

3D Human Reconstruction

DeformToon3D: Deformable Neural Radiance Fields for 3D Toonification

no code implementations ICCV 2023 Junzhe Zhang, Yushi Lan, Shuai Yang, Fangzhou Hong, Quan Wang, Chai Kiat Yeo, Ziwei Liu, Chen Change Loy

In this paper, we address the challenging problem of 3D toonification, which involves transferring the style of an artistic domain onto a target 3D face with stylized geometry and texture.

Decoder

EVA3D: Compositional 3D Human Generation from 2D Image Collections

1 code implementation10 Oct 2022 Fangzhou Hong, Zhaoxi Chen, Yushi Lan, Liang Pan, Ziwei Liu

At the core of EVA3D is a compositional human NeRF representation, which divides the human body into local parts.

Diversity
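The compositional, part-based representation in the EVA3D snippet amounts to routing 3D query points to local part networks. The part boxes and coordinates below are made up for illustration; EVA3D's actual part layout and NeRF networks are not reproduced here:

```python
import numpy as np

# Hypothetical axis-aligned bounding boxes for two body parts
# (min corner, max corner); each box would own a local NeRF.
parts = {
    "torso": (np.array([-0.2, 0.0, -0.1]), np.array([0.2, 0.5, 0.1])),
    "head":  (np.array([-0.1, 0.5, -0.1]), np.array([0.1, 0.8, 0.1])),
}

def route_points(points):
    """Assign each query point to the part network whose box contains it."""
    assignment = {}
    for name, (lo, hi) in parts.items():
        inside = np.all((points >= lo) & (points <= hi), axis=1)
        assignment[name] = np.where(inside)[0]
    return assignment

pts = np.array([[0.0, 0.25, 0.0],   # inside the torso box
                [0.0, 0.60, 0.0],   # inside the head box
                [1.0, 1.00, 1.0]])  # outside both
print(route_points(pts))
```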

MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

2 code implementations31 Aug 2022 Mingyuan Zhang, Zhongang Cai, Liang Pan, Fangzhou Hong, Xinying Guo, Lei Yang, Ziwei Liu

Instead of a deterministic language-motion mapping, MotionDiffuse generates motions through a series of denoising steps in which variations are injected.

Denoising Motion Generation +1
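The "series of denoising steps in which variations are injected" is the standard reverse-diffusion sampling loop. The sketch below uses a toy noise schedule and a placeholder noise predictor (MotionDiffuse uses a text-conditioned transformer); only the loop structure, where fresh noise at each step injects the variation, is the point:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear noise schedule over T denoising steps (illustrative values).
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoise_step(x_t, t, eps_pred, rng):
    """One reverse-diffusion step; added noise injects the variation."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t > 0:
        mean += np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
    return mean

# Placeholder for the text-conditioned noise predictor.
def eps_model(x_t, t, text_embedding):
    return 0.1 * x_t

# Sample a 60-frame, 72-dim pose sequence starting from pure noise.
x = rng.standard_normal((60, 72))
cond = rng.standard_normal(256)  # stand-in text embedding
for t in reversed(range(T)):
    x = denoise_step(x, t, eps_model(x, t, cond), rng)
print(x.shape)  # (60, 72)
```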

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

1 code implementation17 May 2022 Fangzhou Hong, Mingyuan Zhang, Liang Pan, Zhongang Cai, Lei Yang, Ziwei Liu

Our key insight is to take advantage of the powerful vision-language model CLIP for supervising neural human generation, in terms of 3D geometry, texture and animation.

3D geometry Language Modelling +2

Versatile Multi-Modal Pre-Training for Human-Centric Perception

1 code implementation CVPR 2022 Fangzhou Hong, Liang Pan, Zhongang Cai, Ziwei Liu

To tackle these challenges, we design novel Dense Intra-sample Contrastive Learning and Sparse Structure-aware Contrastive Learning targets that hierarchically learn a modal-invariant latent space featuring a continuous and ordinal feature distribution and structure-aware semantic consistency.

Contrastive Learning Human Parsing +1
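The cross-modal contrastive objective behind such targets is typically an InfoNCE-style loss, where matched samples from two modalities form the positive pairs. The sketch below is a generic InfoNCE, not the paper's exact Dense/Sparse formulation; batch size, feature dimension, and temperature are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce(anchors, positives, temperature=0.07):
    """Cross-modal InfoNCE: row i of each modality is the positive pair."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # positives on the diagonal

B, D = 8, 32
rgb_feats = rng.standard_normal((B, D))

# Aligned pairs (e.g. RGB vs. depth features of the same sample) should
# score a much lower loss than mismatched random pairs.
aligned = info_nce(rgb_feats, rgb_feats + 0.01 * rng.standard_normal((B, D)))
mismatched = info_nce(rgb_feats, rng.standard_normal((B, D)))
print(aligned < mismatched)
```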

LiDAR-based 4D Panoptic Segmentation via Dynamic Shifting Network

1 code implementation14 Mar 2022 Fangzhou Hong, Hui Zhou, Xinge Zhu, Hongsheng Li, Ziwei Liu

In this work, we address the task of LiDAR-based panoptic segmentation, which aims to parse both objects and scenes in a unified manner.

4D Panoptic Segmentation Autonomous Driving +3

Garment4D: Garment Reconstruction from Point Cloud Sequences

1 code implementation NeurIPS 2021 Fangzhou Hong, Liang Pan, Zhongang Cai, Ziwei Liu

The main challenges are two-fold: 1) effective 3D feature learning for fine details, and 2) capture of garment dynamics caused by the interaction between garments and the human body, especially for loose garments like skirts.

Garment Reconstruction

LRC-Net: Learning Discriminative Features on Point Clouds by Encoding Local Region Contexts

no code implementations18 Mar 2020 Xinhai Liu, Zhizhong Han, Fangzhou Hong, Yu-Shen Liu, Matthias Zwicker

However, due to the irregularity and sparsity of sampled point clouds, it is hard to encode the fine-grained geometry of local regions and their spatial relationships when using only fixed-size filters and individual local feature integration, which limits the ability to learn discriminative features.
