Search Results for author: Zilong Dong

Found 31 papers, 5 papers with code

CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model

no code implementations · 11 Apr 2025 · Ruohao Zhan, Yijin Li, Yisheng He, Shuo Chen, Yichen Shen, Xinyu Chen, Zilong Dong, Zhaoyang Huang, Guofeng Zhang

We propose a novel framework CoProSketch, providing prominent controllability and details for sketch generation with diffusion models.

Image Generation

LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds

1 code implementation · 13 Mar 2025 · Lingteng Qiu, Xiaodong Gu, Peihao Li, Qi Zuo, Weichao Shen, Junfei Zhang, Kejie Qiu, Weihao Yuan, GuanYing Chen, Zilong Dong, Liefeng Bo

Animatable 3D human reconstruction from a single image is a challenging problem due to the ambiguity in decoupling geometry, appearance, and deformation.

3D Human Reconstruction

LAM: Large Avatar Model for One-shot Animatable Gaussian Head

no code implementations · 25 Feb 2025 · Yisheng He, Xiaodong Gu, Xiaodan Ye, Chao Xu, Zhengyi Zhao, Yuan Dong, Weihao Yuan, Zilong Dong, Liefeng Bo

The centerpiece of our framework is the canonical Gaussian attributes generator, which utilizes FLAME canonical points as queries.

MCMat: Multiview-Consistent and Physically Accurate PBR Material Generation

no code implementations · 18 Dec 2024 · Shenhao Zhu, Lingteng Qiu, Xiaodong Gu, Zhengyi Zhao, Chao Xu, Yuxiao He, Zhe Li, Xiaoguang Han, Yao Yao, Xun Cao, Siyu Zhu, Weihao Yuan, Zilong Dong, Hao Zhu

In the generation stage, we adopt a Diffusion Transformer (DiT) model to generate PBR materials, where both the specially designed multi-branch DiT and reference-based DiT blocks adopt a global attention mechanism to promote feature interaction and fusion between different views, thereby improving multi-view consistency.

MulSMo: Multimodal Stylized Motion Generation by Bidirectional Control Flow

no code implementations · 13 Dec 2024 · Zhe Li, Yisheng He, Lei Zhong, Weichao Shen, Qi Zuo, Lingteng Qiu, Zilong Dong, Laurence Tianruo Yang, Weihao Yuan

Generating motion sequences conforming to a target style while adhering to the given content prompts requires accommodating both the content and style.

Contrastive Learning, Motion Generation

MVImgNet2.0: A Larger-scale Dataset of Multi-view Images

no code implementations · 2 Dec 2024 · Xiaoguang Han, Yushuang Wu, Luyue Shi, Haolin Liu, Hongjie Liao, Lingteng Qiu, Weihao Yuan, Xiaodong Gu, Zilong Dong, Shuguang Cui

This paper constructs the MVImgNet2.0 dataset that expands MVImgNet into a total of ~520k objects and 515 categories, which derives a 3D dataset with a larger scale that is more comparable to ones in the 2D domain.

3D Reconstruction, Object Reconstruction

MoGenTS: Motion Generation based on Spatial-Temporal Joint Modeling

no code implementations · 26 Sep 2024 · Weihao Yuan, Weichao Shen, Yisheng He, Yuan Dong, Xiaodong Gu, Zilong Dong, Liefeng Bo, QiXing Huang

Motion generation from discrete quantization offers many advantages over continuous regression, but at the cost of inevitable approximation errors.

Motion Generation, Quantization

HIVE: HIerarchical Volume Encoding for Neural Implicit Surface Reconstruction

no code implementations · 3 Aug 2024 · Xiaodong Gu, Weihao Yuan, Heng Li, Zilong Dong, Ping Tan

To better represent 3D shapes, we introduce a volume encoding to explicitly encode the spatial information.

Surface Reconstruction

StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal

no code implementations · 24 Jun 2024 · Chongjie Ye, Lingteng Qiu, Xiaodong Gu, Qi Zuo, Yushuang Wu, Zilong Dong, Liefeng Bo, Yuliang Xiu, Xiaoguang Han

The effectiveness of StableNormal is demonstrated through competitive performance in standard datasets such as DIODE-indoor, iBims, ScannetV2 and NYUv2, and also in various downstream tasks, such as surface reconstruction and normal enhancement.

Surface Normal Estimation, Surface Reconstruction

GIC: Gaussian-Informed Continuum for Physical Property Identification and Simulation

no code implementations · 21 Jun 2024 · Junhao Cai, Yuji Yang, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, Qifeng Chen

To facilitate geometry-aware guidance in physical property estimation, we introduce a novel hybrid framework that leverages 3D Gaussian representation to not only capture explicit shapes but also enable the simulated continuum to render object masks as 2D shape surrogates during training.

Object

An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes

no code implementations · 22 Mar 2024 · Zhengyi Zhao, Chen Song, Xiaodong Gu, Yuan Dong, Qi Zuo, Weihao Yuan, Liefeng Bo, Zilong Dong, QiXing Huang

In particular, the third and fourth stages are iterated, with the cuts obtained in the fourth stage encouraging non-rigid alignment in the third stage to focus on regions close to the cuts.

OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation

no code implementations · 19 Mar 2024 · Junhao Cai, Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qifeng Chen

Derived from OmniObject3D, OO3D-9D is the largest and most diverse dataset in the field of category-level object pose and size estimation.

Object

VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model

no code implementations · 18 Mar 2024 · Qi Zuo, Xiaodong Gu, Lingteng Qiu, Yuan Dong, Zhengyi Zhao, Weihao Yuan, Rui Peng, Siyu Zhu, Zilong Dong, Liefeng Bo, QiXing Huang

Images from video generative models are more suitable for multi-view generation because the underlying network architecture that generates them employs a temporal module to enforce frame consistency.

Denoising

Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation

no code implementations · 25 Jan 2024 · Minglin Chen, Weihao Yuan, Yukun Wang, Zhe Sheng, Yisheng He, Zilong Dong, Liefeng Bo, Yulan Guo

We propose a novel synchronized generation and reconstruction method to effectively optimize the NeRF.

3D Generation, NeRF, +1

IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images

no code implementations · CVPR 2024 · Yushuang Wu, Luyue Shi, Junhao Cai, Weihao Yuan, Lingteng Qiu, Zilong Dong, Liefeng Bo, Shuguang Cui, Xiaoguang Han

This approach treats the query points for implicit field learning as a noisy point cloud for iterative denoising, allowing for their dynamic adaptation to the target object shape.

3D Object Reconstruction, Denoising

GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors

no code implementations · CVPR 2024 · Yuan Dong, Qi Zuo, Xiaodong Gu, Weihao Yuan, Zhengyi Zhao, Zilong Dong, Liefeng Bo, QiXing Huang

The key to our approach is a new diffusion procedure that combines the discrete empirical data distribution and a continuous distribution induced by the quality checker.

Denoising

RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

no code implementations · CVPR 2024 · Lingteng Qiu, GuanYing Chen, Xiaodong Gu, Qi Zuo, Mutian Xu, Yushuang Wu, Weihao Yuan, Zilong Dong, Liefeng Bo, Xiaoguang Han

Lifting 2D diffusion for 3D generation is a challenging problem due to the lack of geometric prior and the complex entanglement of materials and lighting in natural images.

3D Generation, Text to 3D

Mirror-NeRF: Learning Neural Radiance Fields for Mirrors with Whitted-Style Ray Tracing

no code implementations · 7 Aug 2023 · Junyi Zeng, Chong Bao, Rui Chen, Zilong Dong, Guofeng Zhang, Hujun Bao, Zhaopeng Cui

Recently, Neural Radiance Fields (NeRF) has exhibited significant success in novel view synthesis, surface reconstruction, etc.

NeRF, Neural Rendering, +2

Fine-grained Text-Video Retrieval with Frozen Image Encoders

no code implementations · 14 Jul 2023 · Zuozhuo Dai, Fangtao Shao, Qingkun Su, Zilong Dong, Siyu Zhu

In the second stage, we propose a novel decoupled video text cross attention module to capture fine-grained multimodal information in spatial and temporal dimensions.

Decoder, Retrieval, +1

PanoContext-Former: Panoramic Total Scene Understanding with a Transformer

no code implementations · CVPR 2024 · Yuan Dong, Chuan Fang, Liefeng Bo, Zilong Dong, Ping Tan

Panoramic images enable a deeper understanding and more holistic perception of the $360^\circ$ surrounding environment, and can naturally encode richer scene context information than standard perspective images.

3D Object Detection, object-detection, +1

3D Former: Monocular Scene Reconstruction with 3D SDF Transformers

1 code implementation · 31 Jan 2023 · Weihao Yuan, Xiaodong Gu, Heng Li, Zilong Dong, Siyu Zhu

In this work, we propose an SDF transformer network, which replaces the role of 3D CNN for better 3D feature aggregation.

Dense RGB SLAM with Neural Implicit Maps

no code implementations · 21 Jan 2023 · Heng Li, Xiaodong Gu, Weihao Yuan, Luwei Yang, Zilong Dong, Ping Tan

To reach this challenging goal without depth input, we introduce a hierarchical feature volume to facilitate the implicit map decoder.

Decoder, Simultaneous Localization and Mapping

${S}^{2}$Net: Accurate Panorama Depth Estimation on Spherical Surface

no code implementations · 14 Jan 2023 · Meng Li, Senbo Wang, Weihao Yuan, Weichao Shen, Zhe Sheng, Zilong Dong

In this paper, we propose an end-to-end deep network for monocular panorama depth estimation on a unit spherical surface.

Decoder, Monocular Depth Estimation

RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale Indoor Environments

no code implementations · 26 Jul 2022 · Jiahui Zhang, Shitao Tang, Kejie Qiu, Rui Huang, Chuan Fang, Le Cui, Zilong Dong, Siyu Zhu, Ping Tan

Visual relocalization has been a widely discussed problem in 3D vision: given a pre-constructed 3D visual map, the 6 DoF (Degrees-of-Freedom) pose of a query image is estimated.

Image Retrieval, Retrieval, +1

AR Mapping: Accurate and Efficient Mapping for Augmented Reality

no code implementations · 27 Mar 2021 · Rui Huang, Chuan Fang, Kejie Qiu, Le Cui, Zilong Dong, Siyu Zhu, Ping Tan

Secondly, we propose an AR mapping pipeline which takes the input from the scanning device and produces accurate AR Maps.

DRO: Deep Recurrent Optimizer for Video to Depth

1 code implementation · 24 Mar 2021 · Xiaodong Gu, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Chengzhou Tang, Zilong Dong, Ping Tan

There is increasing interest in studying the video-to-depth (V2D) problem with machine learning techniques.

ENFT: Efficient Non-Consecutive Feature Tracking for Robust Structure-from-Motion

3 code implementations · 27 Oct 2015 · Guofeng Zhang, Hao-Min Liu, Zilong Dong, Jiaya Jia, Tien-Tsin Wong, Hujun Bao

Our framework comprises steps for solving the feature 'dropout' problem that arises when indistinctive structures, noise, or large image distortion exist, and for rapidly recognizing and joining common features located in different subsequences.
