Search Results for author: Guocheng Qian

Found 27 papers, 12 papers with code

I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models

no code implementations 12 Feb 2025 Zhenxing Mi, Kuan-Chieh Wang, Guocheng Qian, Hanrong Ye, Runtao Liu, Sergey Tulyakov, Kfir Aberman, Dan Xu

Without complex training and datasets, ThinkDiff effectively unleashes understanding, reasoning, and composing capabilities in diffusion models.

Decoder Large Language Model

Wonderland: Navigating 3D Scenes from a Single Image

no code implementations CVPR 2025 Hanwen Liang, Junli Cao, Vidit Goel, Guocheng Qian, Sergei Korolev, Demetri Terzopoulos, Konstantinos N. Plataniotis, Sergey Tulyakov, Jian Ren

Specifically, we introduce a large-scale reconstruction model that uses latents from a video diffusion model to predict 3D Gaussian Splattings for the scenes in a feed-forward manner.

3D Reconstruction Scene Generation

Omni-ID: Holistic Identity Representation Designed for Generative Tasks

no code implementations CVPR 2025 Guocheng Qian, Kuan-Chieh Wang, Or Patashnik, Negin Heravi, Daniil Ostashev, Sergey Tulyakov, Daniel Cohen-Or, Kfir Aberman

Our approach uses a few-to-many identity reconstruction training paradigm, where a limited set of input images is used to reconstruct multiple target images of the same individual in various poses and expressions.

Decoder

AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers

no code implementations CVPR 2025 Sherwin Bahmani, Ivan Skorokhodov, Guocheng Qian, Aliaksandr Siarohin, Willi Menapace, Andrea Tagliasacchi, David B. Lindell, Sergey Tulyakov

This motivated us to limit the injection of camera conditioning to a subset of the architecture to prevent interference with other video features, leading to a 4x reduction in training parameters, improved training speed, and 10% higher visual quality.

Camera Pose Estimation Pose Estimation +1

FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation

1 code implementation 25 Oct 2024 Tianyu Zhang, Guocheng Qian, Jin Xie, Jian Yang

Point cloud frame interpolation is a challenging task that involves accurate scene flow estimation across frames and maintaining the geometry structure.

Scene Flow Estimation

TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks

no code implementations 20 Aug 2024 Jinjie Mai, Wenxuan Zhu, Sara Rojas, Jesus Zarzar, Abdullah Hamdi, Guocheng Qian, Bing Li, Silvio Giancola, Bernard Ghanem

Neural radiance fields (NeRFs) generally require many images with accurate poses for accurate novel view synthesis, which does not reflect realistic setups where views can be sparse and poses can be noisy.

NeRF Novel View Synthesis

VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

no code implementations 17 Jul 2024 Sherwin Bahmani, Ivan Skorokhodov, Aliaksandr Siarohin, Willi Menapace, Guocheng Qian, Michael Vasilkovsky, Hsin-Ying Lee, Chaoyang Wang, Jiaxu Zou, Andrea Tagliasacchi, David B. Lindell, Sergey Tulyakov

Recently, new methods have demonstrated the ability to generate videos with controllable camera poses; these techniques leverage pre-trained U-Net-based diffusion models that explicitly disentangle spatial and temporal generation.

Video Generation

GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

1 code implementation 15 Feb 2024 Abdullah Hamdi, Luke Melas-Kyriazi, Jinjie Mai, Guocheng Qian, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi

With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%.

3D Reconstruction Novel View Synthesis

Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning

1 code implementation 8 Jan 2024 Chen Zhao, Shuming Liu, Karttikeya Mangalam, Guocheng Qian, Fatimah Zohra, Abdulmohsen Alghannam, Jitendra Malik, Bernard Ghanem

We use two coefficients, one on each type of residual connection, and introduce a dynamic training strategy that seamlessly transitions the pretrained model to a reversible network with much higher numerical precision.

object-detection Small Object Detection +1

Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning

no code implementations CVPR 2024 Chen Zhao, Shuming Liu, Karttikeya Mangalam, Guocheng Qian, Fatimah Zohra, Abdulmohsen Alghannam, Jitendra Malik, Bernard Ghanem

We use two coefficients, one on each type of residual connection, and introduce a dynamic training strategy that seamlessly transitions the pretrained model to a reversible network with much higher numerical precision.

object-detection Small Object Detection +1

GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

no code implementations CVPR 2024 Abdullah Hamdi, Luke Melas-Kyriazi, Jinjie Mai, Guocheng Qian, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi

With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%.

3D Reconstruction Novel View Synthesis

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

1 code implementation 30 Jun 2023 Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem

We present Magic123, a two-stage coarse-to-fine approach for generating high-quality, textured 3D meshes from a single unposed image in the wild using both 2D and 3D priors.

Image to 3D

Exploring Open-Vocabulary Semantic Segmentation without Human Labels

no code implementations 1 Jun 2023 Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Mohamed Elhoseiny, Sean Chang Culatana

Although these VL models have acquired extensive knowledge of visual concepts, it is non-trivial to apply that knowledge to the task of semantic segmentation, as the models are usually trained at the image level.

Open Vocabulary Semantic Segmentation Open-Vocabulary Semantic Segmentation +3

LLM as A Robotic Brain: Unifying Egocentric Memory and Control

no code implementations 19 Apr 2023 Jinjie Mai, Jun Chen, Bing Li, Guocheng Qian, Mohamed Elhoseiny, Bernard Ghanem

In this paper, we propose a novel and generalizable framework called LLM-Brain, which uses a large-scale language model as a robotic brain to unify egocentric memory and control.

Embodied Question Answering Language Modeling +3

Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud Understanding

1 code implementation 25 Aug 2022 Guocheng Qian, Abdullah Hamdi, Xingdi Zhang, Bernard Ghanem

Pretrained on a large number of widely available images, PViT shows significant gains in 3D point cloud classification, part segmentation, and semantic segmentation on ScanObjectNN, ShapeNetPart, and S3DIS, respectively.

3D Point Cloud Classification Inductive Bias +2

ASSANet: An Anisotropic Separable Set Abstraction for Efficient Point Cloud Representation Learning

1 code implementation NeurIPS 2021 Guocheng Qian, Hasan Abed Al Kader Hammoud, Guohao Li, Ali Thabet, Bernard Ghanem

We then introduce a new Anisotropic Reduction function into our Separable SA module and propose an Anisotropic Separable SA (ASSA) module that substantially increases the network's accuracy.

3D Part Segmentation 3D Point Cloud Classification +3

DeepGCNs: Making GCNs Go as Deep as CNNs

4 code implementations 15 Oct 2019 Guohao Li, Matthias Müller, Guocheng Qian, Itzel C. Delgadillo, Abdulellah Abualshour, Ali Thabet, Bernard Ghanem

This work transfers concepts such as residual/dense connections and dilated convolutions from CNNs to GCNs in order to successfully train very deep GCNs.

3D Point Cloud Classification 3D Semantic Segmentation +2

Rethinking Learning-based Demosaicing, Denoising, and Super-Resolution Pipeline

1 code implementation 7 May 2019 Guocheng Qian, Yuanhao Wang, Jinjin Gu, Chao Dong, Wolfgang Heidrich, Bernard Ghanem, Jimmy S. Ren

In this work, we comprehensively study the effects of pipelines on the mixture problem of learning-based DN, DM, and SR, in both sequential and joint solutions.

Demosaicking Denoising +1
