no code implementations • 12 Feb 2025 • Zhenxing Mi, Kuan-Chieh Wang, Guocheng Qian, Hanrong Ye, Runtao Liu, Sergey Tulyakov, Kfir Aberman, Dan Xu
Without requiring complex training or datasets, ThinkDiff effectively unleashes understanding, reasoning, and composing capabilities in diffusion models.
no code implementations • CVPR 2025 • Hanwen Liang, Junli Cao, Vidit Goel, Guocheng Qian, Sergei Korolev, Demetri Terzopoulos, Konstantinos N. Plataniotis, Sergey Tulyakov, Jian Ren
Specifically, we introduce a large-scale reconstruction model that uses latents from a video diffusion model to predict 3D Gaussian Splatting representations of scenes in a feed-forward manner.
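As a rough illustration of this feed-forward prediction, the hypothetical sketch below maps per-primitive latent features to 3D Gaussian parameters in a single pass; the layer sizes, dimensions, and `GaussianHead` name are assumptions, not the paper's architecture.

```python
# Hypothetical sketch: regress 3D Gaussian parameters from diffusion latents
# in one feed-forward pass (dimensions and layer sizes are assumptions).
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    def __init__(self, latent_dim=16, hidden=128):
        super().__init__()
        # 3 (mean) + 3 (scale) + 4 (rotation quaternion) + 1 (opacity) + 3 (color)
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.GELU(),
            nn.Linear(hidden, 14),
        )

    def forward(self, latents):            # latents: (N, latent_dim)
        out = self.mlp(latents)
        mean = out[:, 0:3]                 # Gaussian centers
        scale = out[:, 3:6].exp()          # positive scales
        rot = nn.functional.normalize(out[:, 6:10], dim=-1)  # unit quaternion
        opacity = out[:, 10:11].sigmoid()  # in (0, 1)
        color = out[:, 11:14].sigmoid()
        return mean, scale, rot, opacity, color

head = GaussianHead()
mean, scale, rot, opacity, color = head(torch.randn(4096, 16))
```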
no code implementations • CVPR 2025 • Guocheng Qian, Kuan-Chieh Wang, Or Patashnik, Negin Heravi, Daniil Ostashev, Sergey Tulyakov, Daniel Cohen-Or, Kfir Aberman
Our approach uses a few-to-many identity reconstruction training paradigm, where a limited set of input images is used to reconstruct multiple target images of the same individual in various poses and expressions.
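A minimal sketch of this few-to-many paradigm, assuming a stand-in encoder/decoder and hypothetical per-target `pose_codes` conditioning (the paper's actual networks and conditioning are not shown here):

```python
# Assumed sketch: a few input views yield one shared identity embedding,
# which must reconstruct many targets of the same person in varied poses.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(1), nn.Linear(3 * 64 * 64, 256))
decoder = nn.Linear(256, 3 * 64 * 64)

def few_to_many_loss(inputs, targets, pose_codes):
    # inputs: (K, 3, 64, 64) few images; targets: (N, 3, 64, 64) many images
    identity = encoder(inputs).mean(dim=0, keepdim=True)  # one identity code
    cond = identity + pose_codes          # hypothetical per-target pose codes
    recon = decoder(cond).view_as(targets)
    return nn.functional.mse_loss(recon, targets)

loss = few_to_many_loss(torch.randn(2, 3, 64, 64),   # K = 2 inputs
                        torch.randn(8, 3, 64, 64),   # N = 8 targets
                        torch.randn(8, 256))
loss.backward()
```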
no code implementations • CVPR 2025 • Sherwin Bahmani, Ivan Skorokhodov, Guocheng Qian, Aliaksandr Siarohin, Willi Menapace, Andrea Tagliasacchi, David B. Lindell, Sergey Tulyakov
This motivated us to limit the injection of camera conditioning to a subset of the architecture to prevent interference with other video features, leading to a 4x reduction in training parameters, improved training speed, and 10% higher visual quality.
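The sparse-injection idea can be sketched as follows; the block count, projection layer, and choice of conditioned indices are illustrative assumptions:

```python
# Sketch: camera parameters are injected into a few chosen blocks instead of
# all of them, so most of the backbone stays untouched by conditioning.
import torch
import torch.nn as nn

blocks = nn.ModuleList([nn.Linear(64, 64) for _ in range(12)])
camera_proj = nn.Linear(12, 64)     # maps a flattened camera pose to features
CONDITIONED = {0, 1, 2}             # only these blocks receive camera info

def forward(x, camera):             # x: (B, 64), camera: (B, 12)
    cam_feat = camera_proj(camera)
    for i, block in enumerate(blocks):
        if i in CONDITIONED:        # inject conditioning sparsely
            x = x + cam_feat
        x = torch.relu(block(x))
    return x

out = forward(torch.randn(2, 64), torch.randn(2, 12))
```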
1 code implementation • 25 Oct 2024 • Tianyu Zhang, Guocheng Qian, Jin Xie, Jian Yang
Point cloud frame interpolation is a challenging task that involves accurate scene flow estimation across frames while maintaining geometric structure.
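A minimal numpy sketch of the flow-based interpolation setup (the actual method's flow estimator and geometry-preserving machinery are omitted):

```python
# Given an estimated per-point scene flow from frame t0 to t1, an
# intermediate frame is obtained by linearly warping points along the flow.
import numpy as np

def interpolate_frame(points_t0, flow_t0_to_t1, t):
    """points_t0: (N, 3); flow_t0_to_t1: (N, 3); t in [0, 1]."""
    return points_t0 + t * flow_t0_to_t1

p0 = np.random.rand(1024, 3)
flow = np.random.randn(1024, 3) * 0.05
p_mid = interpolate_frame(p0, flow, t=0.5)  # halfway frame
```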
no code implementations • 20 Aug 2024 • Jinjie Mai, Wenxuan Zhu, Sara Rojas, Jesus Zarzar, Abdullah Hamdi, Guocheng Qian, Bing Li, Silvio Giancola, Bernard Ghanem
Neural radiance fields (NeRFs) generally require many images with accurate poses for accurate novel view synthesis, which does not reflect realistic setups where views can be sparse and poses can be noisy.
no code implementations • 17 Jul 2024 • Sherwin Bahmani, Ivan Skorokhodov, Aliaksandr Siarohin, Willi Menapace, Guocheng Qian, Michael Vasilkovsky, Hsin-Ying Lee, Chaoyang Wang, Jiaxu Zou, Andrea Tagliasacchi, David B. Lindell, Sergey Tulyakov
Recently, new methods have demonstrated the ability to generate videos with controllable camera poses; these techniques leverage pre-trained U-Net-based diffusion models that explicitly disentangle spatial and temporal generation.
1 code implementation • 15 Feb 2024 • Abdullah Hamdi, Luke Melas-Kyriazi, Jinjie Mai, Guocheng Qian, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi
With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%.
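The abstract mentions a frequency-modulated loss; as an assumed illustration (not necessarily GES's exact formulation), one can weight a reconstruction loss by local high-frequency content:

```python
# Assumed sketch of a frequency-modulated image loss: weight the per-pixel
# error by Laplacian magnitude so detailed regions count more.
import torch
import torch.nn.functional as F

def freq_modulated_l1(pred, target):
    # pred, target: (B, 1, H, W) grayscale renders
    lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]).view(1, 1, 3, 3)
    highfreq = F.conv2d(target, lap, padding=1).abs()   # edge/detail mask
    weight = 1.0 + highfreq / (highfreq.mean() + 1e-8)  # emphasize detail regions
    return (weight * (pred - target).abs()).mean()

loss = freq_modulated_l1(torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64))
```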
no code implementations • 7 Feb 2024 • Yash Kant, Ziyi Wu, Michael Vasilkovsky, Guocheng Qian, Jian Ren, Riza Alp Guler, Bernard Ghanem, Sergey Tulyakov, Igor Gilitschenski, Aliaksandr Siarohin
We present SPAD, a novel approach for creating consistent multi-view images from text prompts or single images.
no code implementations • 1 Feb 2024 • Guocheng Qian, Junli Cao, Aliaksandr Siarohin, Yash Kant, Chaoyang Wang, Michael Vasilkovsky, Hsin-Ying Lee, Yuwei Fang, Ivan Skorokhodov, Peiye Zhuang, Igor Gilitschenski, Jian Ren, Bernard Ghanem, Kfir Aberman, Sergey Tulyakov
We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously.
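A hedged sketch of the amortized setup: one shared model trained across a bank of prompts, rather than one optimization per prompt. The generator and guidance loss below are placeholders, not AToM's actual components:

```python
# Amortized optimization: sample a batch of prompts each step and update one
# shared generator, so training cost is shared across all prompts.
import torch
import torch.nn as nn

model = nn.Linear(512, 3 * 1024)    # stand-in: prompt embedding -> vertices
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def guidance_loss(vertices, prompt_emb):        # placeholder objective
    return vertices.pow(2).mean() + 0.0 * prompt_emb.sum()

prompt_bank = torch.randn(100, 512)             # many prompts, one model
for step in range(3):
    idx = torch.randint(0, 100, (8,))           # sample a prompt batch
    emb = prompt_bank[idx]
    verts = model(emb).view(8, 1024, 3)
    loss = guidance_loss(verts, emb)            # shared weights amortize cost
    opt.zero_grad(); loss.backward(); opt.step()
```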
no code implementations • 10 Jan 2024 • Chaoyang Wang, Peiye Zhuang, Aliaksandr Siarohin, Junli Cao, Guocheng Qian, Hsin-Ying Lee, Sergey Tulyakov
Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos.
1 code implementation • 8 Jan 2024 • Chen Zhao, Shuming Liu, Karttikeya Mangalam, Guocheng Qian, Fatimah Zohra, Abdulmohsen Alghannam, Jitendra Malik, Bernard Ghanem
We apply two coefficients, one to each type of residual connection, and introduce a dynamic training strategy that seamlessly transitions the pretrained model to a reversible network with much higher numerical precision.
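One way to picture the transition, as an assumed sketch: a two-stream block whose residual coefficients are scheduled toward 1, at which point it becomes an exactly invertible additive coupling, so activations can be recomputed instead of stored:

```python
# Assumed sketch: residual streams gated by coefficients (a1, a2). When both
# reach 1 the block is an additive coupling and can be inverted exactly.
import torch
import torch.nn as nn

class CoupledBlock(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.f = nn.Linear(dim, dim)
        self.g = nn.Linear(dim, dim)
        self.a1 = 0.0   # scheduled from 0 -> 1 during the transition
        self.a2 = 0.0

    def forward(self, x1, x2):
        y1 = self.a1 * x1 + self.f(x2)
        y2 = self.a2 * x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):           # valid once a1 == a2 == 1
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2

blk = CoupledBlock(); blk.a1 = blk.a2 = 1.0
x1, x2 = torch.randn(2, 32), torch.randn(2, 32)
r1, r2 = blk.inverse(*blk.forward(x1, x2))
assert torch.allclose(r1, x1, atol=1e-5) and torch.allclose(r2, x2, atol=1e-5)
```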
1 code implementation • 30 Jun 2023 • Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem
We present Magic123, a two-stage, coarse-to-fine approach for generating high-quality, textured 3D meshes from a single unposed image in the wild using both 2D and 3D priors.
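A hedged sketch of combining the two priors as a weighted sum of guidance losses; the stage losses and weights below are placeholders, not Magic123's actual objectives:

```python
# Illustrative combination of a 2D prior and a 3D prior: one weighted loss,
# where the balance trades off imagination against 3D consistency.
import torch

def sds_2d(render): return render.pow(2).mean()   # stand-in for a 2D diffusion prior
def sds_3d(render): return render.abs().mean()    # stand-in for a 3D diffusion prior

def joint_prior_loss(render, lambda_2d=1.0, lambda_3d=1.0):  # weights illustrative
    return lambda_2d * sds_2d(render) + lambda_3d * sds_3d(render)

loss = joint_prior_loss(torch.rand(1, 3, 64, 64))
```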
no code implementations • 1 Jun 2023 • Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Mohamed Elhoseiny, Sean Chang Culatana
Although these VL models have acquired extensive knowledge of visual concepts, it is non-trivial to transfer this knowledge to the task of semantic segmentation, as they are usually trained at the image level.
Open-Vocabulary Semantic Segmentation
no code implementations • 19 Apr 2023 • Jinjie Mai, Jun Chen, Bing Li, Guocheng Qian, Mohamed Elhoseiny, Bernard Ghanem
In this paper, we propose a novel and generalizable framework called LLM-Brain, which uses a large-scale language model as a robotic brain to unify egocentric memory and control.
no code implementations • ICCV 2023 • Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Sean Chang Culatana, Mohamed Elhoseiny
Semantic segmentation is a crucial task in computer vision that involves segmenting images into semantically meaningful regions at the pixel level.
Open-Vocabulary Semantic Segmentation
1 code implementation • 25 Aug 2022 • Guocheng Qian, Abdullah Hamdi, Xingdi Zhang, Bernard Ghanem
Pretrained on a large number of widely available images, PViT achieves significant gains in 3D point cloud classification, part segmentation, and semantic segmentation on ScanObjectNN, ShapeNetPart, and S3DIS, respectively.
3 code implementations • 9 Jun 2022 • Guocheng Qian, Yuchen Li, Houwen Peng, Jinjie Mai, Hasan Abed Al Kader Hammoud, Mohamed Elhoseiny, Bernard Ghanem
In this work, we revisit the classical PointNet++ through a systematic study of model training and scaling strategies, and offer two major contributions.
Ranked #3 on 3D Semantic Segmentation on OpenTrench3D
1 code implementation • 11 Apr 2022 • Guocheng Qian, Xuanyang Zhang, Guohao Li, Chen Zhao, Yukang Chen, Xiangyu Zhang, Bernard Ghanem, Jian Sun
TNAS performs a modified bi-level Breadth-First Search in the proposed trees to discover a high-performance architecture.
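As a generic illustration of tree-structured search (not TNAS's exact bi-level procedure), a pruned breadth-first search can fix one architecture decision per tree level while keeping only the best candidates:

```python
# Illustrative pruned BFS over a binary tree of architecture choices: each
# level fixes one decision; only top-scoring partial candidates are expanded.
from collections import deque

def tree_search(num_decisions, score_fn, beam=4):
    frontier = deque([()])                        # root: no decisions made
    for _ in range(num_decisions):
        children = [arch + (choice,) for arch in frontier for choice in (0, 1)]
        children.sort(key=score_fn, reverse=True)
        frontier = deque(children[:beam])         # prune to top candidates
    return max(frontier, key=score_fn)

# Toy scorer: prefer architectures with more 1-choices (stand-in for accuracy).
best = tree_search(num_decisions=5, score_fn=sum)
print(best)  # (1, 1, 1, 1, 1)
```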
1 code implementation • NeurIPS 2021 • Guocheng Qian, Hasan Abed Al Kader Hammoud, Guohao Li, Ali Thabet, Bernard Ghanem
We then introduce a new Anisotropic Reduction function into our Separable SA module and propose an Anisotropic Separable SA (ASSA) module that substantially increases the network's accuracy.
Ranked #36 on 3D Part Segmentation on ShapeNet-Part
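One hedged reading of the Anisotropic Reduction described above: neighbors are aggregated with per-channel rather than scalar weights, so each channel can attend to neighbors differently. The sketch below is a generic vector-attention reduction, not necessarily the exact ASSA formulation:

```python
# Scalar vs. per-channel (anisotropic) neighbor reduction for point features.
import torch

feats = torch.randn(1024, 16, 32)   # (points, neighbors, channels)

# Isotropic: one scalar weight per neighbor, shared across channels.
w_scalar = torch.softmax(torch.randn(1024, 16, 1), dim=1)
out_iso = (w_scalar * feats).sum(dim=1)           # (1024, 32)

# Anisotropic: a separate weight per neighbor *and* per channel.
w_vector = torch.softmax(torch.randn(1024, 16, 32), dim=1)
out_aniso = (w_vector * feats).sum(dim=1)         # (1024, 32)
```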
1 code implementation • CVPR 2021 • Guocheng Qian, Abdulellah Abualshour, Guohao Li, Ali Thabet, Bernard Ghanem
We combine Inception DenseGCN with NodeShuffle into a new point upsampling pipeline called PU-GCN.
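The NodeShuffle idea can be sketched as a point-cloud analogue of PixelShuffle: expand each point's channels r-fold, then reshape the extra channels into r new points. A linear layer stands in here for the paper's GCN expansion:

```python
# Sketch of NodeShuffle: channel expansion followed by a periodic shuffle
# that turns extra channels into extra points, (N, r*C) -> (r*N, C).
import torch
import torch.nn as nn

def node_shuffle(feats, expand, r):
    n, c = feats.shape
    expanded = expand(feats)            # channel expansion, (N, r*C)
    return expanded.view(n * r, c)      # shuffle channels into more points

r, c = 4, 32
expand = nn.Linear(c, r * c)            # stand-in for a GCN expansion layer
up = node_shuffle(torch.randn(1024, c), expand, r)   # (4096, 32)
```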
1 code implementation • CVPR 2020 • Guohao Li, Guocheng Qian, Itzel C. Delgadillo, Matthias Müller, Ali Thabet, Bernard Ghanem
Architecture design has become a crucial component of successful deep learning.
Ranked #4 on Node Classification on PPI
4 code implementations • 15 Oct 2019 • Guohao Li, Matthias Müller, Guocheng Qian, Itzel C. Delgadillo, Abdulellah Abualshour, Ali Thabet, Bernard Ghanem
This work transfers concepts such as residual/dense connections and dilated convolutions from CNNs to GCNs in order to successfully train very deep GCNs.
Ranked #5 on 3D Semantic Segmentation on PartNet
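The two transferred concepts from the abstract above can be sketched as follows; the graph convolution is a placeholder, and dilated k-NN keeps every d-th of the k*d nearest neighbors to enlarge the receptive field:

```python
# (1) Residual connection around a graph convolution, as in ResNet;
# (2) dilated k-NN neighbor selection for graph construction.
import torch
import torch.nn as nn

gcn = nn.Linear(64, 64)                     # placeholder graph conv

def res_gcn_layer(h):
    return h + torch.relu(gcn(h))           # residual connection

def dilated_knn(dists, k, d):
    # dists: (N, N) pairwise distances; keep every d-th of the k*d nearest.
    idx = dists.topk(k * d, largest=False).indices   # (N, k*d)
    return idx[:, ::d]                                # (N, k) dilated neighbors

h = res_gcn_layer(torch.randn(100, 64))
pts = torch.randn(100, 3)
nbrs = dilated_knn(torch.cdist(pts, pts), k=8, d=2)
```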
1 code implementation • 7 May 2019 • Guocheng Qian, Yuanhao Wang, Jinjin Gu, Chao Dong, Wolfgang Heidrich, Bernard Ghanem, Jimmy S. Ren
In this work, we comprehensively study the effects of pipelines on the mixture problem of learning-based denoising (DN), demosaicing (DM), and super-resolution (SR), in both sequential and joint solutions.
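A toy sketch of why pipeline order matters: sequential solutions chain separate DN, DM, and SR stages in some order, while a joint solution learns a single mapping. All stage functions below are placeholders:

```python
# Sequential pipelines compose separate stages, and the composition order
# changes the result; a joint solution replaces the chain with one model.
import numpy as np

def dn(x): return x * 0.9          # stand-in denoiser
def dm(x): return x + 0.01         # stand-in demosaicer
def sr(x): return np.repeat(np.repeat(x, 2, 0), 2, 1)  # stand-in 2x upscaler

raw = np.random.rand(16, 16)
out_a = sr(dm(dn(raw)))            # pipeline A: DN -> DM -> SR
out_b = sr(dn(dm(raw)))            # pipeline B: DM -> DN -> SR
# A joint solution would instead learn one mapping raw -> output end to end.
```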