1 code implementation • 10 Jun 2025 • Xuanchi Ren, Yifan Lu, Tianshi Cao, Ruiyuan Gao, Shengyu Huang, Amirmojtaba Sabour, Tianchang Shen, Tobias Pfaff, Jay Zhangjie Wu, Runjian Chen, Seung Wook Kim, Jun Gao, Laura Leal-Taixe, Mike Chen, Sanja Fidler, Huan Ling
To address this challenge, we introduce the Cosmos-Drive-Dreams - a synthetic data generation (SDG) pipeline that aims to generate challenging scenarios to facilitate downstream tasks such as perception and driving policy training.
2 code implementations • 18 Mar 2025 • Nvidia, :, Hassan Abu Alhaija, Jose Alvarez, Maciej Bala, Tiffany Cai, Tianshi Cao, Liz Cha, Joshua Chen, Mike Chen, Francesco Ferroni, Sanja Fidler, Dieter Fox, Yunhao Ge, Jinwei Gu, Ali Hassani, Michael Isaev, Pooya Jannaty, Shiyi Lan, Tobias Lasser, Huan Ling, Ming-Yu Liu, Xian Liu, Yifan Lu, Alice Luo, Qianli Ma, Hanzi Mao, Fabio Ramos, Xuanchi Ren, Tianchang Shen, Xinglong Sun, Shitao Tang, Ting-Chun Wang, Jay Wu, Jiashu Xu, Stella Xu, Kevin Xie, Yuchong Ye, Xiaodong Yang, Xiaohui Zeng, Yu Zeng
We introduce Cosmos-Transfer, a conditional world generation model that can generate world simulations based on multiple spatial control inputs of various modalities such as segmentation, depth, and edge.
1 code implementation • CVPR 2025 • Xuanchi Ren, Tianchang Shen, Jiahui Huang, Huan Ling, Yifan Lu, Merlin Nimier-David, Thomas Müller, Alexander Keller, Sanja Fidler, Jun Gao
Our results demonstrate more precise camera control than prior work, as well as state-of-the-art results in sparse-view novel view synthesis, even in challenging settings such as driving scenes and monocular dynamic video.
no code implementations • CVPR 2025 • Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Gojcic, Huan Ling
At the core of our approach is Difix, a single-step image diffusion model trained to enhance and remove artifacts in rendered novel views caused by underconstrained regions of the 3D representation.
no code implementations • 5 Dec 2024 • Yifan Lu, Xuanchi Ren, Jiawei Yang, Tianchang Shen, Zhangjie Wu, Jun Gao, Yue Wang, Siheng Chen, Mike Chen, Sanja Fidler, Jiahui Huang
We present InfiniCube, a scalable method for generating unbounded dynamic 3D driving scenes with high fidelity and controllability.
no code implementations • 26 Oct 2024 • Xuanchi Ren, Yifan Lu, Hanxue Liang, Zhangjie Wu, Huan Ling, Mike Chen, Sanja Fidler, Francis Williams, Jiahui Huang
We present SCube, a novel method for reconstructing large-scale 3D scenes (geometry, appearance, and semantics) from a sparse set of posed images.
1 code implementation • 1 Jul 2024 • Francis Williams, Jiahui Huang, Jonathan Swartz, Gergely Klár, Vijay Thakkar, Matthew Cong, Xuanchi Ren, RuiLong Li, Clement Fuji-Tsang, Sanja Fidler, Eftychios Sifakis, Ken Museth
We present fVDB, a novel GPU-optimized framework for deep learning on large-scale 3D data.
1 code implementation • CVPR 2024 • Xuanchi Ren, Jiahui Huang, Xiaohui Zeng, Ken Museth, Sanja Fidler, Francis Williams
We present XCube (abbreviated as $\mathcal{X}^3$), a novel generative model for high-resolution sparse 3D voxel grids with arbitrary attributes.
1 code implementation • CVPR 2023 • Chenyang Lei, Xuanchi Ren, Zhaoxiang Zhang, Qifeng Chen
Prior work usually requires specific guidance such as the flickering frequency, manual annotations, or extra consistent videos to remove the flicker.
1 code implementation • CVPR 2022 • Xuanchi Ren, Xiaolong Wang
Novel view synthesis from a single image has recently attracted a lot of attention, and it has been primarily advanced by 3D deep learning and rendering techniques.
2 code implementations • ICLR 2022 • Dacheng Yin, Xuanchi Ren, Chong Luo, Yuwang Wang, Zhiwei Xiong, Wenjun Zeng
Last, an innovative link attention module serves as the decoder to reconstruct data from the decomposed content and style, with the help of the linking keys.
1 code implementation • ICCV 2021 • Xuanchi Ren, Tao Yang, Li Erran Li, Alexandre Alahi, Qifeng Chen
The ability to predict unseen vehicles is critical for safety in autonomous driving.
2 code implementations • ICLR 2022 • Xuanchi Ren, Tao Yang, Yuwang Wang, Wenjun Zeng
Based on this observation, we argue that it is possible to mitigate the trade-off by $(i)$ leveraging the pretrained generative models with high generation quality, $(ii)$ focusing on discovering the traversal directions as factors for disentangled representation learning.
1 code implementation • 21 Feb 2021 • Xuanchi Ren, Tao Yang, Yuwang Wang, Wenjun Zeng
From the unsupervised disentanglement perspective, we rethink content and style and propose a formulation for unsupervised C-S disentanglement based on our assumption that different factors are of different importance and popularity for image reconstruction, which serves as a data bias.
1 code implementation • ICLR 2022 • Tao Yang, Xuanchi Ren, Yuwang Wang, Wenjun Zeng, Nanning Zheng
We then propose a model, based on existing VAE-based methods, to tackle the unsupervised learning problem of the framework.
1 code implementation • 9 Dec 2020 • Xuanchi Ren, Zian Qian, Qifeng Chen
Our key observation is that some frames in a video with motion blur are much sharper than others, and thus we can transfer the texture information in those sharp frames to blurry frames.
1 code implementation • 13 Dec 2019 • Xuanchi Ren, Haoran Li, Zijian Huang, Qifeng Chen
We present a learning-based approach with pose perceptual loss for automatic music video generation.