3 code implementations • 7 Jan 2025 • NVIDIA, Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, Pooya Jannaty, Jingyi Jin, Seung Wook Kim, Gergely Klár, Grace Lam, Shiyi Lan, Laura Leal-Taixe, Anqi Li, Zhaoshuo Li, Chen-Hsuan Lin, Tsung-Yi Lin, Huan Ling, Ming-Yu Liu, Xian Liu, Alice Luo, Qianli Ma, Hanzi Mao, Kaichun Mo, Arsalan Mousavian, Seungjun Nah, Sriharsha Niverty, David Page, Despoina Paschalidou, Zeeshan Patel, Lindsey Pavao, Morteza Ramezanali, Fitsum Reda, Xiaowei Ren, Vasanth Rao Naik Sabavat, Ed Schmerling, Stella Shi, Bartosz Stefaniak, Shitao Tang, Lyne Tchapmi, Przemek Tredak, Wei-Cheng Tseng, Jibin Varghese, Hao Wang, Haoxiang Wang, Heng Wang, Ting-Chun Wang, Fangyin Wei, Xinyue Wei, Jay Zhangjie Wu, Jiashu Xu, Wei Yang, Lin Yen-Chen, Xiaohui Zeng, Yu Zeng, Jing Zhang, Qinsheng Zhang, Yuxuan Zhang, Qingqing Zhao, Artur Zolkowski
We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications.
no code implementations • 17 Jun 2024 • Letian Wang, Seung Wook Kim, Jiawei Yang, Cunjun Yu, Boris Ivanovic, Steven L. Waslander, Yue Wang, Sanja Fidler, Marco Pavone, Peter Karkus
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs with limited view overlap, and is trained self-supervised with differentiable rendering to reconstruct RGB, depth, or feature images.
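The self-supervised training with differentiable rendering described above relies on standard per-ray alpha compositing to reconstruct RGB, depth, or feature images. A minimal numpy sketch of that compositing step (function and variable names here are illustrative, not the paper's code):

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray (standard volume rendering).

    densities: (N,) non-negative density at each sample
    colors:    (N, 3) RGB (or feature) value at each sample
    deltas:    (N,) distance between consecutive samples
    Returns the rendered value for the ray.
    """
    alphas = 1.0 - np.exp(-densities * deltas)                       # per-segment opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]   # transmittance so far
    weights = trans * alphas                                         # contribution weights
    return (weights[:, None] * colors).sum(axis=0)

# A nearly opaque red sample in front of a blue one dominates the ray colour.
rgb = composite_ray(np.array([50.0, 0.0]),
                    np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]),
                    np.array([1.0, 1.0]))
```

Because every step is differentiable, a photometric loss on the composited output can supervise the predicted densities and features without 3D labels.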
no code implementations • 14 Jun 2024 • Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling
We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input -- in a single feed-forward pass that takes only a second.
no code implementations • 16 Apr 2024 • Ashkan Mirzaei, Riccardo de Lutio, Seung Wook Kim, David Acuna, Jonathan Kelly, Sanja Fidler, Igor Gilitschenski, Zan Gojcic
In this work, we propose an approach for 3D scene inpainting -- the task of coherently replacing parts of the reconstructed scene with desired content.
no code implementations • 22 Jan 2024 • Koichi Namekata, Amirmojtaba Sabour, Sanja Fidler, Seung Wook Kim
Diffusion models have recently received increasing research attention for their remarkable transfer abilities in semantic segmentation tasks.
no code implementations • CVPR 2024 • Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, Karsten Kreis
We also propose a motion amplification mechanism as well as a new autoregressive synthesis scheme to generate and combine multiple 4D sequences for longer generation.
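The autoregressive scheme for longer generation can be sketched as a toy windowed loop: each new clip is conditioned on the last few frames of the sequence so far, and only its novel frames are kept. This is a minimal illustration, not the paper's actual generator; `dummy_clip` is a hypothetical stand-in for the 4D model.

```python
import numpy as np

def generate_long_sequence(generate_clip, total_len, clip_len, overlap):
    """Autoregressively extend a sequence by conditioning each new clip
    on the trailing `overlap` frames and discarding the overlapping part."""
    frames = list(generate_clip(None, clip_len))
    while len(frames) < total_len:
        context = frames[-overlap:]
        clip = generate_clip(context, clip_len)
        frames.extend(clip[overlap:])   # keep only the frames beyond the context
    return np.array(frames[:total_len])

def dummy_clip(context, n):
    """Toy generator over 1-D 'frames': reproduces the context, then counts on."""
    if context is None:
        return np.arange(n, dtype=float)
    last = float(context[-1])
    cont = last + 1 + np.arange(n - len(context), dtype=float)
    return np.concatenate([np.asarray(context, dtype=float), cont])

video = generate_long_sequence(dummy_clip, total_len=10, clip_len=4, overlap=2)
```

The overlap conditioning is what keeps consecutive clips temporally consistent at the seams.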
no code implementations • 22 Nov 2023 • Katja Schwarz, Seung Wook Kim, Jun Gao, Sanja Fidler, Andreas Geiger, Karsten Kreis
Then, we train a diffusion model in the 3D-aware latent space, thereby enabling synthesis of high-quality 3D-consistent image samples, outperforming recent state-of-the-art GAN-based methods.
1 code implementation • 3 Nov 2023 • Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, Yue Wang
We present EmerNeRF, a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes.
no code implementations • ICCV 2023 • Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler
In this work, we introduce a self-supervised feature representation learning framework DreamTeacher that utilizes generative networks for pre-training downstream image backbones.
no code implementations • CVPR 2023 • Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler
We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene.
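Projecting the density and feature voxel grids to a novel view requires sampling them at continuous 3D points along camera rays; the usual choice is trilinear interpolation. A small numpy sketch of that lookup (names and conventions are illustrative, not the paper's code):

```python
import numpy as np

def trilinear_sample(grid, point):
    """Trilinearly interpolate a feature voxel grid at a continuous point.

    grid:  (D, H, W, C) feature (or density) voxels
    point: (3,) coordinates in voxel units, ordered (z, y, x)
    """
    p0 = np.floor(point).astype(int)
    frac = point - p0
    out = np.zeros(grid.shape[-1])
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                corner = np.clip(p0 + [dz, dy, dx], 0, np.array(grid.shape[:3]) - 1)
                w = ((frac[0] if dz else 1 - frac[0]) *
                     (frac[1] if dy else 1 - frac[1]) *
                     (frac[2] if dx else 1 - frac[2]))
                out += w * grid[tuple(corner)]   # weight the 8 surrounding voxels
    return out
```

Sampling like this at points along each ray, then compositing, is what turns the voxel grids into rendered novel views.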
4 code implementations • CVPR 2023 • Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis
We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos.
Ranked #5 on Text-to-Video Generation on MSR-VTT (CLIP-FID metric)
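The core trick of introducing a temporal dimension is a reshape: pretrained spatial (image) layers see the frames stacked into the batch axis, while newly inserted temporal layers reshape to expose the time axis and mix information across frames. A toy numpy sketch of that interleaving (the layer functions here are illustrative placeholders, not the paper's modules):

```python
import numpy as np

def apply_spatial_then_temporal(x, spatial_fn, temporal_fn, batch):
    """x: (B*T, C, H, W) video frames stacked into the batch dimension."""
    x = spatial_fn(x)                      # pretrained image layer, per frame
    bt, c, h, w = x.shape
    t = bt // batch
    xt = x.reshape(batch, t, c, h, w)      # expose the time axis
    xt = temporal_fn(xt)                   # inserted temporal layer mixes frames
    return xt.reshape(bt, c, h, w)         # back to the stacked layout

# Toy layers: identity spatial layer, temporal averaging as the "mixer".
frames = np.arange(6, dtype=float).reshape(6, 1, 1, 1)   # B=2, T=3
out = apply_spatial_then_temporal(
    frames,
    spatial_fn=lambda x: x,
    temporal_fn=lambda xt: np.broadcast_to(
        xt.mean(axis=1, keepdims=True), xt.shape).copy(),
    batch=2,
)
```

Because only the temporal layers are new, the image LDM's weights can be reused unchanged during video fine-tuning.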
no code implementations • CVPR 2022 • Seung Wook Kim, Karsten Kreis, Daiqing Li, Antonio Torralba, Sanja Fidler
Modern image generative models show remarkable sample quality when trained on a single domain or class of objects.
Tasks: Generative Adversarial Network, Image-to-Image Translation, +1
no code implementations • CVPR 2022 • Daiqing Li, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, Antonio Torralba
By training an effective feature segmentation architecture on top of BigGAN, we turn BigGAN into a labeled dataset generator.
1 code implementation • NeurIPS 2021 • Huan Ling, Karsten Kreis, Daiqing Li, Seung Wook Kim, Antonio Torralba, Sanja Fidler
EditGAN builds on a GAN framework that jointly models images and their semantic segmentations, requiring only a handful of labeled examples, making it a scalable tool for editing.
no code implementations • 29 Sep 2021 • Seung Wook Kim, Juhong Min, Minsu Cho
Establishing correspondences between images remains a challenging task, especially under large appearance changes due to different viewpoints and intra-class variations.
no code implementations • CVPR 2021 • Seung Wook Kim, Jonah Philion, Antonio Torralba, Sanja Fidler
Realistic simulators are critical for training and verifying robotics systems.
2 code implementations • 7 Feb 2021 • Chetan L. Srinidhi, Seung Wook Kim, Fu-Der Chen, Anne L. Martel
In this work, we overcome this challenge by leveraging both task-agnostic and task-specific unlabeled data based on two novel strategies: i) a self-supervised pretext task that harnesses the underlying multi-resolution contextual cues in histology whole-slide images to learn a powerful supervisory signal for unsupervised representation learning; ii) a new teacher-student semi-supervised consistency paradigm that learns to effectively transfer the pretrained representations to downstream tasks based on prediction consistency with the task-specific unlabeled data.
Tasks: Histopathological Image Classification, Representation Learning, +1
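The teacher-student consistency paradigm is commonly built from two ingredients: a teacher whose weights track an exponential moving average (EMA) of the student's, and a consistency loss that penalizes disagreement between their predictions on the same unlabeled input. A minimal numpy sketch under those standard assumptions (names are illustrative, not the paper's code):

```python
import numpy as np

def ema_update(teacher_w, student_w, momentum=0.99):
    """Teacher weights track an exponential moving average of the student's."""
    return momentum * teacher_w + (1.0 - momentum) * student_w

def consistency_loss(student_logits, teacher_logits):
    """Mean squared difference between student and teacher class
    probabilities on the same unlabeled input."""
    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    return float(np.mean((softmax(student_logits) - softmax(teacher_logits)) ** 2))
```

Only the student is trained by gradient descent; the EMA teacher provides stable targets on the task-specific unlabeled data.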
no code implementations • NeurIPS 2020 • Huan Ling, David Acuna, Karsten Kreis, Seung Wook Kim, Sanja Fidler
In images of complex scenes, objects are often occluding each other which makes perception tasks such as object detection and tracking, or robotic control tasks such as planning, challenging.
no code implementations • 17 Aug 2020 • Seung Wook Kim, Keunsoo Ko, Haneul Ko, Victor C. M. Leung
In an EODF, AVs extract the regions of interest (RoIs) of the captured image when the channel quality is not good enough to support real-time OD.
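The channel-aware fallback described above amounts to a simple decision rule: offload the full frame if the channel can deliver it within the real-time deadline, otherwise send only the RoI crops. A toy sketch of such a rule (the function, names, and thresholds are illustrative, not the paper's formulation):

```python
def payload_to_send(frame_bytes, roi_bytes, channel_rate_bps, deadline_s):
    """Decide what to offload given channel rate and a real-time deadline."""
    budget = channel_rate_bps * deadline_s / 8.0   # bytes deliverable in time
    if frame_bytes <= budget:
        return "full_frame"    # channel is good enough for the whole image
    if roi_bytes <= budget:
        return "rois"          # degrade gracefully to regions of interest
    return "skip"              # even the RoIs would miss the deadline
```

The RoI path trades some detection context for meeting the latency constraint on a poor channel.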
no code implementations • CVPR 2020 • Seung Wook Kim, Yuhao Zhou, Jonah Philion, Antonio Torralba, Sanja Fidler
Simulation is a crucial component of any robotic system.
1 code implementation • 30 Dec 2019 • Atef Chaudhury, Makarand Tapaswi, Seung Wook Kim, Sanja Fidler
Understanding stories is a challenging reading comprehension problem for machines as it requires reading a large volume of text and following long-range dependencies.
no code implementations • 3 Oct 2018 • Sungeun Hong, Wonjin Jung, Ilsang Woo, Seung Wook Kim
Over the past decade, there has been a growing interest in human pose estimation.
1 code implementation • ICLR 2019 • Seung Wook Kim, Makarand Tapaswi, Sanja Fidler
Thus, a module for a new task learns to query existing modules and composes their outputs in order to produce its own output.