Search Results for author: Sanja Fidler

Found 225 papers, 72 papers with code

UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting

no code implementations 18 Jun 2025 Kai He, Ruofan Liang, Jacob Munkberg, Jon Hasselgren, Nandita Vijaykumar, Alexander Keller, Sanja Fidler, Igor Gilitschenski, Zan Gojcic, Zian Wang

We address the challenge of relighting a single image or video, a task that demands precise scene intrinsic understanding and high-quality light transport synthesis.

Align Your Flow: Scaling Continuous-Time Flow Map Distillation

no code implementations 17 Jun 2025 Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis

Diffusion- and flow-based models have emerged as state-of-the-art generative modeling approaches, but they require many sampling steps.

Image Generation

Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models

1 code implementation 10 Jun 2025 Xuanchi Ren, Yifan Lu, Tianshi Cao, Ruiyuan Gao, Shengyu Huang, Amirmojtaba Sabour, Tianchang Shen, Tobias Pfaff, Jay Zhangjie Wu, Runjian Chen, Seung Wook Kim, Jun Gao, Laura Leal-Taixe, Mike Chen, Sanja Fidler, Huan Ling

To address this challenge, we introduce Cosmos-Drive-Dreams, a synthetic data generation (SDG) pipeline that aims to generate challenging scenarios to facilitate downstream tasks such as perception and driving policy training.

3D Lane Detection 3D Object Detection +3

Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions

no code implementations 10 Jun 2025 David Acuna, Ximing Lu, JaeHun Jung, Hyunwoo Kim, Amlan Kar, Sanja Fidler, Yejin Choi

Recent research in vision-language models (VLMs) has centered around the possibility of equipping them with implicit long-form chain-of-thought reasoning -- akin to the success observed in language models -- via distillation and reinforcement learning.

Visual Reasoning

LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception

no code implementations 21 Apr 2025 Yuan-Hong Liao, Sven Elflein, Liu He, Laura Leal-Taixé, Yejin Choi, Sanja Fidler, David Acuna

Recent reasoning models through test-time scaling have demonstrated that long chain-of-thoughts can unlock substantial performance boosts in hard reasoning tasks such as math and code.

Math MMLU +2

VideoPanda: Video Panoramic Diffusion with Multi-view Attention

no code implementations 15 Apr 2025 Kevin Xie, Amirmojtaba Sabour, Jiahui Huang, Despoina Paschalidou, Greg Klar, Umar Iqbal, Sanja Fidler, Xiaohui Zeng

High resolution panoramic video content is paramount for immersive experiences in Virtual Reality, but is non-trivial to collect as it requires specialized equipment and intricate camera setups.

Video Generation

PARTFIELD: Learning 3D Feature Fields for Part Segmentation and Beyond

no code implementations 15 Apr 2025 Minghua Liu, Mikaela Angelina Uy, Donglai Xiang, Hao Su, Sanja Fidler, Nicholas Sharp, Jun Gao

We propose PartField, a feedforward approach for learning part-based 3D features, which captures the general concept of parts and their hierarchy without relying on predefined templates or text-based names, and can be applied to open-world 3D shapes across various modalities.

Contrastive Learning

GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control

1 code implementation CVPR 2025 Xuanchi Ren, Tianchang Shen, Jiahui Huang, Huan Ling, Yifan Lu, Merlin Nimier-David, Thomas Müller, Alexander Keller, Sanja Fidler, Jun Gao

Our results demonstrate more precise camera control than prior work, as well as state-of-the-art results in sparse-view novel view synthesis, even in challenging settings such as driving scenes and monocular dynamic video.

Novel View Synthesis Video Generation

Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

no code implementations CVPR 2025 Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Gojcic, Huan Ling

At the core of our approach is Difix, a single-step image diffusion model trained to enhance and remove artifacts in rendered novel views caused by underconstrained regions of the 3D representation.

3DGS 3D Reconstruction +2

DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models

no code implementations 30 Jan 2025 Ruofan Liang, Zan Gojcic, Huan Ling, Jacob Munkberg, Jon Hasselgren, Zhi-Hao Lin, Jun Gao, Alexander Keller, Nandita Vijaykumar, Sanja Fidler, Zian Wang

Classic physically-based rendering (PBR) accurately simulates the light transport, but relies on precise scene representations--explicit 3D geometry, high-quality material properties, and lighting conditions--that are often impractical to obtain in real-world scenarios.

3D geometry Inverse Rendering

Can Large Vision-Language Models Correct Semantic Grounding Errors By Themselves?

no code implementations CVPR 2025 Yuan-Hong Liao, Rafid Mahmood, Sanja Fidler, David Acuna

Improving semantic grounding in Vision-Language Models (VLMs) often involves collecting domain-specific training data, refining the network architectures, or modifying the training recipes.

Diffusion Renderer: Neural Inverse and Forward Rendering with Video Diffusion Models

no code implementations CVPR 2025 Ruofan Liang, Zan Gojcic, Huan Ling, Jacob Munkberg, Jon Hasselgren, Chih-Hao Lin, Jun Gao, Alexander Keller, Nandita Vijaykumar, Sanja Fidler, Zian Wang

Classic physically-based rendering (PBR) accurately simulates the light transport, but relies on precise scene representations--explicit 3D geometry, high-quality material properties, and lighting conditions--that are often impractical to obtain in real-world scenarios.

3D geometry Inverse Rendering

LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models

no code implementations 14 Nov 2024 Zhengyi Wang, Jonathan Lorraine, Yikai Wang, Hang Su, Jun Zhu, Sanja Fidler, Xiaohui Zeng

This work explores expanding the capabilities of large language models (LLMs) pretrained on text to generate 3D meshes within a unified model.

3D Generation Text Generation

ReMatching Dynamic Reconstruction Flow

no code implementations 1 Nov 2024 Sara Oblak, Despoina Paschalidou, Sanja Fidler, Matan Atzmon

Reconstructing a dynamic scene from image inputs is a fundamental computer vision task with many downstream applications.

Dynamic Reconstruction

SCube: Instant Large-Scale Scene Reconstruction using VoxSplats

no code implementations 26 Oct 2024 Xuanchi Ren, Yifan Lu, Hanxue Liang, Zhangjie Wu, Huan Ling, Mike Chen, Sanja Fidler, Francis Williams, Jiahui Huang

We present SCube, a novel method for reconstructing large-scale 3D scenes (geometry, appearance, and semantics) from a sparse set of posed images.

3D Reconstruction Scene Generation

SpaceMesh: A Continuous Representation for Learning Manifold Surface Meshes

no code implementations 30 Sep 2024 Tianchang Shen, Zhaoshuo Li, Marc Law, Matan Atzmon, Sanja Fidler, James Lucas, Jun Gao, Nicholas Sharp

In particular, our vertex embeddings generate cyclic neighbor relationships in a halfedge mesh representation, which gives a guarantee of edge-manifoldness and the ability to represent general polygonal meshes.

Stochastic Optimization

Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models

no code implementations 15 Sep 2024 Yuan-Hong Liao, Rafid Mahmood, Sanja Fidler, David Acuna

Inspired by this observation, we develop a zero-shot prompting technique, SpatialPrompt, that encourages VLMs to answer quantitative spatial questions using reference objects as visual cues.

Spatial Reasoning

OmniRe: Omni Urban Scene Reconstruction

1 code implementation 29 Aug 2024 Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lutio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, Li Song, Yue Wang

We introduce OmniRe, a comprehensive system for efficiently creating high-fidelity digital twins of dynamic real-world scenes from on-device logs.

3DGS

Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering

no code implementations 19 Aug 2024 Ruofan Liang, Zan Gojcic, Merlin Nimier-David, David Acuna, Nandita Vijaykumar, Sanja Fidler, Zian Wang

The correct insertion of virtual objects in images of real-world scenes requires a deep understanding of the scene's lighting, geometry and materials, as well as the image formation process.

Inverse Rendering Object +1

SuperPADL: Scaling Language-Directed Physics-Based Control with Progressive Supervised Distillation

no code implementations 15 Jul 2024 Jordan Juravsky, Yunrong Guo, Sanja Fidler, Xue Bin Peng

Inspired by these successes, in this work we introduce SuperPADL, a scalable framework for physics-based text-to-motion that leverages both RL and supervised learning to train controllers on thousands of diverse motion clips.

Reinforcement Learning (RL)

3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes

no code implementations 9 Jul 2024 Nicolas Moenne-Loccoz, Ashkan Mirzaei, Or Perel, Riccardo de Lutio, Janick Martinez Esturo, Gavriel State, Sanja Fidler, Nicholas Sharp, Zan Gojcic

The benefits of ray tracing are well-known in computer graphics: processing incoherent rays for secondary lighting effects such as shadows and reflections, rendering from highly-distorted cameras common in robotics, stochastically sampling rays, and more.

DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features

no code implementations 17 Jun 2024 Letian Wang, Seung Wook Kim, Jiawei Yang, Cunjun Yu, Boris Ivanovic, Steven L. Waslander, Yue Wang, Sanja Fidler, Marco Pavone, Peter Karkus

Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs with limited view overlap, and is trained self-supervised with differentiable rendering to reconstruct RGB, depth, or feature images.

3D geometry 3D Semantic Occupancy Prediction +5

L4GM: Large 4D Gaussian Reconstruction Model

no code implementations 14 Jun 2024 Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling

We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input -- in a single feed-forward pass that takes only a second.

model

NeRF-XL: Scaling NeRFs with Multiple GPUs

no code implementations 24 Apr 2024 RuiLong Li, Sanja Fidler, Angjoo Kanazawa, Francis Williams

We present NeRF-XL, a principled method for distributing Neural Radiance Fields (NeRFs) across multiple GPUs, thus enabling the training and rendering of NeRFs with an arbitrarily large capacity.

NeRF

Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

no code implementations 22 Apr 2024 Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis

Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond.

RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting

no code implementations 16 Apr 2024 Ashkan Mirzaei, Riccardo de Lutio, Seung Wook Kim, David Acuna, Jonathan Kelly, Sanja Fidler, Igor Gilitschenski, Zan Gojcic

In this work, we propose an approach for 3D scene inpainting -- the task of coherently replacing parts of the reconstructed scene with desired content.

3D Inpainting Image Inpainting

Can Feedback Enhance Semantic Grounding in Large Vision-Language Models?

no code implementations 9 Apr 2024 Yuan-Hong Liao, Rafid Mahmood, Sanja Fidler, David Acuna

We find that if prompted appropriately, VLMs can utilize feedback both in a single step and iteratively, showcasing the potential of feedback as an alternative technique to improve grounding in internet-scale VLMs.

Augmented Reality based Simulated Data (ARSim) with multi-view consistency for AV perception networks

no code implementations 22 Mar 2024 Aqeel Anwar, Tae Eun Choe, Zian Wang, Sanja Fidler, Minwoo Park

The resulting augmented multi-view consistent dataset is used to train a multi-camera perception network for autonomous vehicles.

Autonomous Driving Domain Adaptation

LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis

no code implementations 22 Mar 2024 Kevin Xie, Jonathan Lorraine, Tianshi Cao, Jun Gao, James Lucas, Antonio Torralba, Sanja Fidler, Xiaohui Zeng

Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt.

3D Generation Text to 3D

EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models

no code implementations 22 Jan 2024 Koichi Namekata, Amirmojtaba Sabour, Sanja Fidler, Seung Wook Kim

Diffusion models have recently received increasing research attention for their remarkable transfer abilities in semantic segmentation tasks.

Segmentation Semantic Segmentation

Compact Neural Graphics Primitives with Learned Hash Probing

no code implementations 28 Dec 2023 Towaki Takikawa, Thomas Müller, Merlin Nimier-David, Alex Evans, Sanja Fidler, Alec Jacobson, Alexander Keller

Neural graphics primitives are faster and achieve higher quality when their neural networks are augmented by spatial data structures that hold trainable features arranged in a grid.

Quantization

Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models

no code implementations CVPR 2024 Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, Karsten Kreis

We also propose a motion amplification mechanism as well as a new autoregressive synthesis scheme to generate and combine multiple 4D sequences for longer generation.

Synthetic Data Generation Video Generation

Trajeglish: Traffic Modeling as Next-Token Prediction

2 code implementations 7 Dec 2023 Jonah Philion, Xue Bin Peng, Sanja Fidler

A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs.

Decoder Prediction

XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies

1 code implementation CVPR 2024 Xuanchi Ren, Jiahui Huang, Xiaohui Zeng, Ken Museth, Sanja Fidler, Francis Williams

We present XCube (abbreviated as $\mathcal{X}^3$), a novel generative model for high-resolution sparse 3D voxel grids with arbitrary attributes.

3D Shape Generation Scene Generation +1

WildFusion: Learning 3D-Aware Latent Diffusion Models in View Space

no code implementations 22 Nov 2023 Katja Schwarz, Seung Wook Kim, Jun Gao, Sanja Fidler, Andreas Geiger, Karsten Kreis

Then, we train a diffusion model in the 3D-aware latent space, thereby enabling synthesis of high-quality 3D-consistent image samples, outperforming recent state-of-the-art GAN-based methods.

3D-Aware Image Synthesis 3D geometry +3

Adaptive Shells for Efficient Neural Radiance Field Rendering

no code implementations 16 Nov 2023 Zian Wang, Tianchang Shen, Merlin Nimier-David, Nicholas Sharp, Jun Gao, Alexander Keller, Sanja Fidler, Thomas Müller, Zan Gojcic

We then extract an explicit mesh of a narrow band around the surface, with width determined by the kernel size, and fine-tune the radiance field within this band.

Novel View Synthesis Stochastic Optimization

3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features

no code implementations CVPR 2024 Chenfeng Xu, Huan Ling, Sanja Fidler, Or Litany

However, these features are initially trained on paired text and image data, which are not optimized for 3D tasks, and often exhibit a domain gap when applied to the target data.

3D Object Detection Novel View Synthesis +2

ViR: Towards Efficient Vision Retention Backbones

1 code implementation 30 Oct 2023 Ali Hatamizadeh, Michael Ranzinger, Shiyi Lan, Jose M. Alvarez, Sanja Fidler, Jan Kautz

Inspired by this trend, we propose a new class of computer vision models, dubbed Vision Retention Networks (ViR), with dual parallel and recurrent formulations, which strike an optimal balance between fast inference and parallel training with competitive performance.

TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models

no code implementations ICCV 2023 Tianshi Cao, Karsten Kreis, Sanja Fidler, Nicholas Sharp, Kangxue Yin

We present TexFusion (Texture Diffusion), a new method to synthesize textures for given 3D geometries, using large-scale text-guided image diffusion models.

Denoising Game Design +1

Towards Viewpoint Robustness in Bird's Eye View Segmentation

no code implementations ICCV 2023 Tzofi Klinghoffer, Jonah Philion, Wenzheng Chen, Or Litany, Zan Gojcic, Jungseock Joo, Ramesh Raskar, Sanja Fidler, Jose M. Alvarez

We introduce a technique for novel view synthesis and use it to transform collected data to the viewpoint of target rigs, allowing us to train BEV segmentation models for diverse target rigs without any additional data collection or labeling cost.

Autonomous Vehicles BEV Segmentation +1

Flexible Isosurface Extraction for Gradient-Based Mesh Optimization

1 code implementation 10 Aug 2023 Tianchang Shen, Jacob Munkberg, Jon Hasselgren, Kangxue Yin, Zian Wang, Wenzheng Chen, Zan Gojcic, Sanja Fidler, Nicholas Sharp, Jun Gao

This work considers gradient-based mesh optimization, where we iteratively optimize for a 3D surface mesh by representing it as the isosurface of a scalar field, an increasingly common paradigm in applications including photogrammetry, generative modeling, and inverse physics.

DreamTeacher: Pretraining Image Backbones with Deep Generative Models

no code implementations ICCV 2023 Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler

In this work, we introduce a self-supervised feature representation learning framework DreamTeacher that utilizes generative networks for pre-training downstream image backbones.

Knowledge Distillation Representation Learning

ATT3D: Amortized Text-to-3D Object Synthesis

no code implementations ICCV 2023 Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, James Lucas

Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields.

Image to 3D Object +1

Neural Kernel Surface Reconstruction

1 code implementation CVPR 2023 Jiahui Huang, Zan Gojcic, Matan Atzmon, Or Litany, Sanja Fidler, Francis Williams

We present a novel method for reconstructing a 3D implicit surface from a large-scale, sparse, and noisy point cloud.

Surface Reconstruction

Neural LiDAR Fields for Novel View Synthesis

no code implementations ICCV 2023 Shengyu Huang, Zan Gojcic, Zian Wang, Francis Williams, Yoni Kasten, Sanja Fidler, Konrad Schindler, Or Litany

We present Neural Fields for LiDAR (NFL), a method to optimise a neural field scene representation from LiDAR measurements, with the goal of synthesizing realistic LiDAR scans from novel viewpoints.

NeRF Novel LiDAR View Synthesis +1

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

no code implementations CVPR 2023 Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler

We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene.

Scene Generation

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

4 code implementations CVPR 2023 Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis

We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos.

Ranked #5 on Text-to-Video Generation on MSR-VTT (CLIP-FID metric)

Image Generation Text-to-Video Generation +3

Neural Fields meet Explicit Geometric Representation for Inverse Rendering of Urban Scenes

no code implementations 6 Apr 2023 Zian Wang, Tianchang Shen, Jun Gao, Shengyu Huang, Jacob Munkberg, Jon Hasselgren, Zan Gojcic, Wenzheng Chen, Sanja Fidler

Reconstruction and intrinsic decomposition of scenes from captured imagery would enable many applications such as relighting and virtual object insertion.

3D Reconstruction Inverse Rendering +1

Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion

no code implementations CVPR 2023 Davis Rempe, Zhengyi Luo, Xue Bin Peng, Ye Yuan, Kris Kitani, Karsten Kreis, Sanja Fidler, Or Litany

We introduce a method for generating realistic pedestrian trajectories and full-body animations that can be controlled to meet user-defined goals.

Collision Avoidance

Bridging the Sim2Real gap with CARE: Supervised Detection Adaptation with Conditional Alignment and Reweighting

no code implementations 9 Feb 2023 Viraj Prabhu, David Acuna, Andrew Liao, Rafid Mahmood, Marc T. Law, Judy Hoffman, Sanja Fidler, James Lucas

Sim2Real domain adaptation (DA) research focuses on the constrained setting of adapting from a labeled synthetic source domain to an unlabeled or sparsely labeled real target domain.

Autonomous Driving Domain Adaptation +3

Synthesizing Physical Character-Scene Interactions

no code implementations 2 Feb 2023 Mohamed Hassan, Yunrong Guo, Tingwu Wang, Michael Black, Sanja Fidler, Xue Bin Peng

These scene interactions are learned using an adversarial discriminator that evaluates the realism of a motion within the context of a scene.

Imitation Learning Motion Generation

PADL: Language-Directed Physics-Based Character Control

1 code implementation 31 Jan 2023 Jordan Juravsky, Yunrong Guo, Sanja Fidler, Xue Bin Peng

In this work, we present PADL, which leverages recent innovations in NLP in order to take steps towards developing language-directed controllers for physics-based character animation.

Image Generation Imitation Learning +3

Learning Human Dynamics in Autonomous Driving Scenarios

no code implementations ICCV 2023 Jingbo Wang, Ye Yuan, Zhengyi Luo, Kevin Xie, Dahua Lin, Umar Iqbal, Sanja Fidler, Sameh Khamis

In this work, we propose a holistic framework for learning physically plausible human dynamics from real driving scenarios, narrowing the gap between real and simulated human behavior in safety-critical applications.

Autonomous Driving Human Dynamics

Magic3D: High-Resolution Text-to-3D Content Creation

1 code implementation CVPR 2023 Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin

DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results.

NeRF Text to 3D +1

LION: Latent Point Diffusion Models for 3D Shape Generation

2 code implementations 12 Oct 2022 Xiaohui Zeng, Arash Vahdat, Francis Williams, Zan Gojcic, Or Litany, Sanja Fidler, Karsten Kreis

To advance 3D DDMs and make them useful for digital artists, we require (i) high generation quality, (ii) flexibility for manipulation and applications such as conditional synthesis and shape interpolation, and (iii) the ability to output smooth surfaces or meshes.

3D Generation 3D Shape Generation +3

XDGAN: Multi-Modal 3D Shape Generation in 2D Space

no code implementations 6 Oct 2022 Hassan Abu Alhaija, Alara Dirik, André Knörig, Sanja Fidler, Maria Shugrina

Specifically, we propose a novel method to convert 3D shapes into compact 1-channel geometry images and leverage StyleGAN3 and image-to-image translation networks to generate 3D objects in 2D space.

3D Shape Generation Image-to-Image Translation

Optimizing Data Collection for Machine Learning

no code implementations 3 Oct 2022 Rafid Mahmood, James Lucas, Jose M. Alvarez, Sanja Fidler, Marc T. Law

Modern deep learning systems require huge data sets to achieve impressive performance, but there is little guidance on how much or what kind of data to collect.

EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

3 code implementations 26 Sep 2022 Ahmad Darkhalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, Dima Damen

VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets.

Object Segmentation +4

GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images

3 code implementations 22 Sep 2022 Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, Sanja Fidler

As several industries are moving towards modeling massive 3D virtual worlds, the need for content creation tools that can scale in terms of the quantity, quality, and diversity of 3D content is becoming evident.

Diversity

Neural Light Field Estimation for Street Scenes with Differentiable Virtual Object Insertion

no code implementations 19 Aug 2022 Zian Wang, Wenzheng Chen, David Acuna, Jan Kautz, Sanja Fidler

In this work, we propose a neural approach that estimates the 5D HDR light field from a single image, and a differentiable object insertion formulation that enables end-to-end training with image-based losses that encourage realism.

Autonomous Driving Lighting Estimation +1

MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation

3 code implementations 18 Aug 2022 Gopal Sharma, Kangxue Yin, Subhransu Maji, Evangelos Kalogerakis, Or Litany, Sanja Fidler

As a result, the learned 2D representations are view-invariant and geometrically consistent, leading to better generalization when trained on a limited number of labeled shapes compared to alternatives that utilize self-supervision in 2D or 3D alone.

Contrastive Learning Segmentation

Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention

no code implementations 5 Jul 2022 Gary Leung, Jun Gao, Xiaohui Zeng, Sanja Fidler

HILA extends hierarchical vision transformer architectures by adding local connections between features of higher and lower levels to the backbone encoder.

Object Semantic Segmentation

Variable Bitrate Neural Fields

1 code implementation 15 Jun 2022 Towaki Takikawa, Alex Evans, Jonathan Tremblay, Thomas Müller, Morgan McGuire, Alec Jacobson, Sanja Fidler

Neural approximations of scalar and vector fields, such as signed distance functions and radiance fields, have emerged as accurate, high-quality representations.

Decoder

ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters

1 code implementation 4 May 2022 Xue Bin Peng, Yunrong Guo, Lina Halper, Sergey Levine, Sanja Fidler

By leveraging a massively parallel GPU-based simulator, we are able to train skill embeddings using over a decade of simulated experiences, enabling our model to learn a rich and versatile repertoire of skills.

Imitation Learning Unsupervised Reinforcement Learning

M$^2$BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation

no code implementations 11 Apr 2022 Enze Xie, Zhiding Yu, Daquan Zhou, Jonah Philion, Anima Anandkumar, Sanja Fidler, Ping Luo, Jose M. Alvarez

In this paper, we propose M$^2$BEV, a unified framework that jointly performs 3D object detection and map segmentation in the Birds Eye View (BEV) space with multi-camera image inputs.

3D Object Detection BEV Segmentation +2

AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis

no code implementations CVPR 2022 Zhiqin Chen, Kangxue Yin, Sanja Fidler

In this paper, we address the problem of texture representation for 3D shapes for the challenging and underexplored tasks of texture transfer and synthesis.

3D Reconstruction Single-View 3D Reconstruction +1

Learning Smooth Neural Functions via Lipschitz Regularization

no code implementations 16 Feb 2022 Hsueh-Ti Derek Liu, Francis Williams, Alec Jacobson, Sanja Fidler, Or Litany

The latent descriptor of a neural field acts as a deformation handle for the 3D shape it represents.

Domain Adversarial Training: A Game Perspective

no code implementations ICLR 2022 David Acuna, Marc T Law, Guojun Zhang, Sanja Fidler

Defining optimal solutions in domain-adversarial training as a local Nash equilibrium, we show that gradient descent in domain-adversarial training can violate the asymptotic convergence guarantees of the optimizer, oftentimes hindering the transfer performance.

Domain Adaptation

Causal Scene BERT: Improving object detection by searching for challenging groups of data

no code implementations 8 Feb 2022 Cinjon Resnick, Or Litany, Amlan Kar, Karsten Kreis, James Lucas, Kyunghyun Cho, Sanja Fidler

Our main contribution is a pseudo-automatic method to discover such groups in foresight by performing causal interventions on simulated scenes.

Autonomous Vehicles object-detection +1

Federated Learning with Heterogeneous Architectures using Graph HyperNetworks

no code implementations 20 Jan 2022 Or Litany, Haggai Maron, David Acuna, Jan Kautz, Gal Chechik, Sanja Fidler

Standard Federated Learning (FL) techniques are limited to clients with identical network architectures.

Federated Learning

Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior

no code implementations CVPR 2022 Davis Rempe, Jonah Philion, Leonidas J. Guibas, Sanja Fidler, Or Litany

Scenario generation is formulated as an optimization in the latent space of this traffic model, perturbing an initial real-world scene to produce trajectories that collide with a given planner.

Autonomous Vehicles

Frame Averaging for Equivariant Shape Space Learning

no code implementations CVPR 2022 Matan Atzmon, Koki Nagano, Sanja Fidler, Sameh Khamis, Yaron Lipman

A natural way to incorporate symmetries in shape space learning is to ask that the mapping to the shape space (encoder) and mapping from the shape space (decoder) are equivariant to the relevant symmetries.

Don’t Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence

no code implementations NeurIPS 2021 Tianshi Cao, Alex Bie, Arash Vahdat, Sanja Fidler, Karsten Kreis

Generative models trained with privacy constraints on private data can sidestep this challenge, providing indirect access to private data instead.

Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation

no code implementations NeurIPS 2021 David Acuna, Jonah Philion, Sanja Fidler

Alternative solutions seek to exploit driving simulators that can generate large amounts of labeled data with a plethora of content variations.

Autonomous Driving Domain Adaptation

Deep Marching Tetrahedra: a Hybrid Representation for High-Resolution 3D Shape Synthesis

no code implementations NeurIPS 2021 Tianchang Shen, Jun Gao, Kangxue Yin, Ming-Yu Liu, Sanja Fidler

The core of DMTet includes a deformable tetrahedral grid that encodes a discretized signed distance function and a differentiable marching tetrahedra layer that converts the implicit signed distance representation to the explicit surface mesh representation.

EditGAN: High-Precision Semantic Image Editing

1 code implementation NeurIPS 2021 Huan Ling, Karsten Kreis, Daiqing Li, Seung Wook Kim, Antonio Torralba, Sanja Fidler

EditGAN builds on a GAN framework that jointly models images and their semantic segmentations, requiring only a handful of labeled examples, making it a scalable tool for editing.

Segmentation Semantic Segmentation +1

Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence

1 code implementation 1 Nov 2021 Tianshi Cao, Alex Bie, Arash Vahdat, Sanja Fidler, Karsten Kreis

Generative models trained with privacy constraints on private data can sidestep this challenge, providing indirect access to private data instead.

ATISS: Autoregressive Transformers for Indoor Scene Synthesis

1 code implementation NeurIPS 2021 Despoina Paschalidou, Amlan Kar, Maria Shugrina, Karsten Kreis, Andreas Geiger, Sanja Fidler

The ability to synthesize realistic and diverse indoor furniture layouts automatically or based on partial input unlocks many applications, from better interactive 3D tools to data synthesis for training and simulation.

2D Semantic Segmentation task 1 (8 classes) 3D Semantic Scene Completion +1

Low-Budget Active Learning via Wasserstein Distance: An Integer Programming Approach

no code implementations ICLR 2022 Rafid Mahmood, Sanja Fidler, Marc T Law

Active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label.

Active Learning

Causal Scene BERT: Improving object detection by searching for challenging groups

no code implementations 29 Sep 2021 Cinjon Resnick, Or Litany, Amlan Kar, Karsten Kreis, James Lucas, Kyunghyun Cho, Sanja Fidler

We verify that the prioritized groups found via intervention are challenging for the object detector and show that retraining with data collected from these groups helps inordinately compared to adding more IID data.

Autonomous Vehicles object-detection +1

Physics-based Human Motion Estimation and Synthesis from Videos

no code implementations ICCV 2021 Kevin Xie, Tingwu Wang, Umar Iqbal, Yunrong Guo, Sanja Fidler, Florian Shkurti

By enabling learning of motion synthesis from video, our method paves the way for large-scale, realistic and diverse motion synthesis.

Motion Estimation Motion Synthesis +1

3DStyleNet: Creating 3D Shapes with Geometric and Texture Style Variations

no code implementations ICCV 2021 Kangxue Yin, Jun Gao, Maria Shugrina, Sameh Khamis, Sanja Fidler

Given a small set of high-quality textured objects, our method can create many novel stylized shapes, resulting in effortless 3D content creation and style-aware data augmentation.

3D Reconstruction Data Augmentation +1

NP-DRAW: A Non-Parametric Structured Latent Variable Model for Image Generation

1 code implementation 25 Jun 2021 Xiaohui Zeng, Raquel Urtasun, Richard Zemel, Sanja Fidler, Renjie Liao

1) We propose a non-parametric prior distribution over the appearance of image parts so that the latent variable "what-to-draw" per step becomes a categorical random variable.

Image Generation

f-Domain-Adversarial Learning: Theory and Algorithms

1 code implementation 21 Jun 2021 David Acuna, Guojun Zhang, Marc T. Law, Sanja Fidler

Unsupervised domain adaptation is used in many machine learning applications where, during training, a model has access to unlabeled data in the target domain, and a related labeled dataset.

Learning Theory Unsupervised Domain Adaptation

Low Budget Active Learning via Wasserstein Distance: An Integer Programming Approach

no code implementations 5 Jun 2021 Rafid Mahmood, Sanja Fidler, Marc T. Law

Active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label.

Active Learning

DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

2 code implementations CVPR 2021 Yuxuan Zhang, Huan Ling, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, Sanja Fidler

To showcase the power of our approach, we generated datasets for 7 image segmentation tasks which include pixel-level labels for 34 human face parts, and 32 car parts.

Decoder Image Segmentation +1

Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection

1 code implementation 12 Apr 2021 Nadine Chang, Zhiding Yu, Yu-Xiong Wang, Anima Anandkumar, Sanja Fidler, Jose M. Alvarez

As a result, image resampling alone is not enough to yield a sufficiently balanced distribution at the object level.

Object

Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks

1 code implementation CVPR 2021 Despoina Paschalidou, Angelos Katharopoulos, Andreas Geiger, Sanja Fidler

The INN allows us to compute the inverse mapping of the homeomorphism, which in turn, enables the efficient computation of both the implicit surface function of a primitive and its mesh, without any additional post-processing.

Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Shapes

2 code implementations CVPR 2021 Towaki Takikawa, Joey Litalien, Kangxue Yin, Karsten Kreis, Charles Loop, Derek Nowrouzezahrai, Alec Jacobson, Morgan McGuire, Sanja Fidler

We introduce an efficient neural representation that, for the first time, enables real-time rendering of high-fidelity neural SDFs, while achieving state-of-the-art geometry reconstruction quality.

f-Domain-Adversarial Learning: Theory and Algorithms for Unsupervised Domain Adaptation with Neural Networks

no code implementations 1 Jan 2021 David Acuna, Guojun Zhang, Marc T Law, Sanja Fidler

We provide empirical results for several f-divergences and show that some, not considered previously in domain-adversarial learning, achieve state-of-the-art results in practice.

Generalization Bounds Learning Theory +1

Differentially Private Generative Models Through Optimal Transport

no code implementations 1 Jan 2021 Tianshi Cao, Alex Bie, Karsten Kreis, Sanja Fidler

Generative models trained with privacy constraints on private data can sidestep this challenge and provide indirect access to the private data instead.

Personalized Federated Learning with First Order Model Optimization

3 code implementations ICLR 2021 Michael Zhang, Karan Sapra, Sanja Fidler, Serena Yeung, Jose M. Alvarez

While federated learning traditionally aims to train a single global model across decentralized local datasets, one model may not always be ideal for all participating clients.

model Model Optimization +1

Variational Amodal Object Completion

no code implementations NeurIPS 2020 Huan Ling, David Acuna, Karsten Kreis, Seung Wook Kim, Sanja Fidler

In images of complex scenes, objects often occlude each other, which makes perception tasks such as object detection and tracking, or robotic control tasks such as planning, challenging.

Object object-detection +1

UniCon: Universal Neural Controller For Physics-based Character Motion

no code implementations 30 Nov 2020 Tingwu Wang, Yunrong Guo, Maria Shugrina, Sanja Fidler

The field of physics-based animation is gaining importance due to the increasing demand for realism in video games and films, and has recently seen wide adoption of data-driven techniques, such as deep reinforcement learning (RL), which learn control from (human) demonstrations.

Deep Reinforcement Learning Reinforcement Learning (RL)

Emergent Road Rules In Multi-Agent Driving Environments

1 code implementation ICLR 2021 Avik Pal, Jonah Philion, Yuan-Hong Liao, Sanja Fidler

For autonomous vehicles to safely share the road with human drivers, autonomous vehicles must abide by specific "road rules" that human drivers have agreed to follow.

Autonomous Vehicles

Learning Deformable Tetrahedral Meshes for 3D Reconstruction

1 code implementation NeurIPS 2020 Jun Gao, Wenzheng Chen, Tommy Xiang, Clement Fuji Tsang, Alec Jacobson, Morgan McGuire, Sanja Fidler

We introduce Deformable Tetrahedral Meshes (DefTet) as a particular parameterization that utilizes volumetric tetrahedral meshes for the reconstruction problem.

3D Reconstruction

Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering

no code implementations ICLR 2021 Yuxuan Zhang, Wenzheng Chen, Huan Ling, Jun Gao, Yinan Zhang, Antonio Torralba, Sanja Fidler

Key to our approach is to exploit GANs as a multi-view data generator to train an inverse graphics network using an off-the-shelf differentiable renderer, and the trained inverse graphics network as a teacher to disentangle the GAN's latent code into interpretable 3D properties.

3D geometry Neural Rendering

Fed-Sim: Federated Simulation for Medical Imaging

no code implementations 1 Sep 2020 Daiqing Li, Amlan Kar, Nishant Ravikumar, Alejandro F. Frangi, Sanja Fidler

Since the model of geometry and material is disentangled from the imaging sensor, it can effectively be trained across multiple medical centers.

Federated Learning

Expressive Telepresence via Modular Codec Avatars

no code implementations ECCV 2020 Hang Chu, Shugao Ma, Fernando de la Torre, Sanja Fidler, Yaser Sheikh

It is important to note that traditional person-specific CAs are learned from few training samples, and typically lack robustness and have limited expressiveness when transferring facial expressions.

Interactive Annotation of 3D Object Geometry using 2D Scribbles

no code implementations ECCV 2020 Tianchang Shen, Jun Gao, Amlan Kar, Sanja Fidler

We implement our framework as a web service and conduct a user study, where we show that user annotated data using our method effectively facilitates real-world learning tasks.

3D geometry

ScribbleBox: Interactive Annotation Framework for Video Object Segmentation

no code implementations ECCV 2020 Bo-Wen Chen, Huan Ling, Xiaohui Zeng, Gao Jun, Ziyue Xu, Sanja Fidler

Our approach tolerates a modest amount of noise in the box placements, thus typically only a few clicks are needed to annotate tracked boxes to a sufficient accuracy.

Object Segmentation +3

Beyond Fixed Grid: Learning Geometric Image Representation with a Deformable Grid

no code implementations ECCV 2020 Jun Gao, Zian Wang, Jinchen Xuan, Sanja Fidler

We also utilize DefGrid at the output layers for the task of object mask annotation, and show that reasoning about object boundaries on our predicted polygonal grid leads to more accurate results than existing pixel-wise and curve-based approaches.

Semantic Segmentation

Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation

no code implementations ECCV 2020 Jeevan Devaranjan, Amlan Kar, Sanja Fidler

In Meta-Sim2, we aim to learn the scene structure in addition to parameters, which is a challenging problem due to its discrete nature.

Synthetic Data Generation valid

Learning to Generate Diverse Dance Motions with Transformer

no code implementations 18 Aug 2020 Jiaman Li, Yihang Yin, Hang Chu, Yi Zhou, Tingwu Wang, Sanja Fidler, Hao Li

We also introduce new evaluation metrics for the quality of synthesized dance motions, and demonstrate that our system can outperform state-of-the-art methods.

Motion Synthesis

Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D

1 code implementation ECCV 2020 Jonah Philion, Sanja Fidler

By training on the entire camera rig, we provide evidence that our model is able to learn not only how to represent images but how to fuse predictions from all cameras into a single cohesive representation of the scene while being robust to calibration error.

Ranked #6 on Bird's-Eye View Semantic Segmentation (IoU ped - 224x480 - Vis filter. - 100x100 at 0.5 metric)

Autonomous Vehicles Bird's-Eye View Semantic Segmentation +1

Efficient and Information-Preserving Future Frame Prediction and Beyond

1 code implementation ICLR 2020 Wei Yu, Yichao Lu, Steve Easterbrook, Sanja Fidler

Applying resolution-preserving blocks is a common practice to maximize information preservation in video prediction, yet their high memory consumption greatly limits their application scenarios.

Computational Efficiency object-detection +4

The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines

2 code implementations 29 Apr 2020 Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labelled for a total of 39.6K action segments and 454.2K object bounding boxes.

Object

Learning to Evaluate Perception Models Using Planner-Centric Metrics

no code implementations CVPR 2020 Jonah Philion, Amlan Kar, Sanja Fidler

The downside of these metrics is that, at worst, they penalize all incorrect detections equally without conditioning on the task or scene, and at best, heuristics need to be chosen to ensure that different mistakes count differently.

3D Object Detection object-detection

Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data

no code implementations CVPR 2020 Xi Yan, David Acuna, Sanja Fidler

NDS consists of a dataserver which indexes several large popular image datasets, and aims to recommend data to a client, an end-user with a target application with its own small labeled dataset.

image-classification Image Classification +6

The Shmoop Corpus: A Dataset of Stories with Loosely Aligned Summaries

1 code implementation 30 Dec 2019 Atef Chaudhury, Makarand Tapaswi, Seung Wook Kim, Sanja Fidler

Understanding stories is a challenging reading comprehension problem for machines as it requires reading a large volume of text and following long-range dependencies.

Abstractive Text Summarization Form +2

CrevNet: Conditionally Reversible Video Prediction

no code implementations 25 Oct 2019 Wei Yu, Yichao Lu, Steve Easterbrook, Sanja Fidler

Applying resolution-preserving blocks is a common practice to maximize information preservation in video prediction, yet their high memory consumption greatly limits their application scenarios.

Computational Efficiency Prediction +1

Neural Turtle Graphics for Modeling City Road Layouts

no code implementations ICCV 2019 Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina, Xinkai Wei, Ming-Yu Liu, Antonio Torralba, Sanja Fidler

We propose Neural Turtle Graphics (NTG), a novel generative model for spatial graphs, and demonstrate its applications in modeling city road layouts.

A Theoretical Analysis of the Number of Shots in Few-Shot Learning

no code implementations ICLR 2020 Tianshi Cao, Marc Law, Sanja Fidler

We introduce a theoretical analysis of the impact of the shot number on Prototypical Networks, a state-of-the-art few-shot classification method.

Classification Few-Shot Learning +1

Video Face Clustering with Unknown Number of Clusters

1 code implementation ICCV 2019 Makarand Tapaswi, Marc T. Law, Sanja Fidler

Understanding videos such as TV series and movies requires analyzing who the characters are and what they are doing.

Clustering Face Clustering +1

Gated-SCNN: Gated Shape CNNs for Semantic Segmentation

4 code implementations ICCV 2019 Towaki Takikawa, David Acuna, Varun Jampani, Sanja Fidler

Here, we propose a new two-stream CNN architecture for semantic segmentation that explicitly wires shape information as a separate processing branch, i.e., a shape stream, that processes information in parallel to the classical stream.

Image Segmentation Semantic Segmentation

Neural Graph Evolution: Towards Efficient Automatic Robot Design

1 code implementation 12 Jun 2019 Tingwu Wang, Yuhao Zhou, Sanja Fidler, Jimmy Ba

To address the two challenges, we formulate automatic robot design as a graph search problem and perform evolution search in graph space.

EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis

1 code implementation 15 May 2019 Chaoqi Wang, Roger Grosse, Sanja Fidler, Guodong Zhang

Reducing the test time resource requirements of a neural network while preserving test accuracy is crucial for running inference on resource-constrained devices.

Network Pruning

Neural Graph Evolution: Automatic Robot Design

no code implementations ICLR 2019 Tingwu Wang, Yuhao Zhou, Sanja Fidler, Jimmy Ba

To address the two challenges, we formulate automatic robot design as a graph search problem and perform evolution search in graph space.

Meta-Sim: Learning to Generate Synthetic Datasets

no code implementations ICCV 2019 Amlan Kar, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, Sanja Fidler

Training models to high-end performance requires availability of large labeled datasets, which are expensive to get.

Devil is in the Edges: Learning Semantic Boundaries from Noisy Annotations

1 code implementation CVPR 2019 David Acuna, Amlan Kar, Sanja Fidler

We further reason about true object boundaries during training using a level set formulation, which allows the network to learn from misaligned labels in an end-to-end fashion.

Semantic Segmentation

Mimicking the In-Camera Color Pipeline for Camera-Aware Object Compositing

no code implementations 27 Mar 2019 Jun Gao, Xiao Li, Li-Wei Wang, Sanja Fidler, Stephen Lin

We present a method for compositing virtual objects into a photograph such that the object colors appear to have been processed by the photo's camera imaging pipeline.

Fast Interactive Object Annotation with Curve-GCN

2 code implementations CVPR 2019 Huan Ling, Jun Gao, Amlan Kar, Wenzheng Chen, Sanja Fidler

Our model runs at 29.3ms in automatic, and 2.6ms in interactive mode, making it 10x and 100x faster than Polygon-RNN++.

Object

ACTRCE: Augmenting Experience via Teacher's Advice For Multi-Goal Reinforcement Learning

no code implementations 12 Feb 2019 Harris Chan, Yuhuai Wu, Jamie Kiros, Sanja Fidler, Jimmy Ba

We first analyze the differences among goal representations, and show that ACTRCE can efficiently solve difficult reinforcement learning problems in challenging 3D navigation tasks, whereas HER with non-language goal representation fails to learn.

Multi-Goal Reinforcement Learning reinforcement-learning +2

SurfConv: Bridging 3D and 2D Convolution for RGBD Images

1 code implementation CVPR 2018 Hang Chu, Wei-Chiu Ma, Kaustav Kundu, Raquel Urtasun, Sanja Fidler

On the other hand, 3D convolution wastes a large amount of memory on mostly unoccupied 3D space, which consists of only the surface visible to the sensor.

3D Semantic Segmentation

A Face-to-Face Neural Conversation Model

no code implementations CVPR 2018 Hang Chu, Daiqing Li, Sanja Fidler

The decoder consists of two layers, where the lower layer aims at generating the verbal response and coarse facial expressions, while the second layer fills in the subtle gestures, making the generated output more smooth and natural.

Decoder model

Learning to Caption Images through a Lifetime by Asking Questions

1 code implementation 1 Dec 2018 Kevin Shen, Amlan Kar, Sanja Fidler

In order to bring artificial agents into our lives, we will need to go beyond supervised learning on closed datasets to having the ability to continuously expand knowledge.

Active Learning Image Captioning

A Neural Compositional Paradigm for Image Captioning

1 code implementation NeurIPS 2018 Bo Dai, Sanja Fidler, Dahua Lin

Mainstream captioning models often follow a sequential structure to generate captions, leading to issues such as introduction of irrelevant semantics, lack of diversity in the generated captions, and inadequate generalization performance.

Diversity Image Captioning

Pose Estimation for Objects with Rotational Symmetry

no code implementations 13 Oct 2018 Enric Corona, Kaustav Kundu, Sanja Fidler

In particular, our aim is to infer poses for objects not seen at training time, but for which their 3D CAD models are available at test time.

Pose Estimation

VirtualHome: Simulating Household Activities via Programs

4 code implementations CVPR 2018 Xavier Puig, Kevin Ra, Marko Boben, Jiaman Li, Tingwu Wang, Sanja Fidler, Antonio Torralba

We then implement the most common atomic (inter)actions in the Unity3D game engine, and use our programs to "drive" an artificial agent to execute tasks in a simulated household environment.

Video Understanding

Color Sails: Discrete-Continuous Palettes for Deep Color Exploration

no code implementations 7 Jun 2018 Maria Shugrina, Amlan Kar, Karan Singh, Sanja Fidler

Then, the user can adjust color sail parameters to change the base colors, their blending behavior and the number of colors, exploring a wide range of options for the original design.

Visual Reasoning by Progressive Module Networks

1 code implementation ICLR 2019 Seung Wook Kim, Makarand Tapaswi, Sanja Fidler

Thus, a module for a new task learns to query existing modules and composes their outputs in order to produce its own output.

Visual Reasoning

Now You Shake Me: Towards Automatic 4D Cinema

no code implementations CVPR 2018 Yuhao Zhou, Makarand Tapaswi, Sanja Fidler

We are interested in enabling automatic 4D cinema by parsing physical and special effects from untrimmed movies.

MovieGraphs: Towards Understanding Human-Centric Situations from Videos

no code implementations CVPR 2018 Paul Vicol, Makarand Tapaswi, Lluis Castrejon, Sanja Fidler

Towards this goal, we introduce a novel dataset called MovieGraphs which provides detailed, graph-based annotations of social situations depicted in movie clips.

Common Sense Reasoning

Be Your Own Prada: Fashion Synthesis with Structural Coherence

no code implementations ICCV 2017 Shizhan Zhu, Sanja Fidler, Raquel Urtasun, Dahua Lin, Chen Change Loy

In the second stage, a generative model with a newly proposed compositional mapping layer is used to render the final image with precise regions and textures conditioned on this map.

Fashion Synthesis Semantic Segmentation +1

SGN: Sequential Grouping Networks for Instance Segmentation

no code implementations ICCV 2017 Shu Liu, Jiaya Jia, Sanja Fidler, Raquel Urtasun

By exploiting two-directional information, the second network groups horizontal and vertical lines into connected components.

Instance Segmentation Object +1

3D Graph Neural Networks for RGBD Semantic Segmentation

2 code implementations ICCV 2017 Xiaojuan Qi, Renjie Liao, Jiaya Jia, Sanja Fidler, Raquel Urtasun

Each node in the graph corresponds to a set of points and is associated with a hidden representation vector initialized with an appearance feature extracted by a unary CNN from 2D images.

Ranked #38 on Semantic Segmentation on SUN-RGBD (using extra training data)

Graph Neural Network RGBD Semantic Segmentation +1

Sports Field Localization via Deep Structured Models

no code implementations CVPR 2017 Namdar Homayounfar, Sanja Fidler, Raquel Urtasun

In this work, we propose a novel way of efficiently localizing a sports field from a single broadcast image of the game.

Semantic Segmentation

Scene Parsing Through ADE20K Dataset

no code implementations CVPR 2017 Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, Antonio Torralba

A novel network design called Cascade Segmentation Module is proposed to parse a scene into stuff, objects, and object parts in a cascade and improve over the baselines.

Object Scene Parsing +1

Annotating Object Instances with a Polygon-RNN

2 code implementations CVPR 2017 Lluis Castrejon, Kaustav Kundu, Raquel Urtasun, Sanja Fidler

We show that our approach speeds up the annotation process by a factor of 4.7 across all classes in Cityscapes, while achieving 78.4% agreement in IoU with original ground-truth, matching the typical agreement between human annotators.

Object Segmentation +1

Open Vocabulary Scene Parsing

no code implementations ICCV 2017 Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, Antonio Torralba

Recognizing arbitrary objects in the wild has been a challenging problem due to the limitations of existing classification models and datasets.

General Classification Scene Parsing

Towards Diverse and Natural Image Descriptions via a Conditional GAN

1 code implementation ICCV 2017 Bo Dai, Sanja Fidler, Raquel Urtasun, Dahua Lin

Despite the substantial progress in recent years, image captioning techniques are still far from perfect. Sentences produced by existing methods, e.g., those based on RNNs, are often overly rigid and lacking in variability.

Image Captioning Reinforcement Learning

Proximal Deep Structured Models

no code implementations NeurIPS 2016 Shenlong Wang, Sanja Fidler, Raquel Urtasun

Many problems in real-world applications involve predicting continuous-valued random variables that are statistically related.

Image Denoising Optical Flow Estimation

TorontoCity: Seeing the World with a Million Eyes

no code implementations ICCV 2017 Shenlong Wang, Min Bai, Gellert Mattyus, Hang Chu, Wenjie Luo, Bin Yang, Justin Liang, Joel Cheverie, Sanja Fidler, Raquel Urtasun

In this paper we introduce the TorontoCity benchmark, which covers the full greater Toronto area (GTA) with 712.5 $km^2$ of land, 8439 $km$ of road and around 400,000 buildings.

Instance Segmentation Semantic Segmentation

Efficient Summarization with Read-Again and Copy Mechanism

no code implementations 10 Nov 2016 Wenyuan Zeng, Wenjie Luo, Sanja Fidler, Raquel Urtasun

Towards this goal, we introduce a simple mechanism that first reads the input sequence before committing to a representation of each word.

Decoder

Semantic Understanding of Scenes through the ADE20K Dataset

22 code implementations 18 Aug 2016 Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba

Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision.

Scene Parsing Segmentation +1

Find your Way by Observing the Sun and Other Semantic Cues

no code implementations 23 Jun 2016 Wei-Chiu Ma, Shenlong Wang, Marcus A. Brubaker, Sanja Fidler, Raquel Urtasun

In this paper we present a robust, efficient and affordable approach to self-localization which requires neither GPS nor knowledge about the appearance of the world.

HD Maps: Fine-Grained Road Segmentation by Parsing Ground and Aerial Images

no code implementations CVPR 2016 Gellert Mattyus, Shenlong Wang, Sanja Fidler, Raquel Urtasun

In this paper we present an approach to enhance existing maps with fine-grained segmentation categories such as parking spots and sidewalks, as well as the number and location of road lanes.

Road Segmentation

Soccer Field Localization from a Single Image

no code implementations 10 Apr 2016 Namdar Homayounfar, Sanja Fidler, Raquel Urtasun

In this work, we propose a novel way of efficiently localizing a soccer field from a single broadcast image of the game.

Instance-Level Segmentation for Autonomous Driving with Deep Densely Connected MRFs

no code implementations CVPR 2016 Ziyu Zhang, Sanja Fidler, Raquel Urtasun

Our aim is to provide a pixel-wise instance-level labeling of a monocular image in the context of autonomous driving.

Autonomous Driving

Enhancing Road Maps by Parsing Aerial Images Around the World

no code implementations ICCV 2015 Gellert Mattyus, Shenlong Wang, Sanja Fidler, Raquel Urtasun

In recent years, contextual models that exploit maps have been shown to be very effective for many recognition and localization tasks.

Semantic Segmentation

Lost Shopping! Monocular Localization in Large Indoor Spaces

no code implementations ICCV 2015 Shenlong Wang, Sanja Fidler, Raquel Urtasun

In this paper we propose a novel approach to localization in very large indoor spaces (i.e., 200+ store shopping malls) that takes a single image and a floor plan of the environment as input.

Text Detection Translation

Learning to Combine Mid-Level Cues for Object Proposal Generation

no code implementations ICCV 2015 Tom Lee, Sanja Fidler, Sven Dickinson

In this paper, we introduce Parametric Min-Loss (PML), a novel structured learning framework for parametric energy functions.

Object Proposal Generation Object Recognition

Order-Embeddings of Images and Language

2 code implementations 19 Nov 2015 Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun

Hypernymy, textual entailment, and image captioning can be seen as special cases of a single visual-semantic hierarchy over words, sentences, and images.

Cross-Modal Retrieval Image Captioning +2
