Search Results for author: Sanja Fidler

Found 198 papers, 60 papers with code

L4GM: Large 4D Gaussian Reconstruction Model

no code implementations 14 Jun 2024 Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling

We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input -- in a single feed-forward pass that takes only a second.

NeRF-XL: Scaling NeRFs with Multiple GPUs

no code implementations 24 Apr 2024 RuiLong Li, Sanja Fidler, Angjoo Kanazawa, Francis Williams

We present NeRF-XL, a principled method for distributing Neural Radiance Fields (NeRFs) across multiple GPUs, thus enabling the training and rendering of NeRFs with an arbitrarily large capacity.

Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

no code implementations 22 Apr 2024 Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis

Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond.

RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting

no code implementations 16 Apr 2024 Ashkan Mirzaei, Riccardo de Lutio, Seung Wook Kim, David Acuna, Jonathan Kelly, Sanja Fidler, Igor Gilitschenski, Zan Gojcic

In this work, we propose an approach for 3D scene inpainting -- the task of coherently replacing parts of the reconstructed scene with desired content.

3D Inpainting Image Inpainting

Can Feedback Enhance Semantic Grounding in Large Vision-Language Models?

no code implementations 9 Apr 2024 Yuan-Hong Liao, Rafid Mahmood, Sanja Fidler, David Acuna

We find that if prompted appropriately, VLMs can utilize feedback both in a single step and iteratively, showcasing the potential of feedback as an alternative technique to improve grounding in internet-scale VLMs.

Augmented Reality based Simulated Data (ARSim) with multi-view consistency for AV perception networks

no code implementations 22 Mar 2024 Aqeel Anwar, Tae Eun Choe, Zian Wang, Sanja Fidler, Minwoo Park

The resulting augmented multi-view consistent dataset is used to train a multi-camera perception network for autonomous vehicles.

Autonomous Driving Domain Adaptation

LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis

no code implementations 22 Mar 2024 Kevin Xie, Jonathan Lorraine, Tianshi Cao, Jun Gao, James Lucas, Antonio Torralba, Sanja Fidler, Xiaohui Zeng

Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt.

3D Generation Text to 3D

EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models

no code implementations 22 Jan 2024 Koichi Namekata, Amirmojtaba Sabour, Sanja Fidler, Seung Wook Kim

Diffusion models have recently received increasing research attention for their remarkable transfer abilities in semantic segmentation tasks.

Segmentation Semantic Segmentation

Compact Neural Graphics Primitives with Learned Hash Probing

no code implementations 28 Dec 2023 Towaki Takikawa, Thomas Müller, Merlin Nimier-David, Alex Evans, Sanja Fidler, Alec Jacobson, Alexander Keller

Neural graphics primitives are faster and achieve higher quality when their neural networks are augmented by spatial data structures that hold trainable features arranged in a grid.

Quantization

Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models

no code implementations CVPR 2024 Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, Karsten Kreis

We also propose a motion amplification mechanism as well as a new autoregressive synthesis scheme to generate and combine multiple 4D sequences for longer generation.

Synthetic Data Generation Video Generation

Trajeglish: Traffic Modeling as Next-Token Prediction

no code implementations 7 Dec 2023 Jonah Philion, Xue Bin Peng, Sanja Fidler

A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs.

Decoder

XCube ($\mathcal{X}^3$): Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies

no code implementations 6 Dec 2023 Xuanchi Ren, Jiahui Huang, Xiaohui Zeng, Ken Museth, Sanja Fidler, Francis Williams

In addition to unconditional generation, we show that our model can be used to solve a variety of tasks such as user-guided editing, scene completion from a single scan, and text-to-3D.

3D Shape Generation Scene Generation +1

WildFusion: Learning 3D-Aware Latent Diffusion Models in View Space

no code implementations 22 Nov 2023 Katja Schwarz, Seung Wook Kim, Jun Gao, Sanja Fidler, Andreas Geiger, Karsten Kreis

Then, we train a diffusion model in the 3D-aware latent space, thereby enabling synthesis of high-quality 3D-consistent image samples, outperforming recent state-of-the-art GAN-based methods.

3D-Aware Image Synthesis Depth Estimation +2

Adaptive Shells for Efficient Neural Radiance Field Rendering

no code implementations 16 Nov 2023 Zian Wang, Tianchang Shen, Merlin Nimier-David, Nicholas Sharp, Jun Gao, Alexander Keller, Sanja Fidler, Thomas Müller, Zan Gojcic

We then extract an explicit mesh of a narrow band around the surface, with width determined by the kernel size, and fine-tune the radiance field within this band.

Novel View Synthesis Stochastic Optimization

3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features

no code implementations CVPR 2024 Chenfeng Xu, Huan Ling, Sanja Fidler, Or Litany

However, these features are initially trained on paired text and image data, which are not optimized for 3D tasks, and often exhibit a domain gap when applied to the target data.

3D Object Detection Novel View Synthesis +2

ViR: Towards Efficient Vision Retention Backbones

1 code implementation 30 Oct 2023 Ali Hatamizadeh, Michael Ranzinger, Shiyi Lan, Jose M. Alvarez, Sanja Fidler, Jan Kautz

Inspired by this trend, we propose a new class of computer vision models, dubbed Vision Retention Networks (ViR), with dual parallel and recurrent formulations, which strike an optimal balance between fast inference and parallel training with competitive performance.

TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models

no code implementations ICCV 2023 Tianshi Cao, Karsten Kreis, Sanja Fidler, Nicholas Sharp, Kangxue Yin

We present TexFusion (Texture Diffusion), a new method to synthesize textures for given 3D geometries, using large-scale text-guided image diffusion models.

Denoising Game Design +1

Towards Viewpoint Robustness in Bird's Eye View Segmentation

no code implementations ICCV 2023 Tzofi Klinghoffer, Jonah Philion, Wenzheng Chen, Or Litany, Zan Gojcic, Jungseock Joo, Ramesh Raskar, Sanja Fidler, Jose M. Alvarez

We introduce a technique for novel view synthesis and use it to transform collected data to the viewpoint of target rigs, allowing us to train BEV segmentation models for diverse target rigs without any additional data collection or labeling cost.

Autonomous Vehicles Novel View Synthesis

Flexible Isosurface Extraction for Gradient-Based Mesh Optimization

no code implementations 10 Aug 2023 Tianchang Shen, Jacob Munkberg, Jon Hasselgren, Kangxue Yin, Zian Wang, Wenzheng Chen, Zan Gojcic, Sanja Fidler, Nicholas Sharp, Jun Gao

This work considers gradient-based mesh optimization, where we iteratively optimize for a 3D surface mesh by representing it as the isosurface of a scalar field, an increasingly common paradigm in applications including photogrammetry, generative modeling, and inverse physics.

DreamTeacher: Pretraining Image Backbones with Deep Generative Models

no code implementations ICCV 2023 Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler

In this work, we introduce a self-supervised feature representation learning framework DreamTeacher that utilizes generative networks for pre-training downstream image backbones.

Knowledge Distillation Representation Learning

ATT3D: Amortized Text-to-3D Object Synthesis

no code implementations ICCV 2023 Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, James Lucas

Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields.

Image to 3D Object +1

Neural Kernel Surface Reconstruction

no code implementations CVPR 2023 Jiahui Huang, Zan Gojcic, Matan Atzmon, Or Litany, Sanja Fidler, Francis Williams

We present a novel method for reconstructing a 3D implicit surface from a large-scale, sparse, and noisy point cloud.

Surface Reconstruction

Neural LiDAR Fields for Novel View Synthesis

no code implementations ICCV 2023 Shengyu Huang, Zan Gojcic, Zian Wang, Francis Williams, Yoni Kasten, Sanja Fidler, Konrad Schindler, Or Litany

We present Neural Fields for LiDAR (NFL), a method to optimise a neural field scene representation from LiDAR measurements, with the goal of synthesizing realistic LiDAR scans from novel viewpoints.

Novel LiDAR View Synthesis Semantic Segmentation

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

no code implementations CVPR 2023 Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler

We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene.

Scene Generation

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

3 code implementations CVPR 2023 Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis

We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos.

Ranked #5 on Text-to-Video Generation on MSR-VTT (CLIP-FID metric)

Image Generation Text-to-Video Generation +3

Neural Fields meet Explicit Geometric Representation for Inverse Rendering of Urban Scenes

no code implementations 6 Apr 2023 Zian Wang, Tianchang Shen, Jun Gao, Shengyu Huang, Jacob Munkberg, Jon Hasselgren, Zan Gojcic, Wenzheng Chen, Sanja Fidler

Reconstruction and intrinsic decomposition of scenes from captured imagery would enable many applications such as relighting and virtual object insertion.

3D Reconstruction Inverse Rendering

Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion

no code implementations CVPR 2023 Davis Rempe, Zhengyi Luo, Xue Bin Peng, Ye Yuan, Kris Kitani, Karsten Kreis, Sanja Fidler, Or Litany

We introduce a method for generating realistic pedestrian trajectories and full-body animations that can be controlled to meet user-defined goals.

Collision Avoidance

Bridging the Sim2Real gap with CARE: Supervised Detection Adaptation with Conditional Alignment and Reweighting

no code implementations 9 Feb 2023 Viraj Prabhu, David Acuna, Andrew Liao, Rafid Mahmood, Marc T. Law, Judy Hoffman, Sanja Fidler, James Lucas

Sim2Real domain adaptation (DA) research focuses on the constrained setting of adapting from a labeled synthetic source domain to an unlabeled or sparsely labeled real target domain.

Autonomous Driving Domain Adaptation +3

Synthesizing Physical Character-Scene Interactions

no code implementations 2 Feb 2023 Mohamed Hassan, Yunrong Guo, Tingwu Wang, Michael Black, Sanja Fidler, Xue Bin Peng

These scene interactions are learned using an adversarial discriminator that evaluates the realism of a motion within the context of a scene.

Imitation Learning

PADL: Language-Directed Physics-Based Character Control

1 code implementation 31 Jan 2023 Jordan Juravsky, Yunrong Guo, Sanja Fidler, Xue Bin Peng

In this work, we present PADL, which leverages recent innovations in NLP in order to take steps towards developing language-directed controllers for physics-based character animation.

Image Generation Imitation Learning +3

Learning Human Dynamics in Autonomous Driving Scenarios

no code implementations ICCV 2023 Jingbo Wang, Ye Yuan, Zhengyi Luo, Kevin Xie, Dahua Lin, Umar Iqbal, Sanja Fidler, Sameh Khamis

In this work, we propose a holistic framework for learning physically plausible human dynamics from real driving scenarios, narrowing the gap between real and simulated human behavior in safety-critical applications.

Autonomous Driving Human Dynamics

Magic3D: High-Resolution Text-to-3D Content Creation

1 code implementation CVPR 2023 Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin

DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results.

Text to 3D Vocal Bursts Intensity Prediction

LION: Latent Point Diffusion Models for 3D Shape Generation

2 code implementations 12 Oct 2022 Xiaohui Zeng, Arash Vahdat, Francis Williams, Zan Gojcic, Or Litany, Sanja Fidler, Karsten Kreis

To advance 3D DDMs and make them useful for digital artists, we require (i) high generation quality, (ii) flexibility for manipulation and applications such as conditional synthesis and shape interpolation, and (iii) the ability to output smooth surfaces or meshes.

3D Generation 3D Shape Generation +3

XDGAN: Multi-Modal 3D Shape Generation in 2D Space

no code implementations 6 Oct 2022 Hassan Abu Alhaija, Alara Dirik, André Knörig, Sanja Fidler, Maria Shugrina

Specifically, we propose a novel method to convert 3D shapes into compact 1-channel geometry images and leverage StyleGAN3 and image-to-image translation networks to generate 3D objects in 2D space.

3D Shape Generation Image-to-Image Translation

Optimizing Data Collection for Machine Learning

no code implementations 3 Oct 2022 Rafid Mahmood, James Lucas, Jose M. Alvarez, Sanja Fidler, Marc T. Law

Modern deep learning systems require huge data sets to achieve impressive performance, but there is little guidance on how much or what kind of data to collect.

EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

3 code implementations 26 Sep 2022 Ahmad Darkhalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, Dima Damen

VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets.

Object Segmentation +4

GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images

3 code implementations 22 Sep 2022 Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, Sanja Fidler

As several industries are moving towards modeling massive 3D virtual worlds, the need for content creation tools that can scale in terms of the quantity, quality, and diversity of 3D content is becoming evident.

Neural Light Field Estimation for Street Scenes with Differentiable Virtual Object Insertion

no code implementations 19 Aug 2022 Zian Wang, Wenzheng Chen, David Acuna, Jan Kautz, Sanja Fidler

In this work, we propose a neural approach that estimates the 5D HDR light field from a single image, and a differentiable object insertion formulation that enables end-to-end training with image-based losses that encourage realism.

Autonomous Driving Lighting Estimation +1

MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation

3 code implementations 18 Aug 2022 Gopal Sharma, Kangxue Yin, Subhransu Maji, Evangelos Kalogerakis, Or Litany, Sanja Fidler

As a result, the learned 2D representations are view-invariant and geometrically consistent, leading to better generalization when trained on a limited number of labeled shapes compared to alternatives that utilize self-supervision in 2D or 3D alone.

Contrastive Learning Segmentation

Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention

no code implementations 5 Jul 2022 Gary Leung, Jun Gao, Xiaohui Zeng, Sanja Fidler

HILA extends hierarchical vision transformer architectures by adding local connections between features of higher and lower levels to the backbone encoder.

Object Semantic Segmentation

Scalable Neural Data Server: A Data Recommender for Transfer Learning

no code implementations NeurIPS 2021 Tianshi Cao, Sasha Doubov, David Acuna, Sanja Fidler

NDS uses a mixture of experts trained on data sources to estimate similarity between each source and the downstream task.

Transfer Learning

Variable Bitrate Neural Fields

1 code implementation 15 Jun 2022 Towaki Takikawa, Alex Evans, Jonathan Tremblay, Thomas Müller, Morgan McGuire, Alec Jacobson, Sanja Fidler

Neural approximations of scalar and vector fields, such as signed distance functions and radiance fields, have emerged as accurate, high-quality representations.

Decoder

ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters

no code implementations 4 May 2022 Xue Bin Peng, Yunrong Guo, Lina Halper, Sergey Levine, Sanja Fidler

By leveraging a massively parallel GPU-based simulator, we are able to train skill embeddings using over a decade of simulated experiences, enabling our model to learn a rich and versatile repertoire of skills.

Imitation Learning Unsupervised Reinforcement Learning

M$^2$BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation

no code implementations 11 Apr 2022 Enze Xie, Zhiding Yu, Daquan Zhou, Jonah Philion, Anima Anandkumar, Sanja Fidler, Ping Luo, Jose M. Alvarez

In this paper, we propose M$^2$BEV, a unified framework that jointly performs 3D object detection and map segmentation in the Bird's-Eye View (BEV) space with multi-camera image inputs.

3D Object Detection object-detection +1

AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis

no code implementations CVPR 2022 Zhiqin Chen, Kangxue Yin, Sanja Fidler

In this paper, we address the problem of texture representation for 3D shapes for the challenging and underexplored tasks of texture transfer and synthesis.

3D Reconstruction Single-View 3D Reconstruction +1

Learning Smooth Neural Functions via Lipschitz Regularization

no code implementations 16 Feb 2022 Hsueh-Ti Derek Liu, Francis Williams, Alec Jacobson, Sanja Fidler, Or Litany

The latent descriptor of a neural field acts as a deformation handle for the 3D shape it represents.

Domain Adversarial Training: A Game Perspective

no code implementations ICLR 2022 David Acuna, Marc T Law, Guojun Zhang, Sanja Fidler

Defining optimal solutions in domain-adversarial training as a local Nash equilibrium, we show that gradient descent in domain-adversarial training can violate the asymptotic convergence guarantees of the optimizer, oftentimes hindering the transfer performance.

Domain Adaptation

Causal Scene BERT: Improving object detection by searching for challenging groups of data

no code implementations 8 Feb 2022 Cinjon Resnick, Or Litany, Amlan Kar, Karsten Kreis, James Lucas, Kyunghyun Cho, Sanja Fidler

Our main contribution is a pseudo-automatic method to discover such groups in foresight by performing causal interventions on simulated scenes.

Autonomous Vehicles object-detection +1

Federated Learning with Heterogeneous Architectures using Graph HyperNetworks

no code implementations 20 Jan 2022 Or Litany, Haggai Maron, David Acuna, Jan Kautz, Gal Chechik, Sanja Fidler

Standard Federated Learning (FL) techniques are limited to clients with identical network architectures.

Federated Learning

Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior

no code implementations CVPR 2022 Davis Rempe, Jonah Philion, Leonidas J. Guibas, Sanja Fidler, Or Litany

Scenario generation is formulated as an optimization in the latent space of this traffic model, perturbing an initial real-world scene to produce trajectories that collide with a given planner.

Autonomous Vehicles

Frame Averaging for Equivariant Shape Space Learning

no code implementations CVPR 2022 Matan Atzmon, Koki Nagano, Sanja Fidler, Sameh Khamis, Yaron Lipman

A natural way to incorporate symmetries in shape space learning is to ask that the mapping to the shape space (encoder) and mapping from the shape space (decoder) are equivariant to the relevant symmetries.

Don’t Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence

no code implementations NeurIPS 2021 Tianshi Cao, Alex Bie, Arash Vahdat, Sanja Fidler, Karsten Kreis

Generative models trained with privacy constraints on private data can sidestep this challenge, providing indirect access to private data instead.

Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation

no code implementations NeurIPS 2021 David Acuna, Jonah Philion, Sanja Fidler

Alternative solutions seek to exploit driving simulators that can generate large amounts of labeled data with a plethora of content variations.

Autonomous Driving Domain Adaptation

Deep Marching Tetrahedra: a Hybrid Representation for High-Resolution 3D Shape Synthesis

no code implementations NeurIPS 2021 Tianchang Shen, Jun Gao, Kangxue Yin, Ming-Yu Liu, Sanja Fidler

The core of DMTet includes a deformable tetrahedral grid that encodes a discretized signed distance function and a differentiable marching tetrahedra layer that converts the implicit signed distance representation to the explicit surface mesh representation.

EditGAN: High-Precision Semantic Image Editing

1 code implementation NeurIPS 2021 Huan Ling, Karsten Kreis, Daiqing Li, Seung Wook Kim, Antonio Torralba, Sanja Fidler

EditGAN builds on a GAN framework that jointly models images and their semantic segmentations, requiring only a handful of labeled examples, making it a scalable tool for editing.

Segmentation Semantic Segmentation +1

Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence

1 code implementation 1 Nov 2021 Tianshi Cao, Alex Bie, Arash Vahdat, Sanja Fidler, Karsten Kreis

Generative models trained with privacy constraints on private data can sidestep this challenge, providing indirect access to private data instead.

ATISS: Autoregressive Transformers for Indoor Scene Synthesis

1 code implementation NeurIPS 2021 Despoina Paschalidou, Amlan Kar, Maria Shugrina, Karsten Kreis, Andreas Geiger, Sanja Fidler

The ability to synthesize realistic and diverse indoor furniture layouts, automatically or based on partial input, unlocks many applications, from better interactive 3D tools to data synthesis for training and simulation.

2D Semantic Segmentation task 1 (8 classes) 3D Semantic Scene Completion +1

Causal Scene BERT: Improving object detection by searching for challenging groups

no code implementations 29 Sep 2021 Cinjon Resnick, Or Litany, Amlan Kar, Karsten Kreis, James Lucas, Kyunghyun Cho, Sanja Fidler

We verify that the prioritized groups found via intervention are challenging for the object detector and show that retraining with data collected from these groups helps inordinately compared to adding more IID data.

Autonomous Vehicles object-detection +1

Low-Budget Active Learning via Wasserstein Distance: An Integer Programming Approach

no code implementations ICLR 2022 Rafid Mahmood, Sanja Fidler, Marc T Law

Active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label.

Active Learning

Physics-based Human Motion Estimation and Synthesis from Videos

no code implementations ICCV 2021 Kevin Xie, Tingwu Wang, Umar Iqbal, Yunrong Guo, Sanja Fidler, Florian Shkurti

By enabling learning of motion synthesis from video, our method paves the way for large-scale, realistic and diverse motion synthesis.

Motion Estimation Motion Synthesis +1

3DStyleNet: Creating 3D Shapes with Geometric and Texture Style Variations

no code implementations ICCV 2021 Kangxue Yin, Jun Gao, Maria Shugrina, Sameh Khamis, Sanja Fidler

Given a small set of high-quality textured objects, our method can create many novel stylized shapes, resulting in effortless 3D content creation and style-aware data augmentation.

3D Reconstruction Data Augmentation +1

NP-DRAW: A Non-Parametric Structured Latent Variable Model for Image Generation

1 code implementation 25 Jun 2021 Xiaohui Zeng, Raquel Urtasun, Richard Zemel, Sanja Fidler, Renjie Liao

1) We propose a non-parametric prior distribution over the appearance of image parts so that the latent variable "what-to-draw" per step becomes a categorical random variable.

Image Generation

f-Domain-Adversarial Learning: Theory and Algorithms

1 code implementation 21 Jun 2021 David Acuna, Guojun Zhang, Marc T. Law, Sanja Fidler

Unsupervised domain adaptation is used in many machine learning applications where, during training, a model has access to unlabeled data in the target domain, and a related labeled dataset.

Learning Theory Unsupervised Domain Adaptation

Low Budget Active Learning via Wasserstein Distance: An Integer Programming Approach

no code implementations 5 Jun 2021 Rafid Mahmood, Sanja Fidler, Marc T. Law

Active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label.

Active Learning

DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

2 code implementations CVPR 2021 Yuxuan Zhang, Huan Ling, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, Sanja Fidler

To showcase the power of our approach, we generated datasets for 7 image segmentation tasks which include pixel-level labels for 34 human face parts, and 32 car parts.

Decoder Image Segmentation +1

Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection

1 code implementation 12 Apr 2021 Nadine Chang, Zhiding Yu, Yu-Xiong Wang, Anima Anandkumar, Sanja Fidler, Jose M. Alvarez

As a result, image resampling alone is not enough to yield a sufficiently balanced distribution at the object level.

Object

Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks

1 code implementation CVPR 2021 Despoina Paschalidou, Angelos Katharopoulos, Andreas Geiger, Sanja Fidler

The INN allows us to compute the inverse mapping of the homeomorphism, which in turn, enables the efficient computation of both the implicit surface function of a primitive and its mesh, without any additional post-processing.

Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Shapes

2 code implementations CVPR 2021 Towaki Takikawa, Joey Litalien, Kangxue Yin, Karsten Kreis, Charles Loop, Derek Nowrouzezahrai, Alec Jacobson, Morgan McGuire, Sanja Fidler

We introduce an efficient neural representation that, for the first time, enables real-time rendering of high-fidelity neural SDFs, while achieving state-of-the-art geometry reconstruction quality.

f-Domain-Adversarial Learning: Theory and Algorithms for Unsupervised Domain Adaptation with Neural Networks

no code implementations 1 Jan 2021 David Acuna, Guojun Zhang, Marc T Law, Sanja Fidler

We provide empirical results for several f-divergences and show that some, not considered previously in domain-adversarial learning, achieve state-of-the-art results in practice.

Generalization Bounds Learning Theory +1

Differentially Private Generative Models Through Optimal Transport

no code implementations 1 Jan 2021 Tianshi Cao, Alex Bie, Karsten Kreis, Sanja Fidler

Generative models trained with privacy constraints on private data can sidestep this challenge and provide indirect access to the private data instead.

Personalized Federated Learning with First Order Model Optimization

3 code implementations ICLR 2021 Michael Zhang, Karan Sapra, Sanja Fidler, Serena Yeung, Jose M. Alvarez

While federated learning traditionally aims to train a single global model across decentralized local datasets, one model may not always be ideal for all participating clients.

Model Optimization Personalized Federated Learning

Variational Amodal Object Completion

no code implementations NeurIPS 2020 Huan Ling, David Acuna, Karsten Kreis, Seung Wook Kim, Sanja Fidler

In images of complex scenes, objects often occlude each other, which makes perception tasks such as object detection and tracking, or robotic control tasks such as planning, challenging.

Object object-detection +1

UniCon: Universal Neural Controller For Physics-based Character Motion

no code implementations 30 Nov 2020 Tingwu Wang, Yunrong Guo, Maria Shugrina, Sanja Fidler

The field of physics-based animation is gaining importance due to the increasing demand for realism in video games and films, and has recently seen wide adoption of data-driven techniques, such as deep reinforcement learning (RL), which learn control from (human) demonstrations.

Reinforcement Learning (RL)

Emergent Road Rules In Multi-Agent Driving Environments

1 code implementation ICLR 2021 Avik Pal, Jonah Philion, Yuan-Hong Liao, Sanja Fidler

For autonomous vehicles to safely share the road with human drivers, autonomous vehicles must abide by specific "road rules" that human drivers have agreed to follow.

Autonomous Vehicles

Learning Deformable Tetrahedral Meshes for 3D Reconstruction

1 code implementation NeurIPS 2020 Jun Gao, Wenzheng Chen, Tommy Xiang, Clement Fuji Tsang, Alec Jacobson, Morgan McGuire, Sanja Fidler

We introduce Deformable Tetrahedral Meshes (DefTet) as a particular parameterization that utilizes volumetric tetrahedral meshes for the reconstruction problem.

3D Reconstruction

Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering

no code implementations ICLR 2021 Yuxuan Zhang, Wenzheng Chen, Huan Ling, Jun Gao, Yinan Zhang, Antonio Torralba, Sanja Fidler

Key to our approach is to exploit GANs as a multi-view data generator to train an inverse graphics network using an off-the-shelf differentiable renderer, and the trained inverse graphics network as a teacher to disentangle the GAN's latent code into interpretable 3D properties.

Neural Rendering

Fed-Sim: Federated Simulation for Medical Imaging

no code implementations 1 Sep 2020 Daiqing Li, Amlan Kar, Nishant Ravikumar, Alejandro F. Frangi, Sanja Fidler

Since the model of geometry and material is disentangled from the imaging sensor, it can effectively be trained across multiple medical centers.

Federated Learning

Expressive Telepresence via Modular Codec Avatars

no code implementations ECCV 2020 Hang Chu, Shugao Ma, Fernando de la Torre, Sanja Fidler, Yaser Sheikh

It is important to note that traditional person-specific CAs are learned from few training samples, and typically lack robustness and have limited expressiveness when transferring facial expressions.

Interactive Annotation of 3D Object Geometry using 2D Scribbles

no code implementations ECCV 2020 Tianchang Shen, Jun Gao, Amlan Kar, Sanja Fidler

We implement our framework as a web service and conduct a user study, where we show that user-annotated data produced with our method effectively facilitates real-world learning tasks.

ScribbleBox: Interactive Annotation Framework for Video Object Segmentation

no code implementations ECCV 2020 Bo-Wen Chen, Huan Ling, Xiaohui Zeng, Gao Jun, Ziyue Xu, Sanja Fidler

Our approach tolerates a modest amount of noise in the box placements, thus typically only a few clicks are needed to annotate tracked boxes to a sufficient accuracy.

Object Segmentation +3

Beyond Fixed Grid: Learning Geometric Image Representation with a Deformable Grid

no code implementations ECCV 2020 Jun Gao, Zian Wang, Jinchen Xuan, Sanja Fidler

We also utilize DefGrid at the output layers for the task of object mask annotation, and show that reasoning about object boundaries on our predicted polygonal grid leads to more accurate results than existing pixel-wise and curve-based approaches.

Semantic Segmentation

Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation

no code implementations ECCV 2020 Jeevan Devaranjan, Amlan Kar, Sanja Fidler

In Meta-Sim2, we aim to learn the scene structure in addition to parameters, which is a challenging problem due to its discrete nature.

Synthetic Data Generation valid

Learning to Generate Diverse Dance Motions with Transformer

no code implementations 18 Aug 2020 Jiaman Li, Yihang Yin, Hang Chu, Yi Zhou, Tingwu Wang, Sanja Fidler, Hao Li

We also introduce new evaluation metrics for the quality of synthesized dance motions, and demonstrate that our system can outperform state-of-the-art methods.

Motion Synthesis

Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D

1 code implementation ECCV 2020 Jonah Philion, Sanja Fidler

By training on the entire camera rig, we provide evidence that our model is able to learn not only how to represent images but how to fuse predictions from all cameras into a single cohesive representation of the scene while being robust to calibration error.

Ranked #6 on Bird's-Eye View Semantic Segmentation on nuScenes (IoU ped - 224x480 - Vis filter. - 100x100 at 0.5 metric)

Autonomous Vehicles Bird's-Eye View Semantic Segmentation +1

Efficient and Information-Preserving Future Frame Prediction and Beyond

1 code implementation ICLR 2020 Wei Yu, Yichao Lu, Steve Easterbrook, Sanja Fidler

Applying resolution-preserving blocks is a common practice to maximize information preservation in video prediction, yet their high memory consumption greatly limits their application scenarios.

Computational Efficiency object-detection +3

The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines

2 code implementations 29 Apr 2020 Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labelled for a total of 39.6K action segments and 454.2K object bounding boxes.

Object

Learning to Evaluate Perception Models Using Planner-Centric Metrics

no code implementations CVPR 2020 Jonah Philion, Amlan Kar, Sanja Fidler

The downside of these metrics is that, at worst, they penalize all incorrect detections equally without conditioning on the task or scene, and at best, heuristics need to be chosen to ensure that different mistakes count differently.

3D Object Detection object-detection

Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data

no code implementations CVPR 2020 Xi Yan, David Acuna, Sanja Fidler

NDS consists of a dataserver which indexes several large popular image datasets, and aims to recommend data to a client, an end-user with a target application with its own small labeled dataset.

Image Classification Instance Segmentation +4

The Shmoop Corpus: A Dataset of Stories with Loosely Aligned Summaries

1 code implementation 30 Dec 2019 Atef Chaudhury, Makarand Tapaswi, Seung Wook Kim, Sanja Fidler

Understanding stories is a challenging reading comprehension problem for machines as it requires reading a large volume of text and following long-range dependencies.

Abstractive Text Summarization Question Answering +1

CrevNet: Conditionally Reversible Video Prediction

no code implementations 25 Oct 2019 Wei Yu, Yichao Lu, Steve Easterbrook, Sanja Fidler

Applying resolution-preserving blocks is a common practice to maximize information preservation in video prediction, yet their high memory consumption greatly limits their application scenarios.

Computational Efficiency Video Prediction

Neural Turtle Graphics for Modeling City Road Layouts

no code implementations ICCV 2019 Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina, Xinkai Wei, Ming-Yu Liu, Antonio Torralba, Sanja Fidler

We propose Neural Turtle Graphics (NTG), a novel generative model for spatial graphs, and demonstrate its applications in modeling city road layouts.

A Theoretical Analysis of the Number of Shots in Few-Shot Learning

no code implementations ICLR 2020 Tianshi Cao, Marc Law, Sanja Fidler

We introduce a theoretical analysis of the impact of the shot number on Prototypical Networks, a state-of-the-art few-shot classification method.

Classification Few-Shot Learning +1

Video Face Clustering with Unknown Number of Clusters

1 code implementation ICCV 2019 Makarand Tapaswi, Marc T. Law, Sanja Fidler

Understanding videos such as TV series and movies requires analyzing who the characters are and what they are doing.

Clustering Face Clustering +1

Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer

1 code implementation NeurIPS 2019 Wenzheng Chen, Jun Gao, Huan Ling, Edward J. Smith, Jaakko Lehtinen, Alec Jacobson, Sanja Fidler

Many machine learning models operate on images, but ignore the fact that images are 2D projections formed by 3D geometry interacting with light, in a process called rendering.

Single-View 3D Reconstruction

Gated-SCNN: Gated Shape CNNs for Semantic Segmentation

4 code implementations ICCV 2019 Towaki Takikawa, David Acuna, Varun Jampani, Sanja Fidler

Here, we propose a new two-stream CNN architecture for semantic segmentation that explicitly wires shape information as a separate processing branch, i.e., a shape stream, that processes information in parallel to the classical stream.

Image Segmentation Semantic Segmentation

Neural Graph Evolution: Towards Efficient Automatic Robot Design

1 code implementation 12 Jun 2019 Tingwu Wang, Yuhao Zhou, Sanja Fidler, Jimmy Ba

To address the two challenges, we formulate automatic robot design as a graph search problem and perform evolution search in graph space.

EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis

1 code implementation 15 May 2019 Chaoqi Wang, Roger Grosse, Sanja Fidler, Guodong Zhang

Reducing the test time resource requirements of a neural network while preserving test accuracy is crucial for running inference on resource-constrained devices.

Network Pruning

Neural Graph Evolution: Automatic Robot Design

no code implementations ICLR 2019 Tingwu Wang, Yuhao Zhou, Sanja Fidler, Jimmy Ba

To address the two challenges, we formulate automatic robot design as a graph search problem and perform evolution search in graph space.

Meta-Sim: Learning to Generate Synthetic Datasets

no code implementations ICCV 2019 Amlan Kar, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, Sanja Fidler

Training models to high-end performance requires availability of large labeled datasets, which are expensive to get.

Devil is in the Edges: Learning Semantic Boundaries from Noisy Annotations

1 code implementation CVPR 2019 David Acuna, Amlan Kar, Sanja Fidler

We further reason about true object boundaries during training using a level set formulation, which allows the network to learn from misaligned labels in an end-to-end fashion.

Semantic Segmentation

Mimicking the In-Camera Color Pipeline for Camera-Aware Object Compositing

no code implementations 27 Mar 2019 Jun Gao, Xiao Li, Li-Wei Wang, Sanja Fidler, Stephen Lin

We present a method for compositing virtual objects into a photograph such that the object colors appear to have been processed by the photo's camera imaging pipeline.

Fast Interactive Object Annotation with Curve-GCN

2 code implementations CVPR 2019 Huan Ling, Jun Gao, Amlan Kar, Wenzheng Chen, Sanja Fidler

Our model runs at 29.3ms in automatic, and 2.6ms in interactive mode, making it 10x and 100x faster than Polygon-RNN++.

Object

ACTRCE: Augmenting Experience via Teacher's Advice For Multi-Goal Reinforcement Learning

no code implementations 12 Feb 2019 Harris Chan, Yuhuai Wu, Jamie Kiros, Sanja Fidler, Jimmy Ba

We first analyze the differences among goal representations, and show that ACTRCE can efficiently solve difficult reinforcement learning problems in challenging 3D navigation tasks, whereas HER with non-language goal representation failed to learn.

Multi-Goal Reinforcement Learning reinforcement-learning +1

A Face-to-Face Neural Conversation Model

no code implementations CVPR 2018 Hang Chu, Daiqing Li, Sanja Fidler

The decoder consists of two layers, where the lower layer aims at generating the verbal response and coarse facial expressions, while the second layer fills in the subtle gestures, making the generated output more smooth and natural.

Decoder

SurfConv: Bridging 3D and 2D Convolution for RGBD Images

1 code implementation CVPR 2018 Hang Chu, Wei-Chiu Ma, Kaustav Kundu, Raquel Urtasun, Sanja Fidler

On the other hand, 3D convolution wastes a large amount of memory on mostly unoccupied 3D space, which consists of only the surface visible to the sensor.

3D Semantic Segmentation

Learning to Caption Images through a Lifetime by Asking Questions

1 code implementation 1 Dec 2018 Kevin Shen, Amlan Kar, Sanja Fidler

In order to bring artificial agents into our lives, we will need to go beyond supervised learning on closed datasets to having the ability to continuously expand knowledge.

Active Learning Image Captioning

A Neural Compositional Paradigm for Image Captioning

1 code implementation NeurIPS 2018 Bo Dai, Sanja Fidler, Dahua Lin

Mainstream captioning models often follow a sequential structure to generate captions, leading to issues such as introduction of irrelevant semantics, lack of diversity in the generated captions, and inadequate generalization performance.

Image Captioning

Pose Estimation for Objects with Rotational Symmetry

no code implementations 13 Oct 2018 Enric Corona, Kaustav Kundu, Sanja Fidler

In particular, our aim is to infer poses for objects not seen at training time, but for which their 3D CAD models are available at test time.

Pose Estimation

VirtualHome: Simulating Household Activities via Programs

4 code implementations CVPR 2018 Xavier Puig, Kevin Ra, Marko Boben, Jiaman Li, Tingwu Wang, Sanja Fidler, Antonio Torralba

We then implement the most common atomic (inter)actions in the Unity3D game engine, and use our programs to "drive" an artificial agent to execute tasks in a simulated household environment.

Video Understanding

Color Sails: Discrete-Continuous Palettes for Deep Color Exploration

no code implementations 7 Jun 2018 Maria Shugrina, Amlan Kar, Karan Singh, Sanja Fidler

Then, the user can adjust color sail parameters to change the base colors, their blending behavior and the number of colors, exploring a wide range of options for the original design.

Visual Reasoning by Progressive Module Networks

1 code implementation ICLR 2019 Seung Wook Kim, Makarand Tapaswi, Sanja Fidler

Thus, a module for a new task learns to query existing modules and composes their outputs in order to produce its own output.

Visual Reasoning

Now You Shake Me: Towards Automatic 4D Cinema

no code implementations CVPR 2018 Yuhao Zhou, Makarand Tapaswi, Sanja Fidler

We are interested in enabling automatic 4D cinema by parsing physical and special effects from untrimmed movies.

MovieGraphs: Towards Understanding Human-Centric Situations from Videos

no code implementations CVPR 2018 Paul Vicol, Makarand Tapaswi, Lluis Castrejon, Sanja Fidler

Towards this goal, we introduce a novel dataset called MovieGraphs which provides detailed, graph-based annotations of social situations depicted in movie clips.

Common Sense Reasoning

Be Your Own Prada: Fashion Synthesis with Structural Coherence

no code implementations ICCV 2017 Shizhan Zhu, Sanja Fidler, Raquel Urtasun, Dahua Lin, Chen Change Loy

In the second stage, a generative model with a newly proposed compositional mapping layer is used to render the final image with precise regions and textures conditioned on this map.

Fashion Synthesis Semantic Segmentation +1

SGN: Sequential Grouping Networks for Instance Segmentation

no code implementations ICCV 2017 Shu Liu, Jiaya Jia, Sanja Fidler, Raquel Urtasun

By exploiting two-directional information, the second network groups horizontal and vertical lines into connected components.

Instance Segmentation Object +1

3D Graph Neural Networks for RGBD Semantic Segmentation

2 code implementations ICCV 2017 Xiaojuan Qi, Renjie Liao, Jiaya Jia, Sanja Fidler, Raquel Urtasun

Each node in the graph corresponds to a set of points and is associated with a hidden representation vector initialized with an appearance feature extracted by a unary CNN from 2D images.

Ranked #33 on Semantic Segmentation on SUN-RGBD (using extra training data)

Graph Neural Network RGBD Semantic Segmentation +1

Sports Field Localization via Deep Structured Models

no code implementations CVPR 2017 Namdar Homayounfar, Sanja Fidler, Raquel Urtasun

In this work, we propose a novel way of efficiently localizing a sports field from a single broadcast image of the game.

Semantic Segmentation

Scene Parsing Through ADE20K Dataset

no code implementations CVPR 2017 Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, Antonio Torralba

A novel network design called Cascade Segmentation Module is proposed to parse a scene into stuff, objects, and object parts in a cascade and improve over the baselines.

Object Scene Parsing +1

Annotating Object Instances with a Polygon-RNN

2 code implementations CVPR 2017 Lluis Castrejon, Kaustav Kundu, Raquel Urtasun, Sanja Fidler

We show that our approach speeds up the annotation process by a factor of 4.7 across all classes in Cityscapes, while achieving 78.4% agreement in IoU with original ground-truth, matching the typical agreement between human annotators.

Object Segmentation +1

Open Vocabulary Scene Parsing

no code implementations ICCV 2017 Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, Antonio Torralba

Recognizing arbitrary objects in the wild has been a challenging problem due to the limitations of existing classification models and datasets.

General Classification Scene Parsing

Towards Diverse and Natural Image Descriptions via a Conditional GAN

1 code implementation ICCV 2017 Bo Dai, Sanja Fidler, Raquel Urtasun, Dahua Lin

Despite the substantial progress in recent years, the image captioning techniques are still far from being perfect. Sentences produced by existing methods, e.g. those based on RNNs, are often overly rigid and lacking in variability.

Image Captioning

TorontoCity: Seeing the World with a Million Eyes

no code implementations ICCV 2017 Shenlong Wang, Min Bai, Gellert Mattyus, Hang Chu, Wenjie Luo, Bin Yang, Justin Liang, Joel Cheverie, Sanja Fidler, Raquel Urtasun

In this paper we introduce the TorontoCity benchmark, which covers the full greater Toronto area (GTA) with 712.5 $km^2$ of land, 8439 $km$ of road and around 400,000 buildings.

Instance Segmentation Semantic Segmentation

Proximal Deep Structured Models

no code implementations NeurIPS 2016 Shenlong Wang, Sanja Fidler, Raquel Urtasun

Many problems in real-world applications involve predicting continuous-valued random variables that are statistically related.

Image Denoising Optical Flow Estimation

Efficient Summarization with Read-Again and Copy Mechanism

no code implementations 10 Nov 2016 Wenyuan Zeng, Wenjie Luo, Sanja Fidler, Raquel Urtasun

Towards this goal, we introduce a simple mechanism that first reads the input sequence before committing to a representation of each word.

Decoder

Semantic Understanding of Scenes through the ADE20K Dataset

21 code implementations 18 Aug 2016 Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba

Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision.

Scene Parsing Segmentation +1

Find your Way by Observing the Sun and Other Semantic Cues

no code implementations 23 Jun 2016 Wei-Chiu Ma, Shenlong Wang, Marcus A. Brubaker, Sanja Fidler, Raquel Urtasun

In this paper we present a robust, efficient and affordable approach to self-localization which requires neither GPS nor knowledge about the appearance of the world.

HD Maps: Fine-Grained Road Segmentation by Parsing Ground and Aerial Images

no code implementations CVPR 2016 Gellert Mattyus, Shenlong Wang, Sanja Fidler, Raquel Urtasun

In this paper we present an approach to enhance existing maps with fine grained segmentation categories such as parking spots and sidewalk, as well as the number and location of road lanes.

Road Segmentation

Soccer Field Localization from a Single Image

no code implementations 10 Apr 2016 Namdar Homayounfar, Sanja Fidler, Raquel Urtasun

In this work, we propose a novel way of efficiently localizing a soccer field from a single broadcast image of the game.

Instance-Level Segmentation for Autonomous Driving with Deep Densely Connected MRFs

no code implementations CVPR 2016 Ziyu Zhang, Sanja Fidler, Raquel Urtasun

Our aim is to provide a pixel-wise instance-level labeling of a monocular image in the context of autonomous driving.

Autonomous Driving