Search Results for author: Antonio Torralba

Found 192 papers, 88 papers with code

Understanding and Estimating the Adaptability of Domain-Invariant Representations

no code implementations ICML 2020 Ching-Yao Chuang, Antonio Torralba, Stefanie Jegelka

We also propose a method for estimating how well a model based on domain-invariant representations will perform on the target domain, without having seen any target labels.

Model Selection Unsupervised Domain Adaptation

A Multimodal Automated Interpretability Agent

no code implementations 22 Apr 2024 Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang, Achyuta Rajaram, Evan Hernandez, Jacob Andreas, Antonio Torralba

Interpretability experiments proposed by MAIA compose these tools to describe and explain system behavior.

Language Modelling

Automatic Discovery of Visual Circuits

1 code implementation 22 Apr 2024 Achyuta Rajaram, Neil Chowdhury, Antonio Torralba, Jacob Andreas, Sarah Schwettmann

To date, most discoveries of network subcomponents that implement human-interpretable computations in deep vision models have involved close study of single units and large amounts of human labor.

Efficient 3D Instance Mapping and Localization with Neural Fields

no code implementations 28 Mar 2024 George Tang, Krishna Murthy Jatavallabhula, Antonio Torralba

The first phase, InstanceMap, takes as input 2D segmentation masks of the image sequence generated by a frontend instance segmentation model, and associates corresponding masks across images to 3D labels.

3D Instance Segmentation Image Segmentation +3

LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis

no code implementations 22 Mar 2024 Kevin Xie, Jonathan Lorraine, Tianshi Cao, Jun Gao, James Lucas, Antonio Torralba, Sanja Fidler, Xiaohui Zeng

Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt.

3D Generation Text to 3D

GOMA: Proactive Embodied Cooperative Communication via Goal-Oriented Mental Alignment

no code implementations 17 Mar 2024 Lance Ying, Kunal Jha, Shivam Aarya, Joshua B. Tenenbaum, Antonio Torralba, Tianmin Shu

GOMA formulates verbal communication as a planning problem that minimizes the misalignment between the parts of agents' mental states that are relevant to the goals.

A Vision Check-up for Language Models

no code implementations 3 Jan 2024 Pratyusha Sharma, Tamar Rott Shaham, Manel Baradad, Stephanie Fu, Adrian Rodriguez-Munoz, Shivam Duggal, Phillip Isola, Antonio Torralba

Although LLM-generated images do not look like natural images, results on image generation and the ability of models to correct these generated images indicate that precise modeling of strings can teach language models about numerous aspects of the visual world.

Image Generation Representation Learning

Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models

no code implementations 21 Dec 2023 Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, Karsten Kreis

We also propose a motion amplification mechanism as well as a new autoregressive synthesis scheme to generate and combine multiple 4D sequences for longer generation.

Synthetic Data Generation Video Generation

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

1 code implementation 20 Nov 2023 Rohit Gandikota, Joanna Materzynska, Tingrui Zhou, Antonio Torralba, David Bau

We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models.

Image Generation

ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning

no code implementations 28 Sep 2023 Qiao Gu, Alihusein Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Rama Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B. Tenenbaum, Antonio Torralba, Florian Shkurti, Liam Paull

We demonstrate the utility of this representation through a number of downstream planning tasks that are specified through abstract (language) prompts and require complex reasoning over spatial and semantic concepts.

FIND: A Function Description Benchmark for Evaluating Interpretability Methods

1 code implementation NeurIPS 2023 Sarah Schwettmann, Tamar Rott Shaham, Joanna Materzynska, Neil Chowdhury, Shuang Li, Jacob Andreas, David Bau, Antonio Torralba

FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate.

Follow Anything: Open-set detection, tracking, and following in real-time

1 code implementation 10 Aug 2023 Alaa Maalouf, Ninad Jadhav, Krishna Murthy Jatavallabhula, Makram Chahine, Daniel M. Vogt, Robert J. Wood, Antonio Torralba, Daniela Rus

We demonstrate FAn on a real-world robotic system (a micro aerial vehicle) and report its ability to seamlessly follow the objects of interest in a real-time control loop.

Multimodal Neurons in Pretrained Text-Only Transformers

no code implementations 3 Aug 2023 Sarah Schwettmann, Neil Chowdhury, Samuel Klein, David Bau, Antonio Torralba

Language models demonstrate remarkable capacity to generalize representations learned in one modality to downstream tasks in other modalities.

Image Captioning

DreamTeacher: Pretraining Image Backbones with Deep Generative Models

no code implementations ICCV 2023 Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler

In this work, we introduce a self-supervised feature representation learning framework DreamTeacher that utilizes generative networks for pre-training downstream image backbones.

Knowledge Distillation Representation Learning

Background Prompting for Improved Object Depth

no code implementations 8 Jun 2023 Manel Baradad, Yuanzhen Li, Forrester Cole, Michael Rubinstein, Antonio Torralba, William T. Freeman, Varun Jampani

To infer object depth on a real image, we place the segmented object into the learned background prompt and run off-the-shelf depth networks.


Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models

no code implementations ICCV 2023 Nan Liu, Yilun Du, Shuang Li, Joshua B. Tenenbaum, Antonio Torralba

Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate.

Image Generation

Improving Factuality and Reasoning in Language Models through Multiagent Debate

1 code implementation 23 May 2023 Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, Igor Mordatch

Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.

Few-Shot Learning Language Modelling +1

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

no code implementations CVPR 2023 Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler

We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene.

Scene Generation

Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning

1 code implementation 3 Apr 2023 Tongzhou Wang, Antonio Torralba, Phillip Isola, Amy Zhang

In goal-reaching reinforcement learning (RL), the optimal value function has a particular geometry, called quasimetric structure.

reinforcement-learning Reinforcement Learning (RL)

Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos

no code implementations CVPR 2023 Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan

Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that could represent and synthesize the sound.

FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation

1 code implementation 4 Mar 2023 Zhou Xian, Bo Zhu, Zhenjia Xu, Hsiao-Yu Tung, Antonio Torralba, Katerina Fragkiadaki, Chuang Gan

We identify several challenges for fluid manipulation learning by evaluating a set of reinforcement learning and trajectory optimization methods on our platform.


ConceptFusion: Open-set Multimodal 3D Mapping

1 code implementation 14 Feb 2023 Krishna Murthy Jatavallabhula, Alihusein Kuwajerwala, Qiao Gu, Mohd Omama, Tao Chen, Alaa Maalouf, Shuang Li, Ganesh Iyer, Soroush Saryazdi, Nikhil Keetha, Ayush Tewari, Joshua B. Tenenbaum, Celso Miguel de Melo, Madhava Krishna, Liam Paull, Florian Shkurti, Antonio Torralba

ConceptFusion leverages the open-set capabilities of today's foundation models pre-trained on internet-scale data to reason about concepts across modalities such as natural language, images, and audio.

Autonomous Driving Robot Navigation

Debiasing Vision-Language Models via Biased Prompts

1 code implementation 31 Jan 2023 Ching-Yao Chuang, Varun Jampani, Yuanzhen Li, Antonio Torralba, Stefanie Jegelka

Machine learning models have been shown to inherit biases from their training datasets.

NOPA: Neurally-guided Online Probabilistic Assistance for Building Socially Intelligent Home Assistants

no code implementations 12 Jan 2023 Xavier Puig, Tianmin Shu, Joshua B. Tenenbaum, Antonio Torralba

Experiments show that our helper agent robustly updates its goal inference and adapts its helping plans to the changing level of uncertainty.

BT^2: Backward-compatible Training with Basis Transformation

1 code implementation ICCV 2023 Yifei Zhou, Zilu Li, Abhinav Shrivastava, Hengshuang Zhao, Antonio Torralba, Taipeng Tian, Ser-Nam Lim

In this way, the new representation can be directly compared with the old representation, in principle avoiding the need for any backfilling.

Aliasing is a Driver of Adversarial Attacks

no code implementations 22 Dec 2022 Adrián Rodríguez-Muñoz, Antonio Torralba

In this work, we investigate the hypothesis that the existence of adversarial perturbations is due in part to aliasing in neural networks.

Procedural Image Programs for Representation Learning

1 code implementation 29 Nov 2022 Manel Baradad, Chun-Fu Chen, Jonas Wulff, Tongzhou Wang, Rogerio Feris, Antonio Torralba, Phillip Isola

Learning image representations using synthetic data allows training neural networks without some of the concerns associated with real images, such as privacy and bias.

Representation Learning

BT^2: Backward-compatible Training with Basis Transformation

1 code implementation 8 Nov 2022 Yifei Zhou, Zilu Li, Abhinav Shrivastava, Hengshuang Zhao, Antonio Torralba, Taipeng Tian, Ser-Nam Lim

In this way, the new representation can be directly compared with the old representation, in principle avoiding the need for any backfilling.


Composing Ensembles of Pre-trained Models via Iterative Consensus

no code implementations 20 Oct 2022 Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Igor Mordatch

Such closed-loop communication enables models to correct errors caused by other models, significantly boosting performance on downstream tasks, e.g., improving accuracy on grade school math problems by 7.5%, without requiring any model finetuning.

Arithmetic Reasoning Image Generation +4

Totems: Physical Objects for Verifying Visual Integrity

no code implementations 26 Sep 2022 Jingwei Ma, Lucy Chai, Minyoung Huh, Tongzhou Wang, Ser-Nam Lim, Phillip Isola, Antonio Torralba

We introduce a new approach to image forensics: placing physical refractive objects, which we call totems, into a scene so as to protect any photograph taken of that scene.

Image Forensics

Local Relighting of Real Scenes

2 code implementations 6 Jul 2022 Audrey Cui, Ali Jahanian, Agata Lapedriza, Antonio Torralba, Shahin Mahdizadehaghdam, Rohit Kumar, David Bau

We introduce the task of local relighting, which changes a photograph of a scene by switching on and off the light sources that are visible within the image.

Image Relighting

Virtual Correspondence: Humans as a Cue for Extreme-View Geometry

no code implementations CVPR 2022 Wei-Chiu Ma, Anqi Joyce Yang, Shenlong Wang, Raquel Urtasun, Antonio Torralba

Similar to classic correspondences, VCs conform with epipolar geometry; unlike classic correspondences, VCs do not need to be co-visible across views.

3D Reconstruction Novel View Synthesis +1

Disentangling visual and written concepts in CLIP

no code implementations CVPR 2022 Joanna Materzynska, Antonio Torralba, David Bau

The CLIP network measures the similarity between natural text and images; in this work, we investigate the entanglement of the representation of word images and natural images in its image encoder.


Compositional Visual Generation with Composable Diffusion Models

1 code implementation 3 Jun 2022 Nan Liu, Shuang Li, Yilun Du, Antonio Torralba, Joshua B. Tenenbaum

Large text-guided diffusion models, such as DALLE-2, are able to generate stunning photorealistic images given natural language descriptions.


Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction

no code implementations CVPR 2022 Yining Hong, Kaichun Mo, Li Yi, Leonidas J. Guibas, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan

Specifically, FixNet consists of a perception module to extract the structured representation from the 3D point cloud, a physical dynamics prediction module to simulate the results of interactions on 3D objects, and a functionality prediction module to evaluate the functionality and choose the correct fix.

ComPhy: Compositional Physical Reasoning of Objects and Events from Videos

no code implementations ICLR 2022 Zhenfang Chen, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan

In this paper, we take an initial step to highlight the importance of inferring the hidden physical properties not directly observable from visual appearances, by introducing the Compositional Physical Reasoning (ComPhy) dataset.

Learning Neural Acoustic Fields

1 code implementation 4 Apr 2022 Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can then be applied to arbitrary sounds.

Learning Program Representations for Food Images and Cooking Recipes

no code implementations CVPR 2022 Dim P. Papadopoulos, Enrique Mora, Nadiia Chepurko, Kuan Wei Huang, Ferda Ofli, Antonio Torralba

To validate our idea, we crowdsource programs for cooking recipes and show that: (a) projecting the image-recipe embeddings into programs leads to better cross-modal retrieval results; (b) generating programs from images leads to better recognition results compared to predicting raw cooking instructions; and (c) we can generate food images by manipulating programs via optimizing the latent code of a GAN.

Cross-Modal Retrieval Retrieval

Dataset Distillation by Matching Training Trajectories

5 code implementations CVPR 2022 George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, Jun-Yan Zhu

To efficiently obtain the initial and target network parameters for large-scale datasets, we pre-compute and store training trajectories of expert networks trained on the real dataset.

Pre-Trained Language Models for Interactive Decision-Making

1 code implementation 3 Feb 2022 Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu

Together, these results suggest that language modeling induces representations that are useful for modeling not just language, but also goals and plans; these representations can aid learning and generalization even outside of language processing.

Imitation Learning Language Modelling

Natural Language Descriptions of Deep Visual Features

2 code implementations 26 Jan 2022 Evan Hernandez, Sarah Schwettmann, David Bau, Teona Bagashvili, Antonio Torralba, Jacob Andreas

Given a neuron, MILAN generates a description by searching for a natural language string that maximizes pointwise mutual information with the image regions in which the neuron is active.


Robust Contrastive Learning against Noisy Views

1 code implementation CVPR 2022 Ching-Yao Chuang, R Devon Hjelm, Xin Wang, Vibhav Vineet, Neel Joshi, Antonio Torralba, Stefanie Jegelka, Yale Song

Contrastive learning relies on an assumption that positive pairs contain related views, e.g., patches of an image or co-occurring multimodal signals of a video, that share certain underlying information about an instance.

Binary Classification Contrastive Learning

Incidents1M: a large-scale dataset of images with natural disasters, damage, and incidents

1 code implementation 11 Jan 2022 Ethan Weber, Dim P. Papadopoulos, Agata Lapedriza, Ferda Ofli, Muhammad Imran, Antonio Torralba

In this work, we present the Incidents1M Dataset, a large-scale multi-label dataset which contains 977,088 images, with 43 incident and 49 place categories.


GAN-Supervised Dense Visual Alignment

1 code implementation CVPR 2022 William Peebles, Jun-Yan Zhu, Richard Zhang, Antonio Torralba, Alexei A. Efros, Eli Shechtman

We propose GAN-Supervised Learning, a framework for learning discriminative models and their GAN-generated training data jointly end-to-end.

Data Augmentation Dense Pixel Correspondence Estimation

PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning

no code implementations NeurIPS 2021 Yining Hong, Li Yi, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies.

Instance Segmentation Object +2

Editing a classifier by rewriting its prediction rules

1 code implementation NeurIPS 2021 Shibani Santurkar, Dimitris Tsipras, Mahalaxmi Elango, David Bau, Antonio Torralba, Aleksander Madry

We present a methodology for modifying the behavior of a classifier by directly rewriting its prediction rules.

Learning to Compose Visual Relations

no code implementations NeurIPS 2021 Nan Liu, Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba

The visual world around us can be described as a structured set of objects and their associated relations.

EditGAN: High-Precision Semantic Image Editing

1 code implementation NeurIPS 2021 Huan Ling, Karsten Kreis, Daiqing Li, Seung Wook Kim, Antonio Torralba, Sanja Fidler

EditGAN builds on a GAN framework that jointly models images and their semantic segmentations, requiring only a handful of labeled examples, making it a scalable tool for editing.

Segmentation Semantic Segmentation +1

OPEn: An Open-ended Physics Environment for Learning Without a Task

1 code implementation 13 Oct 2021 Chuang Gan, Abhishek Bhandwaldar, Antonio Torralba, Joshua B. Tenenbaum, Phillip Isola

We test several existing RL-based exploration methods on this benchmark and find that an agent using unsupervised contrastive learning for representation learning, and impact-driven learning for exploration, achieved the best results.

Contrastive Learning Representation Learning

Ego4D: Around the World in 3,000 Hours of Egocentric Video

7 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

Toward a Visual Concept Vocabulary for GAN Latent Space

1 code implementation ICCV 2021 Sarah Schwettmann, Evan Hernandez, David Bau, Samuel Klein, Jacob Andreas, Antonio Torralba

A large body of recent work has identified transformations in the latent spaces of generative adversarial networks (GANs) that consistently and interpretably transform generated images.


Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions

1 code implementation ICCV 2021 Shuang Li, Yilun Du, Antonio Torralba, Josef Sivic, Bryan Russell

Our task poses unique challenges as a system does not know what types of human-object interactions are present in a video or the actual spatiotemporal location of the human and the object.

Human-Object Interaction Detection Object +2

Scaling up instance annotation via label propagation

no code implementations ICCV 2021 Dim P. Papadopoulos, Ethan Weber, Antonio Torralba

Through a large-scale experiment to populate 1M unlabeled images with object segmentation masks for 80 object classes, we show that (1) we obtain 1M object segmentation masks with a total annotation time of only 290 hours; (2) we reduce annotation time by 76x compared to manual annotation; (3) the segmentation quality of our masks is on par with those from manually annotated datasets.

Interactive Segmentation Object +2

Skill Induction and Planning with Latent Language

no code implementations ACL 2022 Pratyusha Sharma, Antonio Torralba, Jacob Andreas

We evaluate this approach in the ALFRED household simulation environment, providing natural language annotations for only 10% of demonstrations.

Decision Making

Language Model Pre-training Improves Generalization in Policy Learning

no code implementations 29 Sep 2021 Shuang Li, Xavier Puig, Yilun Du, Ekin Akyürek, Antonio Torralba, Jacob Andreas, Igor Mordatch

Additional experiments explore the role of language-based encodings in these results; we find that it is possible to train a simple adapter layer that maps from observations and action histories to LM embeddings, and thus that language modeling provides an effective initializer even for tasks with no language as input or output.

Imitation Learning Language Modelling

Natural Language Descriptions of Deep Features

no code implementations ICLR 2022 Evan Hernandez, Sarah Schwettmann, David Bau, Teona Bagashvili, Antonio Torralba, Jacob Andreas

Given a neuron, MILAN generates a description by searching for a natural language string that maximizes pointwise mutual information with the image regions in which the neuron is active.


Dynamic Modeling of Hand-Object Interactions via Tactile Sensing

no code implementations 9 Sep 2021 Qiang Zhang, Yunzhu Li, Yiyue Luo, Wan Shou, Michael Foshey, Junchi Yan, Joshua B. Tenenbaum, Wojciech Matusik, Antonio Torralba

This work takes a step on dynamics modeling in hand-object interactions from dense tactile sensing, which opens the door for future applications in activity learning, human-computer interactions, and imitation learning for robotics.

Contrastive Learning Imitation Learning +1

What You Can Learn by Staring at a Blank Wall

no code implementations ICCV 2021 Prafull Sharma, Miika Aittala, Yoav Y. Schechner, Antonio Torralba, Gregory W. Wornell, William T. Freeman, Fredo Durand

We present a passive non-line-of-sight method that infers the number of people or activity of a person from the observation of a blank wall in an unknown room.

Intelligent Carpet: Inferring 3D Human Pose From Tactile Signals

no code implementations CVPR 2021 Yiyue Luo, Yunzhu Li, Michael Foshey, Wan Shou, Pratyusha Sharma, Tomas Palacios, Antonio Torralba, Wojciech Matusik

In this work, leveraging such tactile interactions, we propose a 3D human pose estimation approach using the pressure maps recorded by a tactile carpet as input.

3D Human Pose Estimation Multi-Person Pose Estimation

Learning to See by Looking at Noise

1 code implementation NeurIPS 2021 Manel Baradad, Jonas Wulff, Tongzhou Wang, Phillip Isola, Antonio Torralba

We investigate a suite of image generation models that produce images from simple random processes.

Image Generation

Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales

no code implementations 17 Apr 2021 Jacob Andreas, Gašper Beguš, Michael M. Bronstein, Roee Diamant, Denley Delaney, Shane Gero, Shafi Goldwasser, David F. Gruber, Sarah de Haas, Peter Malkin, Roger Payne, Giovanni Petri, Daniela Rus, Pratyusha Sharma, Dan Tchernov, Pernille Tønnesen, Antonio Torralba, Daniel Vogt, Robert J. Wood

We posit that machine learning will be the cornerstone of future collection, processing, and analysis of multimodal streams of data in animal communication studies, including bioacoustic, behavioral, biological, and environmental data.

BIG-bench Machine Learning Sentence +1

DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

2 code implementations CVPR 2021 Yuxuan Zhang, Huan Ling, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, Sanja Fidler

To showcase the power of our approach, we generated datasets for 7 image segmentation tasks which include pixel-level labels for 34 human face parts, and 32 car parts.

Decoder Image Segmentation +1

BARF: Bundle-Adjusting Neural Radiance Fields

4 code implementations ICCV 2021 Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, Simon Lucey

In this paper, we propose Bundle-Adjusting Neural Radiance Fields (BARF) for training NeRF from imperfect (or even unknown) camera poses -- the joint problem of learning neural 3D representations and registering camera frames.

Visual Localization

Deep Feedback Inverse Problem Solver

no code implementations ECCV 2020 Wei-Chiu Ma, Shenlong Wang, Jiayuan Gu, Sivabalan Manivasagam, Antonio Torralba, Raquel Urtasun

Specifically, at each iteration, the neural network takes the feedback as input and outputs an update on the current estimation.

Pose Estimation

Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering

no code implementations ICLR 2021 Yuxuan Zhang, Wenzheng Chen, Huan Ling, Jun Gao, Yinan Zhang, Antonio Torralba, Sanja Fidler

Key to our approach is to exploit GANs as a multi-view data generator to train an inverse graphics network using an off-the-shelf differentiable renderer, and the trained inverse graphics network as a teacher to disentangle the GAN's latent code into interpretable 3D properties.

Neural Rendering

Improving Inversion and Generation Diversity in StyleGAN using a Gaussianized Latent Space

no code implementations 14 Sep 2020 Jonas Wulff, Antonio Torralba

We show that, under a simple nonlinear operation, the data distribution can be modeled as Gaussian and therefore expressed using sufficient statistics.

The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement

1 code implementation ECCV 2020 William Peebles, John Peebles, Jun-Yan Zhu, Alexei Efros, Antonio Torralba

In this paper, we propose the Hessian Penalty, a simple regularization term that encourages the Hessian of a generative model with respect to its input to be diagonal.


Detecting natural disasters, damage, and incidents in the wild

1 code implementation ECCV 2020 Ethan Weber, Nuria Marzo, Dim P. Papadopoulos, Aritro Biswas, Agata Lapedriza, Ferda Ofli, Muhammad Imran, Antonio Torralba

While most studies on social media are limited to text, images offer more information for understanding disaster and incident scenes.

Rewriting a Deep Generative Model

3 code implementations ECCV 2020 David Bau, Steven Liu, Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba

To address the problem, we propose a formulation in which the desired rule is changed by manipulating a layer of a deep network as a linear associative memory.

Noisy Agents: Self-supervised Exploration by Predicting Auditory Events

no code implementations 27 Jul 2020 Chuang Gan, Xiaoyu Chen, Phillip Isola, Antonio Torralba, Joshua B. Tenenbaum

Humans integrate multiple sensory modalities (e.g., visual and audio) to build a causal understanding of the physical world.

Atari Games Reinforcement Learning (RL)

Foley Music: Learning to Generate Music from Videos

no code implementations ECCV 2020 Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio Torralba

In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip about people playing musical instruments.

Music Generation Translation

Estimating Generalization under Distribution Shifts via Domain-Invariant Representations

1 code implementation 6 Jul 2020 Ching-Yao Chuang, Antonio Torralba, Stefanie Jegelka

When machine learning models are deployed on a test distribution different from the training distribution, they can perform poorly, but overestimate their performance.

Domain Adaptation Model Selection

Causal Discovery in Physical Systems from Videos

1 code implementation NeurIPS 2020 Yunzhu Li, Antonio Torralba, Animashree Anandkumar, Dieter Fox, Animesh Garg

We assume access to different configurations and environmental conditions, i.e., data from unknown interventions on the underlying system; thus, we can hope to discover the correct underlying causal graph without explicit interventions.

Causal Discovery counterfactual

Debiased Contrastive Learning

1 code implementation NeurIPS 2020 Ching-Yao Chuang, Joshua Robinson, Lin Yen-Chen, Antonio Torralba, Stefanie Jegelka

A prominent technique for self-supervised representation learning has been to contrast semantically similar and dissimilar pairs of samples.

Contrastive Learning Generalization Bounds +2

Deep Audio Priors Emerge From Harmonic Convolutional Networks

no code implementations ICLR 2020 Zhoutong Zhang, Yunyun Wang, Chuang Gan, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

We show that networks using Harmonic Convolution can reliably model audio priors and achieve high performance in unsupervised audio restoration tasks.

Visual Grounding of Learned Physical Models

1 code implementation ICML 2020 Yunzhu Li, Toru Lin, Kexin Yi, Daniel M. Bear, Daniel L. K. Yamins, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba

The abilities to perform physical reasoning and to adapt to new environments, while intrinsic to humans, remain challenging to state-of-the-art computational models.

Visual Grounding

Self-supervised Moving Vehicle Tracking with Stereo Sound

no code implementations ICCV 2019 Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba

At test time, the stereo-sound student network can work independently to perform object localization using just stereo audio and camera meta-data, without any visual input.

Object Localization Visual Localization

Learning Compositional Koopman Operators for Model-Based Control

no code implementations ICLR 2020 Yunzhu Li, Hao He, Jiajun Wu, Dina Katabi, Antonio Torralba

Finding an embedding space for a linear approximation of a nonlinear dynamical system enables efficient system identification and control synthesis.

The Role of Embedding Complexity in Domain-invariant Representations

1 code implementation 13 Oct 2019 Ching-Yao Chuang, Antonio Torralba, Stefanie Jegelka

In this work, we study, theoretically and empirically, the effect of the embedding complexity on generalization to the target domain.

Unsupervised Domain Adaptation

Neural Turtle Graphics for Modeling City Road Layouts

no code implementations ICCV 2019 Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina, Xinkai Wei, Ming-Yu Liu, Antonio Torralba, Sanja Fidler

We propose Neural Turtle Graphics (NTG), a novel generative model for spatial graphs, and demonstrate its applications in modeling city road layouts.

CLEVRER: CoLlision Events for Video REpresentation and Reasoning

3 code implementations ICLR 2020 Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum

While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations.

counterfactual Descriptive +1

Connecting Touch and Vision via Cross-Modal Prediction

1 code implementation CVPR 2019 Yunzhu Li, Jun-Yan Zhu, Russ Tedrake, Antonio Torralba

To connect vision and touch, we introduce new tasks of synthesizing plausible tactile signals from visual inputs as well as imagining how we interact with objects given tactile data as input.

How to make a pizza: Learning a compositional layer-based GAN model

no code implementations CVPR 2019 Dim P. Papadopoulos, Youssef Tamaazousti, Ferda Ofli, Ingmar Weber, Antonio Torralba

From a visual perspective, every instruction step can be seen as a way to change the visual appearance of the dish by adding extra objects (e.g., adding an ingredient) or changing the appearance of the existing ones (e.g., cooking the dish).

Generative Adversarial Network

Meta-Sim: Learning to Generate Synthetic Datasets

no code implementations ICCV 2019 Amlan Kar, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, Sanja Fidler

Training models to high-end performance requires availability of large labeled datasets, which are expensive to get.

Self-Supervised Audio-Visual Co-Segmentation

no code implementations 18 Apr 2019 Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh Mcdermott, Antonio Torralba

Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data.

Image Segmentation Segmentation +1

The Sound of Motions

1 code implementation ICCV 2019 Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba

Sounds originate from object motions and vibrations of surrounding air.

On the Units of GANs (Extended Abstract)

no code implementations 29 Jan 2019 David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, Antonio Torralba

We quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output.

Visual Object Networks: Image Generation with Disentangled 3D Representations

1 code implementation NeurIPS 2018 Jun-Yan Zhu, Zhoutong Zhang, Chengkai Zhang, Jiajun Wu, Antonio Torralba, Josh Tenenbaum, Bill Freeman

The VON not only generates images that are more realistic than the state-of-the-art 2D image synthesis methods but also enables many 3D operations such as changing the viewpoint of a generated image, shape and texture editing, linear interpolation in texture and shape space, and transferring appearance across different objects and viewpoints.

Image Generation Object

Dataset Distillation

5 code implementations 27 Nov 2018 Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. Efros

Model distillation aims to distill the knowledge of a complex model into a simpler one.

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

2 code implementations NeurIPS 2018 Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Joshua B. Tenenbaum

Second, the model is more data- and memory-efficient: it performs well after learning on a small amount of training data; it can also encode an image into a compact representation, requiring less storage than existing methods for offline question answering.

Question Answering Representation Learning +1

Propagation Networks for Model-Based Control Under Partial Observation

1 code implementation 28 Sep 2018 Yunzhu Li, Jiajun Wu, Jun-Yan Zhu, Joshua B. Tenenbaum, Antonio Torralba, Russ Tedrake

There has been an increasing interest in learning dynamics simulators for model-based control.

Learning to Zoom: a Saliency-Based Sampling Layer for Neural Networks

1 code implementation ECCV 2018 Adrià Recasens, Petr Kellnhofer, Simon Stent, Wojciech Matusik, Antonio Torralba

We introduce a saliency-based distortion layer for convolutional neural networks that helps to improve the spatial sampling of input data for a given task.

Caricature Gaze Estimation +2

Interpretable Basis Decomposition for Visual Explanation

1 code implementation ECCV 2018 Bolei Zhou, Yiyou Sun, David Bau, Antonio Torralba

Explanations of the decisions made by a deep neural network are important for human end-users to be able to understand and diagnose the trustworthiness of the system.

Single Image Intrinsic Decomposition without a Single Intrinsic Image

no code implementations ECCV 2018 Wei-Chiu Ma, Hang Chu, Bolei Zhou, Raquel Urtasun, Antonio Torralba

At inference time, our model can be easily reduced to a single stream module that performs intrinsic decomposition on a single input image.

Intrinsic Image Decomposition

3D-Aware Scene Manipulation via Inverse Graphics

1 code implementation NeurIPS 2018 Shunyu Yao, Tzu Ming Harry Hsu, Jun-Yan Zhu, Jiajun Wu, Antonio Torralba, William T. Freeman, Joshua B. Tenenbaum

In this work, we propose 3D scene de-rendering networks (3D-SDN) to address the above issues by integrating disentangled representations for semantics, geometry, and appearance into a deep generative model.

Decoder Disentanglement +1

VirtualHome: Simulating Household Activities via Programs

4 code implementations CVPR 2018 Xavier Puig, Kevin Ra, Marko Boben, Jiaman Li, Tingwu Wang, Sanja Fidler, Antonio Torralba

We then implement the most common atomic (inter)actions in the Unity3D game engine, and use our programs to "drive" an artificial agent to execute tasks in a simulated household environment.

Video Understanding

Revisiting the Importance of Individual Units in CNNs via Ablation

no code implementations 7 Jun 2018 Bolei Zhou, Yiyou Sun, David Bau, Antonio Torralba

We confirm that unit attributes such as class selectivity are a poor predictor of impact on overall accuracy, as found in recent work (Morcos et al., 2018).

General Classification

Through-Wall Human Pose Estimation Using Radio Signals

no code implementations CVPR 2018 Ming-Min Zhao, Tianhong Li, Mohammad Abu Alsheikh, Yonglong Tian, Hang Zhao, Antonio Torralba, Dina Katabi

Yet, unlike vision-based pose estimation, the radio-based system can estimate 2D poses through walls despite never being trained on such scenarios.

RF-based Pose Estimation

Inferring Light Fields From Shadows

1 code implementation CVPR 2018 Manel Baradad, Vickie Ye, Adam B. Yedidia, Frédo Durand, William T. Freeman, Gregory W. Wornell, Antonio Torralba

We present a method for inferring a 4D light field of a hidden scene from 2D shadows cast by a known occluder on a diffuse wall.

The Sound of Pixels

2 code implementations ECCV 2018 Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh Mcdermott, Antonio Torralba

We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel.

Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input

no code implementations ECCV 2018 David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, James Glass

In this paper, we explore neural network models that learn to associate segments of spoken audio captions with the semantically relevant portions of natural images that they refer to.


3D Interpreter Networks for Viewer-Centered Wireframe Modeling

no code implementations 3 Apr 2018 Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

3D-INN is trained on real images to estimate 2D keypoint heatmaps from an input image; it then predicts 3D object structure from heatmaps using knowledge learned from synthetic 3D shapes.

Image Retrieval Keypoint Estimation +2

Counterfactual Image Networks

no code implementations ICLR 2018 Deniz Oktay, Carl Vondrick, Antonio Torralba

However, when a layer is removed, the model learns to produce a different image that still looks natural to an adversary, which is possible by removing objects.

counterfactual Object +2

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning

no code implementations 20 Dec 2017 Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba

The sound of crashing waves, the roar of fast-moving cars -- sound conveys important information about the objects in our surroundings.

Temporal Relational Reasoning in Videos

5 code implementations ECCV 2018 Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba

Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species.

Action Classification Action Recognition In Videos +4

Interpreting Deep Visual Representations via Network Dissection

2 code implementations 15 Nov 2017 Bolei Zhou, David Bau, Aude Oliva, Antonio Torralba

In this work, we describe Network Dissection, a method that interprets networks by providing labels for the units of their deep visual representations.

Following Gaze in Video

no code implementations ICCV 2017 Adria Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba

In this paper, we present an approach for following gaze in video by predicting where a person (in the video) is looking even when the object is in a different frame.

Turning Corners Into Cameras: Principles and Methods

no code implementations ICCV 2017 Katherine L. Bouman, Vickie Ye, Adam B. Yedidia, Fredo Durand, Gregory W. Wornell, Antonio Torralba, William T. Freeman

We show that walls and other obstructions with edges can be exploited as naturally-occurring "cameras" that reveal the hidden scenes beyond them.

Scene Parsing Through ADE20K Dataset

no code implementations CVPR 2017 Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, Antonio Torralba

A novel network design called Cascade Segmentation Module is proposed to parse a scene into stuff, objects, and object parts in a cascade and improve over the baselines.

Object Scene Parsing +1

See, Hear, and Read: Deep Aligned Representations

1 code implementation 3 Jun 2017 Yusuf Aytar, Carl Vondrick, Antonio Torralba

We capitalize on large amounts of readily-available, synchronous data to learn deep discriminative representations shared across three major natural modalities: vision, sound and language.

Cross-Modal Retrieval Representation Learning +1

Network Dissection: Quantifying Interpretability of Deep Visual Representations

1 code implementation CVPR 2017 David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba

Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer.

Open Vocabulary Scene Parsing

no code implementations ICCV 2017 Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, Antonio Torralba

Recognizing arbitrary objects in the wild has been a challenging problem due to the limitations of existing classification models and datasets.

General Classification Scene Parsing

Face-to-BMI: Using Computer Vision to Infer Body Mass Index on Social Media

no code implementations 9 Mar 2017 Enes Kocabey, Mustafa Camurcu, Ferda Ofli, Yusuf Aytar, Javier Marin, Antonio Torralba, Ingmar Weber

A person's weight status can have profound implications on their life, ranging from mental health, to longevity, to financial income.

Body Mass Index (BMI) Prediction

SegICP: Integrated Deep Semantic Segmentation and Pose Estimation

2 code implementations 5 Mar 2017 Jay M. Wong, Vincent Kee, Tiffany Le, Syler Wagner, Gian-Luca Mariottini, Abraham Schneider, Lei Hamilton, Rahul Chipalkatty, Mitchell Hebert, David M. S. Johnson, Jimmy Wu, Bolei Zhou, Antonio Torralba

Recent robotic manipulation competitions have highlighted that sophisticated robots still struggle to achieve fast and reliable perception of task-relevant objects in complex, realistic scenarios.

Object Recognition Point Cloud Registration +3

Is Saki #delicious? The Food Perception Gap on Instagram and Its Relation to Health

no code implementations 21 Feb 2017 Ferda Ofli, Yusuf Aytar, Ingmar Weber, Raggi al Hammouri, Antonio Torralba

Studying how food is perceived in relation to what it actually is typically involves a laboratory setup.


Following Gaze Across Views

no code implementations 9 Dec 2016 Adrià Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba

In this paper, we present an approach for following gaze across views by predicting where a particular person is looking throughout a scene.

Who is Mistaken?

no code implementations 4 Dec 2016 Benjamin Eysenbach, Carl Vondrick, Antonio Torralba

We then create a representation of characters' beliefs for two tasks in human action understanding: predicting who is mistaken, and when they are mistaken.

Action Understanding

A Compositional Object-Based Approach to Learning Physical Dynamics

1 code implementation 1 Dec 2016 Michael B. Chang, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum

By comparing to less structured architectures, we show that the NPE's compositional representation of the structure in physical interactions improves its ability to predict movement, generalize across variable object count and different scene configurations, and infer latent properties of objects such as mass.


SoundNet: Learning Sound Representations from Unlabeled Video

6 code implementations NeurIPS 2016 Yusuf Aytar, Carl Vondrick, Antonio Torralba

We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild.

General Classification

Cross-Modal Scene Networks

no code implementations 27 Oct 2016 Yusuf Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.


Places: An Image Database for Deep Scene Understanding

no code implementations 6 Oct 2016 Bolei Zhou, Aditya Khosla, Agata Lapedriza, Antonio Torralba, Aude Oliva

The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification at tasks such as object and scene recognition.

BIG-bench Machine Learning Classification +4

Generating Videos with Scene Dynamics

no code implementations NeurIPS 2016 Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g., action classification) and video generation tasks (e.g., future prediction).