Search Results for author: Antonio Torralba

Found 135 papers, 53 papers with code

Understanding and Estimating the Adaptability of Domain-Invariant Representations

no code implementations ICML 2020 Ching-Yao Chuang, Antonio Torralba, Stefanie Jegelka

We also propose a method for estimating how well a model based on domain-invariant representations will perform on the target domain, without having seen any target labels.

Model Selection Unsupervised Domain Adaptation

Editing a classifier by rewriting its prediction rules

1 code implementation NeurIPS 2021 Shibani Santurkar, Dimitris Tsipras, Mahalaxmi Elango, David Bau, Antonio Torralba, Aleksander Madry

We present a methodology for modifying the behavior of a classifier by directly rewriting its prediction rules.

PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning

no code implementations NeurIPS 2021 Yining Hong, Li Yi, Josh Tenenbaum, Antonio Torralba, Chuang Gan

A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies.

Instance Segmentation Semantic Segmentation +1

Learning to Compose Visual Relations

no code implementations NeurIPS 2021 Nan Liu, Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba

The visual world around us can be described as a structured set of objects and their associated relations.

EditGAN: High-Precision Semantic Image Editing

no code implementations NeurIPS 2021 Huan Ling, Karsten Kreis, Daiqing Li, Seung Wook Kim, Antonio Torralba, Sanja Fidler

EditGAN builds on a GAN framework that jointly models images and their semantic segmentations, requiring only a handful of labeled examples, making it a scalable tool for editing.

Semantic Segmentation

OPEn: An Open-ended Physics Environment for Learning Without a Task

no code implementations 13 Oct 2021 Chuang Gan, Abhishek Bhandwaldar, Antonio Torralba, Joshua B. Tenenbaum, Phillip Isola

We test several existing RL-based exploration methods on this benchmark and find that an agent using unsupervised contrastive learning for representation learning, and impact-driven learning for exploration, achieved the best results.

Contrastive Learning Representation Learning

Toward a Visual Concept Vocabulary for GAN Latent Space

1 code implementation ICCV 2021 Sarah Schwettmann, Evan Hernandez, David Bau, Samuel Klein, Jacob Andreas, Antonio Torralba

A large body of recent work has identified transformations in the latent spaces of generative adversarial networks (GANs) that consistently and interpretably transform generated images.

Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions

1 code implementation ICCV 2021 Shuang Li, Yilun Du, Antonio Torralba, Josef Sivic, Bryan Russell

Our task poses unique challenges as a system does not know what types of human-object interactions are present in a video or the actual spatiotemporal location of the human and the object.

Human-Object Interaction Detection

Scaling up instance annotation via label propagation

no code implementations ICCV 2021 Dim P. Papadopoulos, Ethan Weber, Antonio Torralba

Through a large-scale experiment to populate 1M unlabeled images with object segmentation masks for 80 object classes, we show that (1) we obtain 1M object segmentation masks with a total annotation time of only 290 hours; (2) we reduce annotation time by 76x compared to manual annotation; (3) the segmentation quality of our masks is on par with those from manually annotated datasets.

Interactive Segmentation Semantic Segmentation

Skill Induction and Planning with Latent Language

no code implementations 4 Oct 2021 Pratyusha Sharma, Antonio Torralba, Jacob Andreas

We present a framework for learning hierarchical policies from demonstrations, using sparse natural language annotations to guide the discovery of reusable skills for autonomous decision-making.

Decision Making

Dynamic Modeling of Hand-Object Interactions via Tactile Sensing

no code implementations 9 Sep 2021 Qiang Zhang, Yunzhu Li, Yiyue Luo, Wan Shou, Michael Foshey, Junchi Yan, Joshua B. Tenenbaum, Wojciech Matusik, Antonio Torralba

This work takes a step toward dynamics modeling of hand-object interactions from dense tactile sensing, which opens the door to future applications in activity learning, human-computer interaction, and imitation learning for robotics.

Contrastive Learning Imitation Learning

What You Can Learn by Staring at a Blank Wall

no code implementations ICCV 2021 Prafull Sharma, Miika Aittala, Yoav Y. Schechner, Antonio Torralba, Gregory W. Wornell, William T. Freeman, Fredo Durand

We present a passive non-line-of-sight method that infers the number of people or activity of a person from the observation of a blank wall in an unknown room.

Intelligent Carpet: Inferring 3D Human Pose From Tactile Signals

no code implementations CVPR 2021 Yiyue Luo, Yunzhu Li, Michael Foshey, Wan Shou, Pratyusha Sharma, Tomas Palacios, Antonio Torralba, Wojciech Matusik

In this work, leveraging such tactile interactions, we propose a 3D human pose estimation approach using the pressure maps recorded by a tactile carpet as input.

3D Human Pose Estimation Multi-Person Pose Estimation

Learning to See by Looking at Noise

1 code implementation NeurIPS 2021 Manel Baradad, Jonas Wulff, Tongzhou Wang, Phillip Isola, Antonio Torralba

We investigate a suite of image generation models that produce images from simple random processes.

Image Generation

Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales

no code implementations 17 Apr 2021 Jacob Andreas, Gašper Beguš, Michael M. Bronstein, Roee Diamant, Denley Delaney, Shane Gero, Shafi Goldwasser, David F. Gruber, Sarah de Haas, Peter Malkin, Roger Payne, Giovanni Petri, Daniela Rus, Pratyusha Sharma, Dan Tchernov, Pernille Tønnesen, Antonio Torralba, Daniel Vogt, Robert J. Wood

We posit that machine learning will be the cornerstone of future collection, processing, and analysis of multimodal streams of data in animal communication studies, including bioacoustic, behavioral, biological, and environmental data.


DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

1 code implementation CVPR 2021 Yuxuan Zhang, Huan Ling, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, Sanja Fidler

To showcase the power of our approach, we generated datasets for 7 image segmentation tasks which include pixel-level labels for 34 human face parts, and 32 car parts.

Semantic Segmentation

BARF: Bundle-Adjusting Neural Radiance Fields

3 code implementations ICCV 2021 Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, Simon Lucey

In this paper, we propose Bundle-Adjusting Neural Radiance Fields (BARF) for training NeRF from imperfect (or even unknown) camera poses -- the joint problem of learning neural 3D representations and registering camera frames.

Visual Localization

Deep Feedback Inverse Problem Solver

no code implementations ECCV 2020 Wei-Chiu Ma, Shenlong Wang, Jiayuan Gu, Sivabalan Manivasagam, Antonio Torralba, Raquel Urtasun

Specifically, at each iteration, the neural network takes the feedback as input and outputs an update on the current estimation.

Pose Estimation

Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering

no code implementations ICLR 2021 Yuxuan Zhang, Wenzheng Chen, Huan Ling, Jun Gao, Yinan Zhang, Antonio Torralba, Sanja Fidler

Key to our approach is to exploit GANs as a multi-view data generator to train an inverse graphics network using an off-the-shelf differentiable renderer, and the trained inverse graphics network as a teacher to disentangle the GAN's latent code into interpretable 3D properties.

Neural Rendering

Improving Inversion and Generation Diversity in StyleGAN using a Gaussianized Latent Space

no code implementations 14 Sep 2020 Jonas Wulff, Antonio Torralba

We show that, under a simple nonlinear operation, the data distribution can be modeled as Gaussian and therefore expressed using sufficient statistics.

Understanding the Role of Individual Units in a Deep Neural Network

2 code implementations 10 Sep 2020 David Bau, Jun-Yan Zhu, Hendrik Strobelt, Agata Lapedriza, Bolei Zhou, Antonio Torralba

Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.

Image Classification Image Generation +1

The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement

1 code implementation ECCV 2020 William Peebles, John Peebles, Jun-Yan Zhu, Alexei Efros, Antonio Torralba

In this paper, we propose the Hessian Penalty, a simple regularization term that encourages the Hessian of a generative model with respect to its input to be diagonal.

Detecting natural disasters, damage, and incidents in the wild

1 code implementation ECCV 2020 Ethan Weber, Nuria Marzo, Dim P. Papadopoulos, Aritro Biswas, Agata Lapedriza, Ferda Ofli, Muhammad Imran, Antonio Torralba

While most studies on social media are limited to text, images offer more information for understanding disaster and incident scenes.

Rewriting a Deep Generative Model

4 code implementations ECCV 2020 David Bau, Steven Liu, Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba

To address the problem, we propose a formulation in which the desired rule is changed by manipulating a layer of a deep network as a linear associative memory.


Noisy Agents: Self-supervised Exploration by Predicting Auditory Events

no code implementations 27 Jul 2020 Chuang Gan, Xiaoyu Chen, Phillip Isola, Antonio Torralba, Joshua B. Tenenbaum

Humans integrate multiple sensory modalities (e.g., visual and audio) to build a causal understanding of the physical world.

Atari Games

Foley Music: Learning to Generate Music from Videos

no code implementations ECCV 2020 Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio Torralba

In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip about people playing musical instruments.

Music Generation Translation

Estimating Generalization under Distribution Shifts via Domain-Invariant Representations

1 code implementation 6 Jul 2020 Ching-Yao Chuang, Antonio Torralba, Stefanie Jegelka

When machine learning models are deployed on a test distribution different from the training distribution, they can perform poorly yet overestimate their own performance.

Domain Adaptation Model Selection

Debiased Contrastive Learning

1 code implementation NeurIPS 2020 Ching-Yao Chuang, Joshua Robinson, Lin Yen-Chen, Antonio Torralba, Stefanie Jegelka

A prominent technique for self-supervised representation learning has been to contrast semantically similar and dissimilar pairs of samples.

Contrastive Learning Generalization Bounds +1

Causal Discovery in Physical Systems from Videos

1 code implementation NeurIPS 2020 Yunzhu Li, Antonio Torralba, Animashree Anandkumar, Dieter Fox, Animesh Garg

We assume access to different configurations and environmental conditions, i.e., data from unknown interventions on the underlying system; thus, we can hope to discover the correct underlying causal graph without explicit interventions.

Causal Discovery

Diverse Image Generation via Self-Conditioned GANs

2 code implementations CVPR 2020 Steven Liu, Tongzhou Wang, David Bau, Jun-Yan Zhu, Antonio Torralba

We introduce a simple but effective unsupervised method for generating realistic and diverse images.

Image Generation

Deep Audio Priors Emerge From Harmonic Convolutional Networks

no code implementations ICLR 2020 Zhoutong Zhang, Yunyun Wang, Chuang Gan, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

We show that networks using Harmonic Convolution can reliably model audio priors and achieve high performance in unsupervised audio restoration tasks.

Visual Grounding of Learned Physical Models

no code implementations ICML 2020 Yunzhu Li, Toru Lin, Kexin Yi, Daniel M. Bear, Daniel L. K. Yamins, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba

The abilities to perform physical reasoning and to adapt to new environments, while intrinsic to humans, remain challenging to state-of-the-art computational models.

Visual Grounding

Self-supervised Moving Vehicle Tracking with Stereo Sound

no code implementations ICCV 2019 Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba

At test time, the stereo-sound student network can work independently to perform object localization using just stereo audio and camera meta-data, without any visual input.

Object Localization Visual Localization

Learning Compositional Koopman Operators for Model-Based Control

no code implementations ICLR 2020 Yunzhu Li, Hao He, Jiajun Wu, Dina Katabi, Antonio Torralba

Finding an embedding space for a linear approximation of a nonlinear dynamical system enables efficient system identification and control synthesis.

The Role of Embedding Complexity in Domain-invariant Representations

1 code implementation 13 Oct 2019 Ching-Yao Chuang, Antonio Torralba, Stefanie Jegelka

In this work, we study, theoretically and empirically, the effect of the embedding complexity on generalization to the target domain.

Unsupervised Domain Adaptation

Neural Turtle Graphics for Modeling City Road Layouts

no code implementations ICCV 2019 Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina, Xinkai Wei, Ming-Yu Liu, Antonio Torralba, Sanja Fidler

We propose Neural Turtle Graphics (NTG), a novel generative model for spatial graphs, and demonstrate its applications in modeling city road layouts.

CLEVRER: CoLlision Events for Video REpresentation and Reasoning

3 code implementations ICLR 2020 Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum

While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations.

Visual Reasoning

Connecting Touch and Vision via Cross-Modal Prediction

1 code implementation CVPR 2019 Yunzhu Li, Jun-Yan Zhu, Russ Tedrake, Antonio Torralba

To connect vision and touch, we introduce new tasks of synthesizing plausible tactile signals from visual inputs as well as imagining how we interact with objects given tactile data as input.

How to make a pizza: Learning a compositional layer-based GAN model

no code implementations CVPR 2019 Dim P. Papadopoulos, Youssef Tamaazousti, Ferda Ofli, Ingmar Weber, Antonio Torralba

From a visual perspective, every instruction step can be seen as a way to change the visual appearance of the dish by adding extra objects (e.g., adding an ingredient) or changing the appearance of the existing ones (e.g., cooking the dish).

Meta-Sim: Learning to Generate Synthetic Datasets

no code implementations ICCV 2019 Amlan Kar, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, Sanja Fidler

Training models to high-end performance requires availability of large labeled datasets, which are expensive to get.

Self-Supervised Audio-Visual Co-Segmentation

no code implementations 18 Apr 2019 Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh McDermott, Antonio Torralba

Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data.

Semantic Segmentation

On the Units of GANs (Extended Abstract)

no code implementations 29 Jan 2019 David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, Antonio Torralba

We quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output.

Visual Object Networks: Image Generation with Disentangled 3D Representations

1 code implementation NeurIPS 2018 Jun-Yan Zhu, Zhoutong Zhang, Chengkai Zhang, Jiajun Wu, Antonio Torralba, Josh Tenenbaum, Bill Freeman

The VON not only generates images that are more realistic than the state-of-the-art 2D image synthesis methods but also enables many 3D operations such as changing the viewpoint of a generated image, shape and texture editing, linear interpolation in texture and shape space, and transferring appearance across different objects and viewpoints.

Image Generation

Dataset Distillation

1 code implementation 27 Nov 2018 Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. Efros

Model distillation aims to distill the knowledge of a complex model into a simpler one.

Model distillation

Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images

no code implementations 14 Oct 2018 Javier Marin, Aritro Biswas, Ferda Ofli, Nicholas Hynes, Amaia Salvador, Yusuf Aytar, Ingmar Weber, Antonio Torralba

In this paper, we introduce Recipe1M+, a new large-scale, structured corpus of over one million cooking recipes and 13 million food images.

General Classification

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

1 code implementation NeurIPS 2018 Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Joshua B. Tenenbaum

Second, the model is more data- and memory-efficient: it performs well after learning on a small amount of training data; it can also encode an image into a compact representation, requiring less storage than existing methods for offline question answering.

Language understanding Question Answering +2

Propagation Networks for Model-Based Control Under Partial Observation

1 code implementation 28 Sep 2018 Yunzhu Li, Jiajun Wu, Jun-Yan Zhu, Joshua B. Tenenbaum, Antonio Torralba, Russ Tedrake

There has been an increasing interest in learning dynamics simulators for model-based control.

Learning to Zoom: a Saliency-Based Sampling Layer for Neural Networks

1 code implementation ECCV 2018 Adrià Recasens, Petr Kellnhofer, Simon Stent, Wojciech Matusik, Antonio Torralba

We introduce a saliency-based distortion layer for convolutional neural networks that helps to improve the spatial sampling of input data for a given task.

Caricature Gaze Estimation +3

Single Image Intrinsic Decomposition without a Single Intrinsic Image

no code implementations ECCV 2018 Wei-Chiu Ma, Hang Chu, Bolei Zhou, Raquel Urtasun, Antonio Torralba

At inference time, our model can be easily reduced to a single stream module that performs intrinsic decomposition on a single input image.

Intrinsic Image Decomposition

Interpretable Basis Decomposition for Visual Explanation

1 code implementation ECCV 2018 Bolei Zhou, Yiyou Sun, David Bau, Antonio Torralba

Explanations of the decisions made by a deep neural network are important for human end-users to be able to understand and diagnose the trustworthiness of the system.

3D-Aware Scene Manipulation via Inverse Graphics

1 code implementation NeurIPS 2018 Shunyu Yao, Tzu Ming Harry Hsu, Jun-Yan Zhu, Jiajun Wu, Antonio Torralba, William T. Freeman, Joshua B. Tenenbaum

In this work, we propose 3D scene de-rendering networks (3D-SDN) to address the above issues by integrating disentangled representations for semantics, geometry, and appearance into a deep generative model.

VirtualHome: Simulating Household Activities via Programs

2 code implementations CVPR 2018 Xavier Puig, Kevin Ra, Marko Boben, Jiaman Li, Tingwu Wang, Sanja Fidler, Antonio Torralba

We then implement the most common atomic (inter)actions in the Unity3D game engine, and use our programs to "drive" an artificial agent to execute tasks in a simulated household environment.

Video Understanding

Revisiting the Importance of Individual Units in CNNs via Ablation

no code implementations 7 Jun 2018 Bolei Zhou, Yiyou Sun, David Bau, Antonio Torralba

We confirm that unit attributes such as class selectivity are a poor predictor for impact on overall accuracy, as found previously in recent work (Morcos et al., 2018).

General Classification

Through-Wall Human Pose Estimation Using Radio Signals

no code implementations CVPR 2018 Ming-Min Zhao, Tianhong Li, Mohammad Abu Alsheikh, Yonglong Tian, Hang Zhao, Antonio Torralba, Dina Katabi

Yet, unlike vision-based pose estimation, the radio-based system can estimate 2D poses through walls despite never being trained on such scenarios.

RF-based Pose Estimation

Inferring Light Fields From Shadows

1 code implementation CVPR 2018 Manel Baradad, Vickie Ye, Adam B. Yedidia, Frédo Durand, William T. Freeman, Gregory W. Wornell, Antonio Torralba

We present a method for inferring a 4D light field of a hidden scene from 2D shadows cast by a known occluder on a diffuse wall.

The Sound of Pixels

2 code implementations ECCV 2018 Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh McDermott, Antonio Torralba

We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel.

Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input

no code implementations ECCV 2018 David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, James Glass

In this paper, we explore neural network models that learn to associate segments of spoken audio captions with the semantically relevant portions of natural images that they refer to.

3D Interpreter Networks for Viewer-Centered Wireframe Modeling

no code implementations 3 Apr 2018 Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

3D-INN is trained on real images to estimate 2D keypoint heatmaps from an input image; it then predicts 3D object structure from heatmaps using knowledge learned from synthetic 3D shapes.

Image Retrieval

Counterfactual Image Networks

no code implementations ICLR 2018 Deniz Oktay, Carl Vondrick, Antonio Torralba

However, when a layer is removed, the model learns to produce a different image that still looks natural to an adversary, which is possible by removing objects.

Semantic Segmentation

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning

no code implementations 20 Dec 2017 Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba

The sound of crashing waves, the roar of fast-moving cars -- sound conveys important information about the objects in our surroundings.

Temporal Relational Reasoning in Videos

3 code implementations ECCV 2018 Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba

Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species.

Action Classification Action Recognition +3

Interpreting Deep Visual Representations via Network Dissection

1 code implementation 15 Nov 2017 Bolei Zhou, David Bau, Aude Oliva, Antonio Torralba

In this work, we describe Network Dissection, a method that interprets networks by providing labels for the units of their deep visual representations.

Following Gaze in Video

no code implementations ICCV 2017 Adria Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba

In this paper, we present an approach for following gaze in video by predicting where a person (in the video) is looking even when the object is in a different frame.

Turning Corners Into Cameras: Principles and Methods

no code implementations ICCV 2017 Katherine L. Bouman, Vickie Ye, Adam B. Yedidia, Fredo Durand, Gregory W. Wornell, Antonio Torralba, William T. Freeman

We show that walls and other obstructions with edges can be exploited as naturally-occurring "cameras" that reveal the hidden scenes beyond them.

Scene Parsing Through ADE20K Dataset

no code implementations CVPR 2017 Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, Antonio Torralba

A novel network design called Cascade Segmentation Module is proposed to parse a scene into stuff, objects, and object parts in a cascade and improve over the baselines.

Scene Parsing

See, Hear, and Read: Deep Aligned Representations

1 code implementation 3 Jun 2017 Yusuf Aytar, Carl Vondrick, Antonio Torralba

We capitalize on large amounts of readily available, synchronous data to learn deep discriminative representations shared across three major natural modalities: vision, sound, and language.

Cross-Modal Retrieval Representation Learning

Network Dissection: Quantifying Interpretability of Deep Visual Representations

no code implementations CVPR 2017 David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba

Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer.

Open Vocabulary Scene Parsing

no code implementations ICCV 2017 Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, Antonio Torralba

Recognizing arbitrary objects in the wild has been a challenging problem due to the limitations of existing classification models and datasets.

General Classification Scene Parsing

Face-to-BMI: Using Computer Vision to Infer Body Mass Index on Social Media

no code implementations 9 Mar 2017 Enes Kocabey, Mustafa Camurcu, Ferda Ofli, Yusuf Aytar, Javier Marin, Antonio Torralba, Ingmar Weber

A person's weight status can have profound implications on their life, ranging from mental health, to longevity, to financial income.

SegICP: Integrated Deep Semantic Segmentation and Pose Estimation

2 code implementations 5 Mar 2017 Jay M. Wong, Vincent Kee, Tiffany Le, Syler Wagner, Gian-Luca Mariottini, Abraham Schneider, Lei Hamilton, Rahul Chipalkatty, Mitchell Hebert, David M. S. Johnson, Jimmy Wu, Bolei Zhou, Antonio Torralba

Recent robotic manipulation competitions have highlighted that sophisticated robots still struggle to achieve fast and reliable perception of task-relevant objects in complex, realistic scenarios.

Motion Capture Object Recognition +3

Is Saki #delicious? The Food Perception Gap on Instagram and Its Relation to Health

no code implementations 21 Feb 2017 Ferda Ofli, Yusuf Aytar, Ingmar Weber, Raggi al Hammouri, Antonio Torralba

Studying how food is perceived in relation to what it actually is typically involves a laboratory setup.

Following Gaze Across Views

no code implementations 9 Dec 2016 Adrià Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba

In this paper, we present an approach for following gaze across views by predicting where a particular person is looking throughout a scene.

Who is Mistaken?

no code implementations 4 Dec 2016 Benjamin Eysenbach, Carl Vondrick, Antonio Torralba

We then create a representation of characters' beliefs for two tasks in human action understanding: predicting who is mistaken, and when they are mistaken.

Action Understanding

A Compositional Object-Based Approach to Learning Physical Dynamics

1 code implementation 1 Dec 2016 Michael B. Chang, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum

By comparing to less structured architectures, we show that the NPE's compositional representation of the structure in physical interactions improves its ability to predict movement, generalize across variable object count and different scene configurations, and infer latent properties of objects such as mass.

SoundNet: Learning Sound Representations from Unlabeled Video

6 code implementations NeurIPS 2016 Yusuf Aytar, Carl Vondrick, Antonio Torralba

We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild.

General Classification Object Classification

Cross-Modal Scene Networks

no code implementations 27 Oct 2016 Yusuf Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.

Places: An Image Database for Deep Scene Understanding

no code implementations 6 Oct 2016 Bolei Zhou, Aditya Khosla, Agata Lapedriza, Antonio Torralba, Aude Oliva

The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification at tasks such as object and scene recognition.

Classification General Classification +3

Generating Videos with Scene Dynamics

no code implementations NeurIPS 2016 Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g., action classification) and video generation tasks (e.g., future prediction).

Action Classification Future prediction +6

Ambient Sound Provides Supervision for Visual Learning

1 code implementation 25 Aug 2016 Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba

We show that, through this process, the network learns a representation that conveys information about objects and scenes.

Object Recognition

Semantic Understanding of Scenes through the ADE20K Dataset

20 code implementations 18 Aug 2016 Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba

Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision.

Scene Parsing Semantic Segmentation

Learning Aligned Cross-Modal Representations from Weakly Aligned Data

no code implementations CVPR 2016 Lluis Castrejon, Yusuf Aytar, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.

Eye Tracking for Everyone

2 code implementations CVPR 2016 Kyle Krafka, Aditya Khosla, Petr Kellnhofer, Harini Kannan, Suchendra Bhandarkar, Wojciech Matusik, Antonio Torralba

We believe that we can put the power of eye tracking in everyone's palm by building eye tracking software that works on commodity hardware such as mobile phones and tablets, without the need for additional sensors or devices.

Eye Tracking Gaze Estimation

Single Image 3D Interpreter Network

no code implementations 29 Apr 2016 Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

In this work, we propose 3D INterpreter Network (3D-INN), an end-to-end framework which sequentially estimates 2D keypoint heatmaps and 3D object structure, trained on both real 2D-annotated images and synthetic 3D data.

Image Retrieval

What do different evaluation metrics tell us about saliency models?

1 code implementation 12 Apr 2016 Zoya Bylinskii, Tilke Judd, Aude Oliva, Antonio Torralba, Frédo Durand

How best to evaluate a saliency model's ability to predict where humans look in images is an open research question.

Deep Neural Networks predict Hierarchical Spatio-temporal Cortical Dynamics of Human Visual Object Recognition

no code implementations 12 Jan 2016 Radoslaw M. Cichy, Aditya Khosla, Dimitrios Pantazis, Antonio Torralba, Aude Oliva

The complex multi-stage architecture of cortical visual pathways provides the neural basis for efficient visual object recognition in humans.

Object Recognition

Learning Deep Features for Discriminative Localization

33 code implementations CVPR 2016 Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba

In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels.

Weakly-Supervised Object Localization

Where are they looking?

no code implementations NeurIPS 2015 Adria Recasens, Aditya Khosla, Carl Vondrick, Antonio Torralba

Humans have the remarkable ability to follow the gaze of other people to identify what they are looking at.

Understanding and Predicting Image Memorability at a Large Scale

no code implementations ICCV 2015 Aditya Khosla, Akhil S. Raju, Antonio Torralba, Aude Oliva

Progress in estimating visual memorability has been limited by the small scale and lack of variety of benchmark data.

Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books

3 code implementations ICCV 2015 Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler

Books are a rich source of both fine-grained information (what a character, an object or a scene looks like) and high-level semantics (what someone is thinking and feeling, and how these states evolve through a story).

Sentence Embedding

Skip-Thought Vectors

16 code implementations NeurIPS 2015 Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler

The end result is an off-the-shelf encoder that can produce highly generic sentence representations that are robust and perform well in practice.

Anticipating Visual Representations from Unlabeled Video

no code implementations CVPR 2016 Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

The key idea behind our approach is that we can train deep networks to predict the visual representation of images in the future.

Object Detectors Emerge in Deep Scene CNNs

1 code implementation 22 Dec 2014 Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba

With the success of new computational architectures for visual processing, such as convolutional neural networks (CNN) and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly.

Classification General Classification +3

Learning Deep Features for Scene Recognition using Places Database

no code implementations NeurIPS 2014 Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, Aude Oliva

Whereas the tremendous recent progress in object recognition tasks is due to the availability of large datasets like ImageNet and the rise of Convolutional Neural Networks (CNNs) for learning high-level features, performance at scene recognition has not attained the same level of success.

Object Recognition Scene Recognition

Learning visual biases from human imagination

no code implementations NeurIPS 2015 Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba

Although the human visual system can recognize many concepts under challenging conditions, it still has some biases.

Object Recognition

Predicting Motivations of Actions by Leveraging Text

no code implementations CVPR 2016 Carl Vondrick, Deniz Oktay, Hamed Pirsiavash, Antonio Torralba

In this paper, we introduce the problem of predicting why a person has performed an action in images.

Looking Beyond the Visible Scene

no code implementations CVPR 2014 Aditya Khosla, Byoungkwon An, Joseph J. Lim, Antonio Torralba

In this work, we propose to look beyond the visible elements of a scene; we demonstrate that a scene is not just a collection of objects and their configuration or the labels assigned to its pixels - it is so much more.

Scene Understanding

Are all training examples equally valuable?

no code implementations 25 Nov 2013 Agata Lapedriza, Hamed Pirsiavash, Zoya Bylinskii, Antonio Torralba

When learning a new concept, not all training examples may prove equally useful for training: some may have higher or lower training value than others.

Inverting and Visualizing Features for Object Detection

no code implementations 11 Dec 2012 Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba

By visualizing feature spaces, we can gain a more intuitive understanding of our detection systems.

Object Detection

Localizing 3D cuboids in single-view images

no code implementations NeurIPS 2012 Jianxiong Xiao, Bryan Russell, Antonio Torralba

In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes.

Learning to Learn with Compound HD Models

no code implementations NeurIPS 2011 Antonio Torralba, Joshua B. Tenenbaum, Ruslan R. Salakhutdinov

We introduce HD (or "Hierarchical-Deep") models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian models.

Motion Capture Object Recognition

Understanding the Intrinsic Memorability of Images

no code implementations NeurIPS 2011 Phillip Isola, Devi Parikh, Antonio Torralba, Aude Oliva

Artists, advertisers, and photographers are routinely presented with the task of creating an image that a viewer will remember.

Feature Selection

Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

no code implementations NeurIPS 2009 Gunhee Kim, Antonio Torralba

This paper proposes a fast and scalable alternating optimization technique to detect regions of interest (ROIs) in cluttered Web images without labels.

Nonparametric Bayesian Texture Learning and Synthesis

no code implementations NeurIPS 2009 Long Zhu, Yuanhao Chen, Bill Freeman, Antonio Torralba

2D-HMM is coupled with the Hierarchical Dirichlet process (HDP), which allows the number of textons and the complexity of the transition matrix to grow as the input texture becomes irregular.

Semantic Segmentation Texture Synthesis

Semi-Supervised Learning in Gigantic Image Collections

no code implementations NeurIPS 2009 Rob Fergus, Yair Weiss, Antonio Torralba

With the advent of the Internet it is now possible to collect hundreds of millions of images.

Spectral Hashing

no code implementations NeurIPS 2008 Yair Weiss, Antonio Torralba, Rob Fergus

Semantic hashing seeks compact binary codes of datapoints so that the Hamming distance between codewords correlates with semantic similarity.

graph partitioning Semantic Similarity +1
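
As a small illustration of the Hamming distance mentioned in the Spectral Hashing summary above, the sketch below counts the bit positions in which two binary codes differ. It is not the paper's method for learning the codes; the function name `hamming_distance` and the 8-bit example codes are hypothetical values chosen only for illustration.

```python
def hamming_distance(a: int, b: int) -> int:
    """Return the number of bit positions in which two binary codes differ."""
    # XOR leaves a 1 in every position where the codes disagree.
    return bin(a ^ b).count("1")


# Hypothetical 8-bit codes (illustrative values, not learned by any model).
codes = {
    "cat": 0b10110010,
    "dog": 0b10110110,
    "car": 0b01001101,
}

# Under a good hashing scheme, semantically similar items would receive
# codes that are close in Hamming distance.
near = hamming_distance(codes["cat"], codes["dog"])
far = hamming_distance(codes["cat"], codes["car"])
```

With these made-up codes, `near` is smaller than `far`, which is the kind of ordering a semantic hashing scheme aims to produce for similar versus dissimilar items.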
