no code implementations • ICML 2020 • Ching-Yao Chuang, Antonio Torralba, Stefanie Jegelka
We also propose a method for estimating how well a model based on domain-invariant representations will perform on the target domain, without having seen any target labels.
no code implementations • 10 Jan 2025 • Vighnesh Subramaniam, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Shuang Li, Igor Mordatch
By training each model on independent sets of data, we illustrate how this approach enables specialization across models and diversification over the set of models.
no code implementations • 4 Dec 2024 • Hanxue Liang, Jiawei Ren, Ashkan Mirzaei, Antonio Torralba, Ziwei Liu, Igor Gilitschenski, Sanja Fidler, Cengiz Oztireli, Huan Ling, Zan Gojcic, Jiahui Huang
Recent advancements in static feed-forward scene reconstruction have demonstrated significant progress in high-quality novel view synthesis.
no code implementations • 29 Nov 2024 • Hui Ren, Joanna Materzynska, Rohit Gandikota, David Bau, Antonio Torralba
We explore the question: "How much prior art knowledge is needed to create art?"
no code implementations • 26 Nov 2024 • Yael Vinker, Tamar Rott Shaham, Kristine Zheng, Alex Zhao, Judith E Fan, Antonio Torralba
In this work, we introduce SketchAgent, a language-driven, sequential sketch generation method that enables users to create, modify, and refine sketches through dynamic, conversational interactions.
1 code implementation • 4 Nov 2024 • Shivam Duggal, Phillip Isola, Antonio Torralba, William T. Freeman
Our encoder-decoder architecture recursively processes 2D image tokens, distilling them into 1D latent tokens over multiple iterations of recurrent rollouts.
no code implementations • 28 Oct 2024 • Reece Shuttleworth, Jacob Andreas, Antonio Torralba, Pratyusha Sharma
Second, we show that LoRA models with intruder dimensions, despite achieving similar performance to full fine-tuning on the target task, become worse models of the pre-training distribution and adapt less robustly to multiple tasks sequentially.
1 code implementation • 30 Sep 2024 • Adrián Rodríguez-Muñoz, Tongzhou Wang, Antonio Torralba
Adversarially robust models are locally smooth around each data sample so that small perturbations cannot drastically change model outputs.
no code implementations • 2 Aug 2024 • Zhenfang Chen, Shilong Dong, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan
The model is evaluated based on its capability to unravel the compositional hidden properties, such as mass and charge, and use this knowledge to answer a set of questions.
no code implementations • 14 Jun 2024 • Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling
We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input -- in a single feed-forward pass that takes only a second.
1 code implementation • 22 Apr 2024 • Achyuta Rajaram, Neil Chowdhury, Antonio Torralba, Jacob Andreas, Sarah Schwettmann
To date, most discoveries of network subcomponents that implement human-interpretable computations in deep vision models have involved close study of single units and large amounts of human labor.
no code implementations • 22 Apr 2024 • Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang, Achyuta Rajaram, Evan Hernandez, Jacob Andreas, Antonio Torralba
Interpretability experiments proposed by MAIA compose these tools to describe and explain system behavior.
no code implementations • 28 Mar 2024 • George Tang, Krishna Murthy Jatavallabhula, Antonio Torralba
We tackle the problem of learning an implicit scene representation for 3D instance segmentation from a sequence of posed RGB images.
no code implementations • 22 Mar 2024 • Kevin Xie, Jonathan Lorraine, Tianshi Cao, Jun Gao, James Lucas, Antonio Torralba, Sanja Fidler, Xiaohui Zeng
Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt.
no code implementations • 17 Mar 2024 • Lance Ying, Kunal Jha, Shivam Aarya, Joshua B. Tenenbaum, Antonio Torralba, Tianmin Shu
GOMA formulates verbal communication as a planning problem that minimizes the misalignment between the parts of agents' mental states that are relevant to the goals.
1 code implementation • 16 Jan 2024 • Chuanyang Jin, Yutong Wu, Jing Cao, Jiannan Xiang, Yen-Ling Kuo, Zhiting Hu, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum, Tianmin Shu
To engineer multimodal ToM capacity, we propose a novel method, BIP-ALM (Bayesian Inverse Planning Accelerated by Language Models).
1 code implementation • NeurIPS 2023 • Tianhang Cheng, Wei-Chiu Ma, Kaiyu Guan, Antonio Torralba, Shenlong Wang
Our world is full of identical objects (e.g., cans of coke, cars of the same model).
no code implementations • CVPR 2024 • Pratyusha Sharma, Tamar Rott Shaham, Manel Baradad, Stephanie Fu, Adrian Rodriguez-Munoz, Shivam Duggal, Phillip Isola, Antonio Torralba
Although LLM-generated images do not look like natural images, results on image generation and the ability of models to correct these generated images indicate that precise modeling of strings can teach language models about numerous aspects of the visual world.
no code implementations • CVPR 2024 • Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, Karsten Kreis
We also propose a motion amplification mechanism as well as a new autoregressive synthesis scheme to generate and combine multiple 4D sequences for longer generation.
no code implementations • 7 Dec 2023 • Joanna Materzynska, Josef Sivic, Eli Shechtman, Antonio Torralba, Richard Zhang, Bryan Russell
To avoid overfitting to the new custom motion, we introduce an approach for regularization over videos.
1 code implementation • 20 Nov 2023 • Rohit Gandikota, Joanna Materzynska, Tingrui Zhou, Antonio Torralba, David Bau
We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models.
no code implementations • 28 Sep 2023 • Qiao Gu, Alihusein Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Rama Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B. Tenenbaum, Antonio Torralba, Florian Shkurti, Liam Paull
We demonstrate the utility of this representation through a number of downstream planning tasks that are specified through abstract (language) prompts and require complex reasoning over spatial and semantic concepts.
1 code implementation • NeurIPS 2023 • Sarah Schwettmann, Tamar Rott Shaham, Joanna Materzynska, Neil Chowdhury, Shuang Li, Jacob Andreas, David Bau, Antonio Torralba
FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate.
1 code implementation • 10 Aug 2023 • Alaa Maalouf, Ninad Jadhav, Krishna Murthy Jatavallabhula, Makram Chahine, Daniel M. Vogt, Robert J. Wood, Antonio Torralba, Daniela Rus
We demonstrate FAn on a real-world robotic system (a micro aerial vehicle) and report its ability to seamlessly follow the objects of interest in a real-time control loop.
no code implementations • 3 Aug 2023 • Sarah Schwettmann, Neil Chowdhury, Samuel Klein, David Bau, Antonio Torralba
Language models demonstrate remarkable capacity to generalize representations learned in one modality to downstream tasks in other modalities.
no code implementations • ICCV 2023 • Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler
In this work, we introduce DreamTeacher, a self-supervised feature representation learning framework that utilizes generative networks for pre-training downstream image backbones.
no code implementations • 8 Jun 2023 • Manel Baradad, Yuanzhen Li, Forrester Cole, Michael Rubinstein, Antonio Torralba, William T. Freeman, Varun Jampani
To infer object depth on a real image, we place the segmented object into the learned background prompt and run off-the-shelf depth networks.
no code implementations • ICCV 2023 • Nan Liu, Yilun Du, Shuang Li, Joshua B. Tenenbaum, Antonio Torralba
Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate.
1 code implementation • 23 May 2023 • Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, Igor Mordatch
Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.
2 code implementations • CVPR 2023 • George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, Jun-Yan Zhu
Dataset Distillation aims to distill an entire dataset's knowledge into a few synthetic images.
no code implementations • CVPR 2023 • Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler
We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene.
1 code implementation • 3 Apr 2023 • Tongzhou Wang, Antonio Torralba, Phillip Isola, Amy Zhang
In goal-reaching reinforcement learning (RL), the optimal value function has a particular geometry, called quasimetric structure.
no code implementations • CVPR 2023 • Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan
Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that could represent and synthesize the sound.
1 code implementation • CVPR 2023 • Zhenyu Wang, YaLi Li, Xi Chen, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao, Shengjin Wang
In this paper, we formally address universal object detection, which aims to detect every scene and predict every category.
no code implementations • ICCV 2023 • Xi Chen, Shuang Li, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao
Open-vocabulary image segmentation is attracting increasing attention due to its critical applications in the real world.
1 code implementation • 4 Mar 2023 • Zhou Xian, Bo Zhu, Zhenjia Xu, Hsiao-Yu Tung, Antonio Torralba, Katerina Fragkiadaki, Chuang Gan
We identify several challenges for fluid manipulation learning by evaluating a set of reinforcement learning and trajectory optimization methods on our platform.
1 code implementation • 14 Feb 2023 • Krishna Murthy Jatavallabhula, Alihusein Kuwajerwala, Qiao Gu, Mohd Omama, Tao Chen, Alaa Maalouf, Shuang Li, Ganesh Iyer, Soroush Saryazdi, Nikhil Keetha, Ayush Tewari, Joshua B. Tenenbaum, Celso Miguel de Melo, Madhava Krishna, Liam Paull, Florian Shkurti, Antonio Torralba
ConceptFusion leverages the open-set capabilities of today's foundation models pre-trained on internet-scale data to reason about concepts across modalities such as natural language, images, and audio.
1 code implementation • 31 Jan 2023 • Ching-Yao Chuang, Varun Jampani, Yuanzhen Li, Antonio Torralba, Stefanie Jegelka
Machine learning models have been shown to inherit biases from their training datasets.
no code implementations • 12 Jan 2023 • Xavier Puig, Tianmin Shu, Joshua B. Tenenbaum, Antonio Torralba
Experiments show that our helper agent robustly updates its goal inference and adapts its helping plans to the changing level of uncertainty.
1 code implementation • ICCV 2023 • Yifei Zhou, Zilu Li, Abhinav Shrivastava, Hengshuang Zhao, Antonio Torralba, Taipeng Tian, Ser-Nam Lim
In this way, the new representation can be directly compared with the old representation, in principle avoiding the need for any backfilling.
no code implementations • 22 Dec 2022 • Adrián Rodríguez-Muñoz, Antonio Torralba
In this work, we investigate the hypothesis that the existence of adversarial perturbations is due in part to aliasing in neural networks.
1 code implementation • 29 Nov 2022 • Manel Baradad, Chun-Fu Chen, Jonas Wulff, Tongzhou Wang, Rogerio Feris, Antonio Torralba, Phillip Isola
Learning image representations using synthetic data allows training neural networks without some of the concerns associated with real images, such as privacy and bias.
1 code implementation • 8 Nov 2022 • Yifei Zhou, Zilu Li, Abhinav Shrivastava, Hengshuang Zhao, Antonio Torralba, Taipeng Tian, Ser-Nam Lim
In this way, the new representation can be directly compared with the old representation, in principle avoiding the need for any backfilling.
no code implementations • 20 Oct 2022 • Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Igor Mordatch
Such closed-loop communication enables models to correct errors caused by other models, significantly boosting performance on downstream tasks, e.g., improving accuracy on grade school math problems by 7.5%, without requiring any model finetuning.
Ranked #1 on Video Question Answering on ActivityNet-QA
no code implementations • 26 Sep 2022 • Jingwei Ma, Lucy Chai, Minyoung Huh, Tongzhou Wang, Ser-Nam Lim, Phillip Isola, Antonio Torralba
We introduce a new approach to image forensics: placing physical refractive objects, which we call totems, into a scene so as to protect any photograph taken of that scene.
no code implementations • CVPR 2022 • Chuang Gan, Yi Gu, Siyuan Zhou, Jeremy Schwartz, Seth Alter, James Traer, Dan Gutfreund, Joshua B. Tenenbaum, Josh Mcdermott, Antonio Torralba
The way an object looks and sounds provides complementary reflections of its physical properties.
2 code implementations • 6 Jul 2022 • Audrey Cui, Ali Jahanian, Agata Lapedriza, Antonio Torralba, Shahin Mahdizadehaghdam, Rohit Kumar, David Bau
We introduce the task of local relighting, which changes a photograph of a scene by switching on and off the light sources that are visible within the image.
1 code implementation • 30 Jun 2022 • Tongzhou Wang, Simon S. Du, Antonio Torralba, Phillip Isola, Amy Zhang, Yuandong Tian
The ability to separate signal from noise, and reason with clean abstractions, is critical to intelligence.
no code implementations • CVPR 2022 • Wei-Chiu Ma, Anqi Joyce Yang, Shenlong Wang, Raquel Urtasun, Antonio Torralba
Similar to classic correspondences, VCs conform with epipolar geometry; unlike classic correspondences, VCs do not need to be co-visible across views.
no code implementations • CVPR 2022 • Joanna Materzynska, Antonio Torralba, David Bau
The CLIP network measures the similarity between natural text and images; in this work, we investigate the entanglement of the representation of word images and natural images in its image encoder.
no code implementations • CVPR 2022 • Seung Wook Kim, Karsten Kreis, Daiqing Li, Antonio Torralba, Sanja Fidler
Modern image generative models show remarkable sample quality when trained on a single domain or class of objects.
1 code implementation • 3 Jun 2022 • Nan Liu, Shuang Li, Yilun Du, Antonio Torralba, Joshua B. Tenenbaum
Large text-guided diffusion models, such as DALL-E 2, are able to generate stunning photorealistic images given natural language descriptions.
no code implementations • CVPR 2022 • Yining Hong, Kaichun Mo, Li Yi, Leonidas J. Guibas, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan
Specifically, FixNet consists of a perception module to extract the structured representation from the 3D point cloud, a physical dynamics prediction module to simulate the results of interactions on 3D objects, and a functionality prediction module to evaluate the functionality and choose the correct fix.
no code implementations • ICLR 2022 • Zhenfang Chen, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan
In this paper, we take an initial step to highlight the importance of inferring the hidden physical properties not directly observable from visual appearances, by introducing the Compositional Physical Reasoning (ComPhy) dataset.
no code implementations • 11 Apr 2022 • Pratyusha Sharma, Balakumar Sundaralingam, Valts Blukis, Chris Paxton, Tucker Hermans, Antonio Torralba, Jacob Andreas, Dieter Fox
In this paper, we explore natural language as an expressive and flexible tool for robot correction.
1 code implementation • 4 Apr 2022 • Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan
By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can then be applied to arbitrary sounds.
no code implementations • CVPR 2022 • Dim P. Papadopoulos, Enrique Mora, Nadiia Chepurko, Kuan Wei Huang, Ferda Ofli, Antonio Torralba
To validate our idea, we crowdsource programs for cooking recipes and show that: (a) projecting the image-recipe embeddings into programs leads to better cross-modal retrieval results; (b) generating programs from images leads to better recognition results compared to predicting raw cooking instructions; and (c) we can generate food images by manipulating programs via optimizing the latent code of a GAN.
6 code implementations • CVPR 2022 • George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, Jun-Yan Zhu
To efficiently obtain the initial and target network parameters for large-scale datasets, we pre-compute and store training trajectories of expert networks trained on the real dataset.
Ranked #3 on Dataset Distillation - 1IPC on CUB-200-2011
1 code implementation • 3 Feb 2022 • Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu
Together, these results suggest that language modeling induces representations that are useful for modeling not just language, but also goals and plans; these representations can aid learning and generalization even outside of language processing.
2 code implementations • 26 Jan 2022 • Evan Hernandez, Sarah Schwettmann, David Bau, Teona Bagashvili, Antonio Torralba, Jacob Andreas
Given a neuron, MILAN generates a description by searching for a natural language string that maximizes pointwise mutual information with the image regions in which the neuron is active.
1 code implementation • CVPR 2022 • Ching-Yao Chuang, R Devon Hjelm, Xin Wang, Vibhav Vineet, Neel Joshi, Antonio Torralba, Stefanie Jegelka, Yale Song
Contrastive learning relies on an assumption that positive pairs contain related views, e.g., patches of an image or co-occurring multimodal signals of a video, that share certain underlying information about an instance.
no code implementations • CVPR 2022 • Daiqing Li, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, Antonio Torralba
By training an effective feature segmentation architecture on top of BigGAN, we turn BigGAN into a labeled dataset generator.
1 code implementation • 11 Jan 2022 • Ethan Weber, Dim P. Papadopoulos, Agata Lapedriza, Ferda Ofli, Muhammad Imran, Antonio Torralba
In this work, we present the Incidents1M Dataset, a large-scale multi-label dataset which contains 977,088 images, with 43 incident and 49 place categories.
1 code implementation • CVPR 2022 • William Peebles, Jun-Yan Zhu, Richard Zhang, Antonio Torralba, Alexei A. Efros, Eli Shechtman
We propose GAN-Supervised Learning, a framework for learning discriminative models and their GAN-generated training data jointly end-to-end.
no code implementations • NeurIPS 2021 • Yining Hong, Li Yi, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan
A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies.
1 code implementation • NeurIPS 2021 • Shibani Santurkar, Dimitris Tsipras, Mahalaxmi Elango, David Bau, Antonio Torralba, Aleksander Madry
We present a methodology for modifying the behavior of a classifier by directly rewriting its prediction rules.
no code implementations • NeurIPS 2021 • Nan Liu, Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba
The visual world around us can be described as a structured set of objects and their associated relations.
1 code implementation • NeurIPS 2021 • Huan Ling, Karsten Kreis, Daiqing Li, Seung Wook Kim, Antonio Torralba, Sanja Fidler
EditGAN builds on a GAN framework that jointly models images and their semantic segmentations, requiring only a handful of labeled examples, making it a scalable tool for editing.
8 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.
1 code implementation • 13 Oct 2021 • Chuang Gan, Abhishek Bhandwaldar, Antonio Torralba, Joshua B. Tenenbaum, Phillip Isola
We test several existing RL-based exploration methods on this benchmark and find that an agent using unsupervised contrastive learning for representation learning, and impact-driven learning for exploration, achieves the best results.
1 code implementation • ICCV 2021 • Sarah Schwettmann, Evan Hernandez, David Bau, Samuel Klein, Jacob Andreas, Antonio Torralba
A large body of recent work has identified transformations in the latent spaces of generative adversarial networks (GANs) that consistently and interpretably transform generated images.
1 code implementation • ICCV 2021 • Shuang Li, Yilun Du, Antonio Torralba, Josef Sivic, Bryan Russell
Our task poses unique challenges as a system does not know what types of human-object interactions are present in a video or the actual spatiotemporal location of the human and the object.
no code implementations • ICCV 2021 • Dim P. Papadopoulos, Ethan Weber, Antonio Torralba
Through a large-scale experiment to populate 1M unlabeled images with object segmentation masks for 80 object classes, we show that (1) we obtain 1M object segmentation masks with a total annotation time of only 290 hours; (2) we reduce annotation time by 76x compared to manual annotation; and (3) the segmentation quality of our masks is on par with those from manually annotated datasets.
no code implementations • ACL 2022 • Pratyusha Sharma, Antonio Torralba, Jacob Andreas
We evaluate this approach in the ALFRED household simulation environment, providing natural language annotations for only 10% of demonstrations.
no code implementations • 29 Sep 2021 • Shuang Li, Xavier Puig, Yilun Du, Ekin Akyürek, Antonio Torralba, Jacob Andreas, Igor Mordatch
Additional experiments explore the role of language-based encodings in these results; we find that it is possible to train a simple adapter layer that maps from observations and action histories to LM embeddings, and thus that language modeling provides an effective initializer even for tasks with no language as input or output.
no code implementations • ICLR 2022 • Evan Hernandez, Sarah Schwettmann, David Bau, Teona Bagashvili, Antonio Torralba, Jacob Andreas
Given a neuron, MILAN generates a description by searching for a natural language string that maximizes pointwise mutual information with the image regions in which the neuron is active.
no code implementations • International Conference on Intelligent Robots and Systems (IROS) 2021 • Qiang Zhang, Yunzhu Li, Yiyue Luo, Wan Shou, Michael Foshey, Junchi Yan, Joshua B. Tenenbaum, Wojciech Matusik, Antonio Torralba
This work takes a step toward dynamics modeling in hand-object interactions from dense tactile sensing, which opens the door for future applications in activity learning, human-computer interactions, and imitation learning for robotics.
no code implementations • 9 Sep 2021 • Qiang Zhang, Yunzhu Li, Yiyue Luo, Wan Shou, Michael Foshey, Junchi Yan, Joshua B. Tenenbaum, Wojciech Matusik, Antonio Torralba
This work takes a step toward dynamics modeling in hand-object interactions from dense tactile sensing, which opens the door for future applications in activity learning, human-computer interactions, and imitation learning for robotics.
no code implementations • ICCV 2021 • Prafull Sharma, Miika Aittala, Yoav Y. Schechner, Antonio Torralba, Gregory W. Wornell, William T. Freeman, Fredo Durand
We present a passive non-line-of-sight method that infers the number of people or activity of a person from the observation of a blank wall in an unknown room.
no code implementations • 8 Jul 2021 • Yunzhu Li, Shuang Li, Vincent Sitzmann, Pulkit Agrawal, Antonio Torralba
Humans have a strong intuitive understanding of the 3D environment around us.
no code implementations • CVPR 2021 • Yiyue Luo, Yunzhu Li, Michael Foshey, Wan Shou, Pratyusha Sharma, Tomas Palacios, Antonio Torralba, Wojciech Matusik
In this work, leveraging such tactile interactions, we propose a 3D human pose estimation approach using the pressure maps recorded by a tactile carpet as input.
1 code implementation • NeurIPS 2021 • Manel Baradad, Jonas Wulff, Tongzhou Wang, Phillip Isola, Antonio Torralba
We investigate a suite of image generation models that produce images from simple random processes.
1 code implementation • NeurIPS 2021 • Ching-Yao Chuang, Youssef Mroueh, Kristjan Greenewald, Antonio Torralba, Stefanie Jegelka
Understanding the generalization of deep neural networks is one of the most important tasks in deep learning.
no code implementations • CVPR 2021 • Seung Wook Kim, Jonah Philion, Antonio Torralba, Sanja Fidler
Realistic simulators are critical for training and verifying robotics systems.
no code implementations • 17 Apr 2021 • Jacob Andreas, Gašper Beguš, Michael M. Bronstein, Roee Diamant, Denley Delaney, Shane Gero, Shafi Goldwasser, David F. Gruber, Sarah de Haas, Peter Malkin, Roger Payne, Giovanni Petri, Daniela Rus, Pratyusha Sharma, Dan Tchernov, Pernille Tønnesen, Antonio Torralba, Daniel Vogt, Robert J. Wood
We posit that machine learning will be the cornerstone of future collection, processing, and analysis of multimodal streams of data in animal communication studies, including bioacoustic, behavioral, biological, and environmental data.
4 code implementations • ICCV 2021 • Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, Simon Lucey
In this paper, we propose Bundle-Adjusting Neural Radiance Fields (BARF) for training NeRF from imperfect (or even unknown) camera poses -- the joint problem of learning neural 3D representations and registering camera frames.
2 code implementations • CVPR 2021 • Yuxuan Zhang, Huan Ling, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, Sanja Fidler
To showcase the power of our approach, we generated datasets for 7 image segmentation tasks which include pixel-level labels for 34 human face parts and 32 car parts.
no code implementations • CVPR 2021 • Daiqing Li, Junlin Yang, Karsten Kreis, Antonio Torralba, Sanja Fidler
Training deep networks with limited labeled data while achieving a strong generalization ability is key in the quest to reduce human annotation efforts.
1 code implementation • 25 Mar 2021 • Chuang Gan, Siyuan Zhou, Jeremy Schwartz, Seth Alter, Abhishek Bhandwaldar, Dan Gutfreund, Daniel L. K. Yamins, James J DiCarlo, Josh Mcdermott, Antonio Torralba, Joshua B. Tenenbaum
To complete the task, an embodied agent must plan a sequence of actions to change the state of a large number of objects in the face of realistic physical constraints.
no code implementations • 19 Mar 2021 • Alex Andonian, Sabrina Osmany, Audrey Cui, YeonHwan Park, Ali Jahanian, Antonio Torralba, David Bau
We investigate the problem of zero-shot semantic image painting.
no code implementations • ECCV 2020 • Wei-Chiu Ma, Shenlong Wang, Jiayuan Gu, Sivabalan Manivasagam, Antonio Torralba, Raquel Urtasun
Specifically, at each iteration, the neural network takes the feedback as input and outputs an update on the current estimation.
1 code implementation • ICLR 2021 • Xavier Puig, Tianmin Shu, Shuang Li, Zilin Wang, Yuan-Hong Liao, Joshua B. Tenenbaum, Sanja Fidler, Antonio Torralba
In this paper, we introduce Watch-And-Help (WAH), a challenge for testing social intelligence in agents.
no code implementations • ICLR 2021 • Yuxuan Zhang, Wenzheng Chen, Huan Ling, Jun Gao, Yinan Zhang, Antonio Torralba, Sanja Fidler
Key to our approach is to exploit GANs as a multi-view data generator to train an inverse graphics network using an off-the-shelf differentiable renderer, and the trained inverse graphics network as a teacher to disentangle the GAN's latent code into interpretable 3D properties.
no code implementations • 17 Oct 2020 • Yunchao Wei, Shuai Zheng, Ming-Ming Cheng, Hang Zhao, LiWei Wang, Errui Ding, Yi Yang, Antonio Torralba, Ting Liu, Guolei Sun, Wenguan Wang, Luc van Gool, Wonho Bae, Junhyug Noh, Jinhwan Seo, Gunhee Kim, Hao Zhao, Ming Lu, Anbang Yao, Yiwen Guo, Yurong Chen, Li Zhang, Chuangchuang Tan, Tao Ruan, Guanghua Gu, Shikui Wei, Yao Zhao, Mariia Dobko, Ostap Viniavskyi, Oles Dobosevych, Zhendong Wang, Zhenyuan Chen, Chen Gong, Huanqing Yan, Jun He
The purpose of the Learning from Imperfect Data (LID) workshop is to inspire and facilitate research on novel approaches that harness imperfect data and improve data efficiency during training.
no code implementations • 14 Sep 2020 • Jonas Wulff, Antonio Torralba
We show that, under a simple nonlinear operation, the data distribution can be modeled as Gaussian and therefore expressed using sufficient statistics.
2 code implementations • 10 Sep 2020 • David Bau, Jun-Yan Zhu, Hendrik Strobelt, Agata Lapedriza, Bolei Zhou, Antonio Torralba
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
1 code implementation • ECCV 2020 • William Peebles, John Peebles, Jun-Yan Zhu, Alexei Efros, Antonio Torralba
In this paper, we propose the Hessian Penalty, a simple regularization term that encourages the Hessian of a generative model with respect to its input to be diagonal.
1 code implementation • ECCV 2020 • Ethan Weber, Nuria Marzo, Dim P. Papadopoulos, Aritro Biswas, Agata Lapedriza, Ferda Ofli, Muhammad Imran, Antonio Torralba
While most studies on social media are limited to text, images offer more information for understanding disaster and incident scenes.
3 code implementations • ECCV 2020 • David Bau, Steven Liu, Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba
To address the problem, we propose a formulation in which the desired rule is changed by manipulating a layer of a deep network as a linear associative memory.
no code implementations • 27 Jul 2020 • Chuang Gan, Xiaoyu Chen, Phillip Isola, Antonio Torralba, Joshua B. Tenenbaum
Humans integrate multiple sensory modalities (e.g., visual and audio) to build a causal understanding of the physical world.
no code implementations • ECCV 2020 • Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio Torralba
In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip about people playing musical instruments.
1 code implementation • 9 Jul 2020 • Chuang Gan, Jeremy Schwartz, Seth Alter, Damian Mrowca, Martin Schrimpf, James Traer, Julian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, Megumi Sano, Kuno Kim, Elias Wang, Michael Lingelbach, Aidan Curtis, Kevin Feigelis, Daniel M. Bear, Dan Gutfreund, David Cox, Antonio Torralba, James J. DiCarlo, Joshua B. Tenenbaum, Josh H. McDermott, Daniel L. K. Yamins
We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation.
1 code implementation • 6 Jul 2020 • Ching-Yao Chuang, Antonio Torralba, Stefanie Jegelka
When machine learning models are deployed on a test distribution different from the training distribution, they can perform poorly while overestimating their own performance.
1 code implementation • NeurIPS 2020 • Ching-Yao Chuang, Joshua Robinson, Lin Yen-Chen, Antonio Torralba, Stefanie Jegelka
A prominent technique for self-supervised representation learning has been to contrast semantically similar and dissimilar pairs of samples.
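For reference, the generic contrastive objective described here can be written as an InfoNCE-style loss. The sketch below is the standard (biased) form with cosine similarities and a temperature. It is not the debiased estimator this paper proposes. The function name, temperature, and toy vectors are assumptions.

```python
import numpy as np


def info_nce_loss(anchor, positive, negatives, temperature=0.5):
    """-log( e^{s+/t} / (e^{s+/t} + sum_j e^{s-_j/t}) ) with cosine similarities s."""

    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a, p, n = normalize(anchor), normalize(positive), normalize(negatives)
    pos = np.exp(a @ p / temperature)  # similar pair
    neg = np.exp(n @ a / temperature).sum()  # dissimilar pairs, shape (M,) before sum
    return -np.log(pos / (pos + neg))


anchor = np.array([1.0, 0.0])
negatives = np.array([[0.0, 1.0], [-1.0, 0.0]])
loss_aligned = info_nce_loss(anchor, np.array([1.0, 0.0]), negatives)
loss_misaligned = info_nce_loss(anchor, np.array([0.0, 1.0]), negatives)
print(loss_aligned < loss_misaligned)  # True
```

In practice the "negatives" are usually other samples in the batch. Some of them may share the anchor's class, which is the sampling bias that motivates a debiased variant.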
1 code implementation • NeurIPS 2020 • Yunzhu Li, Antonio Torralba, Animashree Anandkumar, Dieter Fox, Animesh Garg
We assume access to different configurations and environmental conditions, i.e., data from unknown interventions on the underlying system; thus, we can hope to discover the correct underlying causal graph without explicit interventions.
2 code implementations • CVPR 2020 • Steven Liu, Tongzhou Wang, David Bau, Jun-Yan Zhu, Antonio Torralba
We introduce a simple but effective unsupervised method for generating realistic and diverse images.
1 code implementation • 16 Jun 2020 • Andrew Rouditchenko, Angie Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James Glass
Further, we propose a tri-modal model that jointly processes raw audio, video, and text captions from videos to learn a multi-modal semantic embedding space useful for text-video retrieval.
Automatic Speech Recognition (ASR)
no code implementations • CVPR 2020 • Seung Wook Kim, Yuhao Zhou, Jonah Philion, Antonio Torralba, Sanja Fidler
Simulation is a crucial component of any robotic system.
1 code implementation • 15 May 2020 • David Bau, Hendrik Strobelt, William Peebles, Jonas Wulff, Bolei Zhou, Jun-Yan Zhu, Antonio Torralba
First, it is hard for GANs to precisely reproduce an input image.
no code implementations • ICLR 2020 • Zhoutong Zhang, Yunyun Wang, Chuang Gan, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman
We show that networks using Harmonic Convolution can reliably model audio priors and achieve high performance in unsupervised audio restoration tasks.
1 code implementation • ICML 2020 • Yunzhu Li, Toru Lin, Kexin Yi, Daniel M. Bear, Daniel L. K. Yamins, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba
The abilities to perform physical reasoning and to adapt to new environments, while intrinsic to humans, remain challenging to state-of-the-art computational models.
no code implementations • CVPR 2020 • Chuang Gan, Deng Huang, Hang Zhao, Joshua B. Tenenbaum, Antonio Torralba
Recent deep learning approaches have achieved impressive performance on visual sound separation tasks.
no code implementations • ICCV 2019 • Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba
At test time, the stereo-sound student network can work independently to perform object localization using just stereo audio and camera metadata, without any visual input.
1 code implementation • ICCV 2019 • David Bau, Jun-Yan Zhu, Jonas Wulff, William Peebles, Hendrik Strobelt, Bolei Zhou, Antonio Torralba
Differences in statistics reveal object classes that are omitted by a GAN.
2 code implementations • ICCV 2019 • Petr Kellnhofer, Adria Recasens, Simon Stent, Wojciech Matusik, Antonio Torralba
Finally, we demonstrate an application of our model for estimating customer attention in a supermarket setting.
Ranked #4 on Gaze Estimation on Gaze360
no code implementations • ICLR 2020 • Yunzhu Li, Hao He, Jiajun Wu, Dina Katabi, Antonio Torralba
Finding an embedding space for a linear approximation of a nonlinear dynamical system enables efficient system identification and control synthesis.
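The payoff of such an embedding can be seen on a classic toy system where suitable observables are known by hand. In the paper the embedding is learned; here the embedding g(x) = (x1, x2, x1^2) and all constants are our assumptions. Once states are lifted into g-space, the dynamics are exactly linear, so system identification reduces to a least-squares fit of a matrix K with g(x_{t+1}) ≈ K g(x_t).

```python
import numpy as np

LAM, MU, C_COUPLE = 0.9, 0.5, 0.3  # hypothetical system constants


def step(x):
    """Nonlinear dynamics: x1' = LAM*x1, x2' = MU*x2 + C_COUPLE*x1**2."""
    return np.array([LAM * x[0], MU * x[1] + C_COUPLE * x[0] ** 2])


def embed(x):
    """Hand-picked observables in which the dynamics become linear."""
    return np.array([x[0], x[1], x[0] ** 2])


def fit_linear_operator(pairs):
    """Least-squares K with embed(x_next) ~= K @ embed(x) over (x, x_next) pairs."""
    G_now = np.stack([embed(x) for x, _ in pairs])  # (T, 3)
    G_next = np.stack([embed(y) for _, y in pairs])  # (T, 3)
    # Solve G_now @ K^T = G_next in the least-squares sense.
    K_T, *_ = np.linalg.lstsq(G_now, G_next, rcond=None)
    return K_T.T


rng = np.random.default_rng(0)
pairs = []
for _ in range(20):  # random one-step transitions
    x = rng.normal(size=2)
    pairs.append((x, step(x)))

K = fit_linear_operator(pairs)
K_expected = np.array([[LAM, 0.0, 0.0], [0.0, MU, C_COUPLE], [0.0, 0.0, LAM**2]])
print(np.allclose(K, K_expected))  # True
```

Because the lifted system is linear, multi-step prediction is just repeated multiplication by K, and standard linear control tools can be applied in g-space.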
1 code implementation • 13 Oct 2019 • Ching-Yao Chuang, Antonio Torralba, Stefanie Jegelka
In this work, we study, theoretically and empirically, the effect of the embedding complexity on generalization to the target domain.
no code implementations • ICCV 2019 • Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina, Xinkai Wei, Ming-Yu Liu, Antonio Torralba, Sanja Fidler
We propose Neural Turtle Graphics (NTG), a novel generative model for spatial graphs, and demonstrate its applications in modeling city road layouts.
3 code implementations • ICLR 2020 • Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum
While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations.
1 code implementation • CVPR 2019 • Yunzhu Li, Jun-Yan Zhu, Russ Tedrake, Antonio Torralba
To connect vision and touch, we introduce new tasks of synthesizing plausible tactile signals from visual inputs as well as imagining how we interact with objects given tactile data as input.
no code implementations • CVPR 2019 • Dim P. Papadopoulos, Youssef Tamaazousti, Ferda Ofli, Ingmar Weber, Antonio Torralba
From a visual perspective, every instruction step can be seen as a way to change the visual appearance of the dish by adding extra objects (e.g., adding an ingredient) or changing the appearance of the existing ones (e.g., cooking the dish).
no code implementations • ICCV 2019 • Amlan Kar, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, Sanja Fidler
Training high-performing models requires large labeled datasets, which are expensive to obtain.
no code implementations • 18 Apr 2019 • Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh Mcdermott, Antonio Torralba
Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data.
1 code implementation • ICCV 2019 • Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba
Sounds originate from object motions and vibrations of surrounding air.
no code implementations • ICLR Workshop DeepGenStruct 2019 • David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, Antonio Torralba
We present an analytic framework to visualize and understand GANs at the unit-, object-, and scene-level.
no code implementations • 29 Jan 2019 • David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, Antonio Torralba
We quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output.
1 code implementation • NeurIPS 2018 • Jun-Yan Zhu, Zhoutong Zhang, Chengkai Zhang, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum, William T. Freeman
Our model first learns to synthesize 3D shapes that are indistinguishable from real shapes.
1 code implementation • NeurIPS 2018 • Jun-Yan Zhu, Zhoutong Zhang, Chengkai Zhang, Jiajun Wu, Antonio Torralba, Josh Tenenbaum, Bill Freeman
VON not only generates images that are more realistic than those of state-of-the-art 2D image synthesis methods, but also enables many 3D operations such as changing the viewpoint of a generated image, shape and texture editing, linear interpolation in texture and shape space, and transferring appearance across different objects and viewpoints.
5 code implementations • 27 Nov 2018 • Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. Efros
Model distillation aims to distill the knowledge of a complex model into a simpler one.
8 code implementations • ICLR 2019 • David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, Antonio Torralba
Then, we quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output.
no code implementations • 14 Oct 2018 • Javier Marin, Aritro Biswas, Ferda Ofli, Nicholas Hynes, Amaia Salvador, Yusuf Aytar, Ingmar Weber, Antonio Torralba
In this paper, we introduce Recipe1M+, a new large-scale, structured corpus of over one million cooking recipes and 13 million food images.
Ranked #2 on Cross-Modal Retrieval on Recipe1M+
2 code implementations • NeurIPS 2018 • Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Joshua B. Tenenbaum
Second, the model is more data- and memory-efficient: it performs well after training on a small amount of data, and it can encode an image into a compact representation that requires less storage than existing methods for offline question answering.
Ranked #1 on Visual Question Answering (VQA) on CLEVR
no code implementations • ICLR 2019 • Yunzhu Li, Jiajun Wu, Russ Tedrake, Joshua B. Tenenbaum, Antonio Torralba
In this paper, we propose to learn a particle-based simulator for complex control tasks.
1 code implementation • 28 Sep 2018 • Yunzhu Li, Jiajun Wu, Jun-Yan Zhu, Joshua B. Tenenbaum, Antonio Torralba, Russ Tedrake
There has been an increasing interest in learning dynamics simulators for model-based control.
1 code implementation • ECCV 2018 • Adrià Recasens, Petr Kellnhofer, Simon Stent, Wojciech Matusik, Antonio Torralba
We introduce a saliency-based distortion layer for convolutional neural networks that helps to improve the spatial sampling of input data for a given task.
no code implementations • ECCV 2018 • Wei-Chiu Ma, Hang Chu, Bolei Zhou, Raquel Urtasun, Antonio Torralba
At inference time, our model can be easily reduced to a single stream module that performs intrinsic decomposition on a single input image.
1 code implementation • ECCV 2018 • Bolei Zhou, Yiyou Sun, David Bau, Antonio Torralba
Explanations of the decisions made by a deep neural network are important for human end-users to be able to understand and diagnose the trustworthiness of the system.
1 code implementation • NeurIPS 2018 • Shunyu Yao, Tzu Ming Harry Hsu, Jun-Yan Zhu, Jiajun Wu, Antonio Torralba, William T. Freeman, Joshua B. Tenenbaum
In this work, we propose 3D scene de-rendering networks (3D-SDN) to address the above issues by integrating disentangled representations for semantics, geometry, and appearance into a deep generative model.
no code implementations • SIGCOMM '18 Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication 2018 • Ming-Min Zhao, Yonglong Tian, Hang Zhao, Mohammad Abu Alsheikh, Tianhong Li, Rumen Hristov, Zachary Kabelac, Dina Katabi, Antonio Torralba
It maintains this accuracy even in the presence of multiple people, and in new environments that it has not seen in the training set.
2 code implementations • 3 Aug 2018 • Jimmy Wu, Bolei Zhou, Rebecca Russell, Vincent Kee, Syler Wagner, Mitchell Hebert, Antonio Torralba, David M. S. Johnson
In this work, we introduce pose interpreter networks for 6-DoF object pose estimation.
4 code implementations • CVPR 2018 • Xavier Puig, Kevin Ra, Marko Boben, Jiaman Li, Tingwu Wang, Sanja Fidler, Antonio Torralba
We then implement the most common atomic (inter)actions in the Unity3D game engine, and use our programs to "drive" an artificial agent to execute tasks in a simulated household environment.
no code implementations • 7 Jun 2018 • Bolei Zhou, Yiyou Sun, David Bau, Antonio Torralba
We confirm that unit attributes such as class selectivity are poor predictors of impact on overall accuracy, as previously found in recent work (Morcos et al., 2018).
no code implementations • CVPR 2018 • Ming-Min Zhao, Tianhong Li, Mohammad Abu Alsheikh, Yonglong Tian, Hang Zhao, Antonio Torralba, Dina Katabi
Yet, unlike vision-based pose estimation, the radio-based system can estimate 2D poses through walls despite never having been trained on such scenarios.
1 code implementation • CVPR 2018 • Manel Baradad, Vickie Ye, Adam B. Yedidia, Frédo Durand, William T. Freeman, Gregory W. Wornell, Antonio Torralba
We present a method for inferring a 4D light field of a hidden scene from 2D shadows cast by a known occluder on a diffuse wall.
2 code implementations • ECCV 2018 • Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh Mcdermott, Antonio Torralba
We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions that produce sounds and to separate the input sounds into a set of components representing the sound from each pixel.
no code implementations • ECCV 2018 • David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, James Glass
In this paper, we explore neural network models that learn to associate segments of spoken audio captions with the semantically relevant portions of natural images that they refer to.
Ranked #2 on Speech Prompted Semantic Segmentation on ADE20K
no code implementations • 3 Apr 2018 • Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman
3D-INN is trained on real images to estimate 2D keypoint heatmaps from an input image; it then predicts 3D object structure from heatmaps using knowledge learned from synthetic 3D shapes.
no code implementations • ICLR 2018 • Deniz Oktay, Carl Vondrick, Antonio Torralba
However, when a layer is removed, the model learns to produce a different image that still looks natural to an adversary, which it can achieve by removing objects.
2 code implementations • ICCV 2019 • Hang Zhao, Antonio Torralba, Lorenzo Torresani, Zhicheng Yan
This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos.
Ranked #12 on Temporal Action Localization on HACS
no code implementations • 20 Dec 2017 • Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba
The sound of crashing waves, the roar of fast-moving cars -- sound conveys important information about the objects in our surroundings.
no code implementations • CVPR 2018 • Ching-Yao Chuang, Jiaman Li, Antonio Torralba, Sanja Fidler
We address the problem of affordance reasoning in diverse scenes that appear in the real world.
5 code implementations • ECCV 2018 • Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba
Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species.
Ranked #2 on Hand Gesture Recognition on Jester test
3 code implementations • 15 Nov 2017 • Bolei Zhou, David Bau, Aude Oliva, Antonio Torralba
In this work, we describe Network Dissection, a method that interprets networks by providing labels for the units of their deep visual representations.
no code implementations • ICCV 2017 • Adria Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba
In this paper, we present an approach for following gaze in video by predicting where a person (in the video) is looking even when the object is in a different frame.
no code implementations • ICCV 2017 • Katherine L. Bouman, Vickie Ye, Adam B. Yedidia, Fredo Durand, Gregory W. Wornell, Antonio Torralba, William T. Freeman
We show that walls and other obstructions with edges can be exploited as naturally-occurring "cameras" that reveal the hidden scenes beyond them.
no code implementations • CVPR 2017 • Amaia Salvador, Nicholas Hynes, Yusuf Aytar, Javier Marin, Ferda Ofli, Ingmar Weber, Antonio Torralba
In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over 1M cooking recipes and 800K food images.
no code implementations • CVPR 2017 • Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, Antonio Torralba
A novel network design called the Cascade Segmentation Module is proposed to parse a scene into stuff, objects, and object parts in a cascade, improving over the baselines.
no code implementations • CVPR 2017 • Carl Vondrick, Antonio Torralba
We present a model that generates the future by transforming pixels in the past.
1 code implementation • 3 Jun 2017 • Yusuf Aytar, Carl Vondrick, Antonio Torralba
We capitalize on large amounts of readily available, synchronous data to learn deep discriminative representations shared across three major natural modalities: vision, sound and language.
1 code implementation • CVPR 2017 • David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba
Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer.
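The scoring step can be sketched as follows. As we understand the method, each unit's activation maps are upsampled to the segmentation resolution, thresholded at a high quantile computed over the whole dataset, and compared with each concept's binary masks by intersection-over-union. The unit then gets the label of the best-scoring concept. The quantile value, the toy data, and the function names are assumptions for illustration.

```python
import numpy as np


def dissect_unit(activations, concept_masks, top_quantile=0.995):
    """Label one unit with the concept whose masks best overlap its top activations.

    activations: (N, H, W) float maps for one unit, already at mask resolution.
    concept_masks: dict mapping concept name -> (N, H, W) boolean masks.
    Returns (best_concept, iou_scores).
    """
    threshold = np.quantile(activations, top_quantile)  # dataset-wide threshold
    unit_mask = activations > threshold
    scores = {}
    for name, mask in concept_masks.items():
        intersection = np.logical_and(unit_mask, mask).sum()
        union = np.logical_or(unit_mask, mask).sum()
        scores[name] = intersection / union if union else 0.0
    best = max(scores, key=scores.get)
    return best, scores


rng = np.random.default_rng(0)
N, H, W = 10, 8, 8
sky = np.zeros((N, H, W), dtype=bool)
sky[:, :4, :4] = True  # hypothetical concept occupying the top-left quadrant
grass = np.zeros((N, H, W), dtype=bool)
grass[:, 4:, 4:] = True  # hypothetical concept in the bottom-right quadrant

# A toy unit that fires on "sky" pixels, plus small noise.
acts = sky.astype(float) + 0.01 * rng.normal(size=(N, H, W))
label, ious = dissect_unit(acts, {"sky": sky, "grass": grass})
print(label)  # sky
```

Computing the threshold over the whole dataset, rather than per image, keeps the unit's binary mask comparable across images.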
no code implementations • ICCV 2017 • Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, Antonio Torralba
Recognizing arbitrary objects in the wild has been a challenging problem due to the limitations of existing classification models and datasets.
no code implementations • 9 Mar 2017 • Enes Kocabey, Mustafa Camurcu, Ferda Ofli, Yusuf Aytar, Javier Marin, Antonio Torralba, Ingmar Weber
A person's weight status can have profound implications on their life, ranging from mental health to longevity to financial income.
2 code implementations • 5 Mar 2017 • Jay M. Wong, Vincent Kee, Tiffany Le, Syler Wagner, Gian-Luca Mariottini, Abraham Schneider, Lei Hamilton, Rahul Chipalkatty, Mitchell Hebert, David M. S. Johnson, Jimmy Wu, Bolei Zhou, Antonio Torralba
Recent robotic manipulation competitions have highlighted that sophisticated robots still struggle to achieve fast and reliable perception of task-relevant objects in complex, realistic scenarios.
no code implementations • 21 Feb 2017 • Ferda Ofli, Yusuf Aytar, Ingmar Weber, Raggi al Hammouri, Antonio Torralba
Studying how food is perceived in relation to what it actually is typically involves a laboratory setup.
no code implementations • 9 Dec 2016 • Adrià Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba
In this paper, we present an approach for following gaze across views by predicting where a particular person is looking throughout a scene.
no code implementations • 4 Dec 2016 • Benjamin Eysenbach, Carl Vondrick, Antonio Torralba
We then create a representation of characters' beliefs for two tasks in human action understanding: predicting who is mistaken, and when they are mistaken.
1 code implementation • 1 Dec 2016 • Michael B. Chang, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum
By comparing to less structured architectures, we show that the NPE's compositional representation of the structure in physical interactions improves its ability to predict movement, generalize across variable object count and different scene configurations, and infer latent properties of objects such as mass.
no code implementations • NeurIPS 2016 • David Harwath, Antonio Torralba, James Glass
Humans learn to speak before they can read or write, so why can’t computers do the same?
6 code implementations • NeurIPS 2016 • Yusuf Aytar, Carl Vondrick, Antonio Torralba
We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild.
no code implementations • 27 Oct 2016 • Yusuf Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.
no code implementations • 6 Oct 2016 • Bolei Zhou, Aditya Khosla, Agata Lapedriza, Antonio Torralba, Aude Oliva
The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification at tasks such as object and scene recognition.
no code implementations • NeurIPS 2016 • Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g., action classification) and video generation tasks (e.g., future prediction).
1 code implementation • 25 Aug 2016 • Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba
We show that, through this process, the network learns a representation that conveys information about objects and scenes.
22 code implementations • 18 Aug 2016 • Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba
Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision.
no code implementations • CVPR 2016 • Lluis Castrejon, Yusuf Aytar, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.
2 code implementations • CVPR 2016 • Kyle Krafka, Aditya Khosla, Petr Kellnhofer, Harini Kannan, Suchendra Bhandarkar, Wojciech Matusik, Antonio Torralba
We believe that we can put the power of eye tracking in everyone's palm by building eye tracking software that works on commodity hardware such as mobile phones and tablets, without the need for additional sensors or devices.
1 code implementation • 29 Apr 2016 • Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman
In this work, we propose 3D INterpreter Network (3D-INN), an end-to-end framework which sequentially estimates 2D keypoint heatmaps and 3D object structure, trained on both real 2D-annotated images and synthetic 3D data.
1 code implementation • 12 Apr 2016 • Zoya Bylinskii, Tilke Judd, Aude Oliva, Antonio Torralba, Frédo Durand
How best to evaluate a saliency model's ability to predict where humans look in images is an open research question.
no code implementations • 12 Jan 2016 • Radoslaw M. Cichy, Aditya Khosla, Dimitrios Pantazis, Antonio Torralba, Aude Oliva
The complex multi-stage architecture of cortical visual pathways provides the neural basis for efficient visual object recognition in humans.
no code implementations • CVPR 2016 • Andrew Owens, Phillip Isola, Josh Mcdermott, Antonio Torralba, Edward H. Adelson, William T. Freeman
Objects make distinctive sounds when they are hit or scratched.
35 code implementations • CVPR 2016 • Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba
In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels.
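The mechanism behind this localization ability, class activation mapping, can be sketched in a few lines. After global average pooling, the class score is a weighted sum of pooled feature maps. Applying the same class weights to the unpooled maps therefore gives a spatial map M_c(x, y) = sum_k w_k^c f_k(x, y) whose mean equals the class score. The shapes, random inputs, bias-free classifier, and function names below are assumptions. In a real network, the last convolutional features and the fully connected weights would supply these values.

```python
import numpy as np


def class_scores(features, fc_weights):
    """Bias-free linear classifier on globally average-pooled features.

    features: (K, H, W) last-layer convolutional maps; fc_weights: (num_classes, K).
    """
    pooled = features.mean(axis=(1, 2))  # global average pooling -> (K,)
    return fc_weights @ pooled


def class_activation_map(features, fc_weights, class_idx):
    """M_c(x, y) = sum_k w_k^c f_k(x, y): the class weights applied before pooling."""
    return np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # (H, W)


rng = np.random.default_rng(0)
features = rng.random((4, 5, 6))  # K=4 maps of size 5x6 (toy values)
fc_weights = rng.normal(size=(3, 4))  # 3 hypothetical classes
scores = class_scores(features, fc_weights)
c = int(np.argmax(scores))
cam = class_activation_map(features, fc_weights, c)
print(cam.shape, np.isclose(cam.mean(), scores[c]))  # (5, 6) True
```

Upsampling `cam` to the input resolution highlights the image regions that contributed most to class c, even though the network only ever saw image-level labels.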
1 code implementation • CVPR 2016 • Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba, Raquel Urtasun, Sanja Fidler
We introduce the MovieQA dataset which aims to evaluate automatic story comprehension from both video and text.
no code implementations • ICCV 2015 • Aditya Khosla, Akhil S. Raju, Antonio Torralba, Aude Oliva
Progress in estimating visual memorability has been limited by the small scale and lack of variety of benchmark data.
no code implementations • NeurIPS 2015 • Adria Recasens, Aditya Khosla, Carl Vondrick, Antonio Torralba
Humans have the remarkable ability to follow the gaze of other people to identify what they are looking at.
16 code implementations • NeurIPS 2015 • Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler
The end result is an off-the-shelf encoder that can produce highly generic sentence representations that are robust and perform well in practice.
Ranked #2 on Semantic Similarity on SICK
3 code implementations • ICCV 2015 • Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler
Books are a rich source of both fine-grained information (what a character, an object, or a scene looks like) and high-level semantics (what someone is thinking and feeling, and how these states evolve through a story).
no code implementations • CVPR 2016 • Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
The key idea behind our approach is that we can train deep networks to predict the visual representation of images in the future.
1 code implementation • 19 Feb 2015 • Carl Vondrick, Aditya Khosla, Hamed Pirsiavash, Tomasz Malisiewicz, Antonio Torralba
We introduce algorithms to visualize feature spaces used by object detectors.
1 code implementation • 22 Dec 2014 • Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba
With the success of new computational architectures for visual processing, such as convolutional neural networks (CNN), and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly.
no code implementations • NeurIPS 2014 • Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, Aude Oliva
Whereas the tremendous recent progress in object recognition tasks is due to the availability of large datasets like ImageNet and the rise of Convolutional Neural Networks (CNNs) for learning high-level features, performance at scene recognition has not attained the same level of success.
no code implementations • NeurIPS 2015 • Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba
Although the human visual system can recognize many concepts under challenging conditions, it still has some biases.
no code implementations • CVPR 2016 • Carl Vondrick, Deniz Oktay, Hamed Pirsiavash, Antonio Torralba
In this paper, we introduce the problem of predicting why a person has performed an action in images.
no code implementations • CVPR 2014 • Aditya Khosla, Byoungkwon An, Joseph J. Lim, Antonio Torralba
In this work, we propose to look beyond the visible elements of a scene; we demonstrate that a scene is not just a collection of objects and their configuration or the labels assigned to its pixels: it is so much more.
no code implementations • 25 Nov 2013 • Agata Lapedriza, Hamed Pirsiavash, Zoya Bylinskii, Antonio Torralba
When learning a new concept, not all training examples may prove equally useful for training: some may have higher or lower training value than others.
no code implementations • 11 Dec 2012 • Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba
By visualizing feature spaces, we can gain a more intuitive understanding of our detection systems.
no code implementations • NeurIPS 2012 • Jianxiong Xiao, Bryan Russell, Antonio Torralba
In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes.
no code implementations • NeurIPS 2011 • Antonio Torralba, Joshua B. Tenenbaum, Ruslan R. Salakhutdinov
We introduce HD (or "Hierarchical-Deep") models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian models.
no code implementations • NeurIPS 2011 • Phillip Isola, Devi Parikh, Antonio Torralba, Aude Oliva
Artists, advertisers, and photographers are routinely presented with the task of creating an image that a viewer will remember.
no code implementations • NeurIPS 2009 • Gunhee Kim, Antonio Torralba
This paper proposes a fast and scalable alternating optimization technique to detect regions of interest (ROIs) in cluttered Web images without labels.
no code implementations • NeurIPS 2009 • Rob Fergus, Yair Weiss, Antonio Torralba
With the advent of the Internet it is now possible to collect hundreds of millions of images.