Search Results for author: Jiajun Wu

Found 97 papers, 30 papers with code

Grammar-Based Grounded Lexicon Learning

no code implementations NeurIPS 2021 Jiayuan Mao, Haoyue Shi, Jiajun Wu, Roger Levy, Josh Tenenbaum

We present Grammar-Based Grounded Lexicon Learning (G2L2), a lexicalist approach toward learning a compositional and grounded meaning representation of language from grounded data, such as paired images and texts.

Grounded language learning · Network Embedding +1

On the Opportunities and Risks of Foundation Models

1 code implementation 16 Aug 2021 Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks

1 code implementation 6 Aug 2021 Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, Andrey Kurenkov, C. Karen Liu, Hyowon Gweon, Jiajun Wu, Li Fei-Fei, Silvio Savarese

We evaluate the new capabilities of iGibson 2.0 to enable robot learning of novel tasks, in the hope of demonstrating the potential of this new simulator to support new research in embodied AI.

Imitation Learning

BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments

no code implementations 6 Aug 2021 Sanjana Srivastava, Chengshu Li, Michael Lingelbach, Roberto Martín-Martín, Fei Xia, Kent Vainio, Zheng Lian, Cem Gokmen, Shyamal Buch, C. Karen Liu, Silvio Savarese, Hyowon Gweon, Jiajun Wu, Li Fei-Fei

We introduce BEHAVIOR, a benchmark for embodied AI with 100 activities in simulation, spanning a range of everyday household chores such as cleaning, maintenance, and food preparation.

SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

1 code implementation 2 Aug 2021 Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, Stefano Ermon

The key challenge is balancing faithfulness to the user input (e.g., hand-drawn colored strokes) and realism of the synthesized image.

Denoising · Image Generation
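The knob that controls this trade-off is how much noise is added before denoising: SDEdit perturbs the guide input up to an intermediate time t0 of the diffusion and then runs the reverse process from there. Below is a minimal, self-contained toy sketch in Python of that noise-then-denoise idea, assuming a one-dimensional "image", a hand-written stand-in for the learned score network, and a variance-exploding schedule; it illustrates the mechanism, not the authors' released implementation.

import numpy as np

def toy_score(x, sigma):
    # Stand-in for a learned score network: if clean data were N(0, 1), the
    # noised marginal is N(0, 1 + sigma^2) and its score is -x / (1 + sigma^2).
    return -x / (1.0 + sigma ** 2)

def sdedit(x_guide, t0=0.5, n_steps=200, sigma_max=10.0, seed=0):
    # Perturb the guide to noise level sigma_max * t0, then integrate a
    # reverse-diffusion update back to zero noise. Smaller t0 keeps the output
    # faithful to x_guide; larger t0 lets the (toy) prior dominate.
    rng = np.random.default_rng(seed)
    sigmas = sigma_max * np.linspace(t0, 0.0, n_steps + 1)
    x = x_guide + sigmas[0] * rng.standard_normal(x_guide.shape)
    for i in range(n_steps):
        step = sigmas[i] ** 2 - sigmas[i + 1] ** 2           # variance removed this step
        x = x + step * toy_score(x, sigmas[i])                # drift toward high density
        if i < n_steps - 1:                                   # no fresh noise on the final step
            x = x + np.sqrt(step) * rng.standard_normal(x.shape)
    return x

guide = np.full(8, 3.0)       # a "user stroke": constant values far from the toy prior's mean
print(sdedit(guide, t0=0.2))  # stays close to the guide (faithful)
print(sdedit(guide, t0=0.9))  # pulled strongly toward the toy prior near zero (more "realistic" under that prior)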

Unsupervised Discovery of Object Radiance Fields

1 code implementation 16 Jul 2021 Hong-Xing Yu, Leonidas J. Guibas, Jiajun Wu

We study the problem of inferring an object-centric scene representation from a single image, aiming to derive a representation that explains the image formation process, captures the scene's 3D nature, and is learned without supervision.

Novel View Synthesis · Scene Segmentation

Temporal and Object Quantification Networks

no code implementations 10 Jun 2021 Jiayuan Mao, Zhezheng Luo, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu, Leslie Pack Kaelbling, Tomer D. Ullman

We present Temporal and Object Quantification Networks (TOQ-Nets), a new class of neuro-symbolic networks with a structural bias that enables them to learn to recognize complex relational-temporal events.

Hierarchical Motion Understanding via Motion Programs

no code implementations CVPR 2021 Sumith Kulal, Jiayuan Mao, Alex Aiken, Jiajun Wu

We posit that adding higher-level motion primitives, which can capture natural coarser units of motion such as backswing or follow-through, can be used to improve downstream analysis tasks.

Video Editing · Video Prediction

KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control

no code implementations CVPR 2021 Tomas Jakab, Richard Tucker, Ameesh Makadia, Jiajun Wu, Noah Snavely, Angjoo Kanazawa

We cast this as the problem of aligning a source 3D object to a target 3D object from the same object category.

De-rendering the World's Revolutionary Artefacts

1 code implementation CVPR 2021 Shangzhe Wu, Ameesh Makadia, Jiajun Wu, Noah Snavely, Richard Tucker, Angjoo Kanazawa

Recent works have shown exciting results in unsupervised image de-rendering -- learning to decompose 3D shape, appearance, and lighting from single-image collections without explicit supervision.

Repopulating Street Scenes

no code implementations CVPR 2021 Yifan Wang, Andrew Liu, Richard Tucker, Jiajun Wu, Brian L. Curless, Steven M. Seitz, Noah Snavely

We present a framework for automatically reconfiguring images of street scenes by populating, depopulating, or repopulating them with objects such as pedestrians or vehicles.

Autonomous Driving

Learning Temporal Dynamics from Cycles in Narrated Video

no code implementations ICCV 2021 Dave Epstein, Jiajun Wu, Cordelia Schmid, Chen Sun

Learning to model how the world changes as time elapses has proven a challenging problem for the computer vision community.

Temporal and Object Quantification Nets

no code implementations 1 Jan 2021 Jiayuan Mao, Zhezheng Luo, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu, Leslie Pack Kaelbling, Tomer Ullman

We aim to learn generalizable representations for complex activities by quantifying over both entities and time, as in “the kicker is behind all the other players,” or “the player controls the ball until it moves toward the goal.” Such a structural inductive bias of object relations, object quantification, and temporal orders will enable the learned representation to generalize to situations with varying numbers of agents, objects, and time courses.

Event Detection

Object-Centric Diagnosis of Visual Reasoning

no code implementations 21 Dec 2020 Jianwei Yang, Jiayuan Mao, Jiajun Wu, Devi Parikh, David D. Cox, Joshua B. Tenenbaum, Chuang Gan

In contrast, symbolic and modular models have relatively better grounding and robustness, though at the cost of accuracy.

Question Answering · Visual Question Answering +1

Neural Radiance Flow for 4D View Synthesis and Video Processing

1 code implementation ICCV 2021 Yilun Du, Yinan Zhang, Hong-Xing Yu, Joshua B. Tenenbaum, Jiajun Wu

We present a method, Neural Radiance Flow (NeRFlow), to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.

Image Super-Resolution

Object-Centric Neural Scene Rendering

no code implementations 15 Dec 2020 Michelle Guo, Alireza Fathi, Jiajun Wu, Thomas Funkhouser

We present a method for composing photorealistic scenes from captured images of objects.

Multi-Plane Program Induction with 3D Box Priors

no code implementations NeurIPS 2020 Yikai Li, Jiayuan Mao, Xiuming Zhang, William T. Freeman, Joshua B. Tenenbaum, Noah Snavely, Jiajun Wu

We consider two important aspects in understanding and editing images: modeling regular, program-like texture or patterns in 2D planes, and 3D posing of these planes in the scene.

Program induction · Program Synthesis

Learning 3D Dynamic Scene Representations for Robot Manipulation

2 code implementations 3 Nov 2020 Zhenjia Xu, Zhanpeng He, Jiajun Wu, Shuran Song

3D scene representation for robot manipulation should capture three key object properties: permanency -- objects that become occluded over time continue to exist; amodal completeness -- objects have 3D occupancy, even if only partial observations are available; spatiotemporal continuity -- the movement of each object is continuous over space and time.

Multi-Frame to Single-Frame: Knowledge Distillation for 3D Object Detection

no code implementations 24 Sep 2020 Yue Wang, Alireza Fathi, Jiajun Wu, Thomas Funkhouser, Justin Solomon

A common dilemma in 3D object detection for autonomous driving is that high-quality, dense point clouds are only available during training, but not testing.

3D Object Detection · Autonomous Driving +1

End-to-End Optimization of Scene Layout

1 code implementation CVPR 2020 Andrew Luo, Zhoutong Zhang, Jiajun Wu, Joshua B. Tenenbaum

Experiments suggest that our model achieves higher accuracy and diversity in conditional scene synthesis and allows exemplar-based scene generation from various input forms.

Indoor Scene Reconstruction · Indoor Scene Synthesis +2

Learning Physical Graph Representations from Visual Scenes

1 code implementation NeurIPS 2020 Daniel M. Bear, Chaofei Fan, Damian Mrowca, Yunzhu Li, Seth Alter, Aran Nayebi, Jeremy Schwartz, Li Fei-Fei, Jiajun Wu, Joshua B. Tenenbaum, Daniel L. K. Yamins

To overcome these limitations, we introduce the idea of Physical Scene Graphs (PSGs), which represent scenes as hierarchical graphs, with nodes in the hierarchy corresponding intuitively to object parts at different scales, and edges to physical connections between parts.

Scene Segmentation

When is Particle Filtering Efficient for Planning in Partially Observed Linear Dynamical Systems?

no code implementations 10 Jun 2020 Simon S. Du, Wei Hu, Zhiyuan Li, Ruoqi Shen, Zhao Song, Jiajun Wu

Though errors in past actions may affect the future, we are able to bound the number of particles needed so that the long-run reward of the policy based on particle filtering is close to that based on exact inference.

Decision Making

Improving Learning Efficiency for Wireless Resource Allocation with Symmetric Prior

no code implementations 18 May 2020 Chengjian Sun, Jiajun Wu, Chenyang Yang

The samples required to train a DNN after ranking can be reduced by a factor of 15 to 2,400 while achieving the same system performance as the counterpart that does not use the prior.

Deep Audio Priors Emerge From Harmonic Convolutional Networks

no code implementations ICLR 2020 Zhoutong Zhang, Yunyun Wang, Chuang Gan, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

We show that networks using Harmonic Convolution can reliably model audio priors and achieve high performance in unsupervised audio restoration tasks.

Visual Grounding of Learned Physical Models

no code implementations ICML 2020 Yunzhu Li, Toru Lin, Kexin Yi, Daniel M. Bear, Daniel L. K. Yamins, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba

The abilities to perform physical reasoning and to adapt to new environments, while intrinsic to humans, remain challenging to state-of-the-art computational models.

Visual Grounding

Visual Concept-Metaconcept Learning

1 code implementation NeurIPS 2019 Chi Han, Jiayuan Mao, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu

Humans reason with concepts and metaconcepts: we recognize red and green from visual input; we also understand that they describe the same property of objects (i.e., the color).

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

1 code implementation 25 Dec 2019 Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum

In this paper, we attempt to approach the problem of Audio-Visual Embodied Navigation, the task of planning the shortest path from a random starting location in a scene to the sound source in an indoor environment, given only raw egocentric visual and audio sensory data.

Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations

1 code implementation NeurIPS 2019 Kevin Smith, Lingjie Mei, Shunyu Yao, Jiajun Wu, Elizabeth Spelke, Josh Tenenbaum, Tomer Ullman

We also present a new test set for measuring violations of physical expectations, using a range of scenarios derived from developmental psychology.

Scene Understanding

Proactive Optimization with Machine Learning: Femto-caching with Future Content Popularity

no code implementations 29 Oct 2019 Jiajun Wu, Chengjian Sun, Chenyang Yang

In this paper, we introduce a proactive optimization framework for anticipatory resource allocation, where the future information is implicitly predicted under the same objective with the policy optimization in a single step.

Stochastic Optimization

Entity Abstraction in Visual Model-Based Reinforcement Learning

1 code implementation 28 Oct 2019 Rishi Veerapaneni, John D. Co-Reyes, Michael Chang, Michael Janner, Chelsea Finn, Jiajun Wu, Joshua B. Tenenbaum, Sergey Levine

This paper tests the hypothesis that modeling a scene in terms of entities and their local interactions, as opposed to modeling the scene globally, provides a significant benefit in generalizing to physical tasks in a combinatorial space the learner has not encountered before.

Model-based Reinforcement Learning · Object Discovery +2

Learning Compositional Koopman Operators for Model-Based Control

no code implementations ICLR 2020 Yunzhu Li, Hao He, Jiajun Wu, Dina Katabi, Antonio Torralba

Finding an embedding space for a linear approximation of a nonlinear dynamical system enables efficient system identification and control synthesis.
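Stated generically, the idea referenced here is to look for a lifting z_t = g(x_t) of the nonlinear system x_{t+1} = f(x_t, u_t) such that the lifted dynamics are approximately linear (the notation below is illustrative, not the paper's compositional formulation):

z_{t+1} ≈ K z_t + L u_t

With K and L linear, system identification reduces to least squares over observed trajectories, and control synthesis can reuse standard linear tools (e.g., LQR or quadratic programs) in the lifted space.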

CLEVRER: CoLlision Events for Video REpresentation and Reasoning

3 code implementations ICLR 2020 Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum

While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations.

Visual Reasoning

DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs

1 code implementation 28 Sep 2019 Yunbo Wang, Bo Liu, Jiajun Wu, Yuke Zhu, Simon S. Du, Li Fei-Fei, Joshua B. Tenenbaum

A major difficulty of solving continuous POMDPs is to infer the multi-modal distribution of the unobserved true states and to make the planning algorithm dependent on the perceived uncertainty.

Continuous Control

Program-Guided Image Manipulators

no code implementations ICCV 2019 Jiayuan Mao, Xiuming Zhang, Yikai Li, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

Humans are capable of building holistic representations for images at various levels, from local objects, to pairwise relations, to global structures.

Image Inpainting

Neurally-Guided Structure Inference

no code implementations 17 Jun 2019 Sidi Lu, Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu

In this paper, we propose a hybrid inference algorithm, the Neurally-Guided Structure Inference (NG-SI), keeping the advantages of both search-based and data-driven methods.

Predicting the Present and Future States of Multi-agent Systems from Partially-observed Visual Data

no code implementations ICLR 2019 Chen Sun, Per Karlsson, Jiajun Wu, Joshua B. Tenenbaum, Kevin Murphy

We present a method which learns to integrate temporal information, from a learned dynamics model, with ambiguous visual information, from a learned vision model, in the context of interacting agents.

Modeling Parts, Structure, and System Dynamics via Predictive Learning

no code implementations ICLR 2019 Zhenjia Xu, Zhijian Liu, Chen Sun, Kevin Murphy, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

Humans easily recognize object parts and their hierarchical structure by watching how they move; they can then predict how each part moves in the future.

Learning to Describe Scenes with Programs

no code implementations ICLR 2019 Yunchao Liu, Zheng Wu, Daniel Ritchie, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

We are able to understand the higher-level, abstract regularities within the scene such as symmetry and repetition.

Combining Physical Simulators and Object-Based Networks for Control

no code implementations 13 Apr 2019 Anurag Ajay, Maria Bauza, Jiajun Wu, Nima Fazeli, Joshua B. Tenenbaum, Alberto Rodriguez, Leslie P. Kaelbling

Physics engines play an important role in robot planning and control; however, many real-world control problems involve complex contact dynamics that cannot be characterized analytically.

Unsupervised Discovery of Parts, Structure, and Dynamics

no code implementations 12 Mar 2019 Zhenjia Xu, Zhijian Liu, Chen Sun, Kevin Murphy, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

Humans easily recognize object parts and their hierarchical structure by watching how they move; they can then predict how each part moves in the future.

Stochastic Prediction of Multi-Agent Interactions from Partial Observations

no code implementations 25 Feb 2019 Chen Sun, Per Karlsson, Jiajun Wu, Joshua B. Tenenbaum, Kevin Murphy

We present a method that learns to integrate temporal information, from a learned dynamics model, with ambiguous visual information, from a learned vision model, in the context of interacting agents.

Learning to Infer and Execute 3D Shape Programs

no code implementations ICLR 2019 Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

Human perception of 3D shapes goes beyond reconstructing them as a set of points or a composition of geometric primitives: we also effortlessly understand higher-level shape structure such as the repetition and reflective symmetry of object parts.

Learning to Reconstruct Shapes from Unseen Classes

no code implementations NeurIPS 2018 Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Joshua B. Tenenbaum, William T. Freeman, Jiajun Wu

From a single image, humans are able to perceive the full 3D shape of an object by exploiting learned shape priors from everyday life.

3D Reconstruction

Visual Object Networks: Image Generation with Disentangled 3D Representations

1 code implementation NeurIPS 2018 Jun-Yan Zhu, Zhoutong Zhang, Chengkai Zhang, Jiajun Wu, Antonio Torralba, Josh Tenenbaum, Bill Freeman

The VON not only generates images that are more realistic than the state-of-the-art 2D image synthesis methods but also enables many 3D operations such as changing the viewpoint of a generated image, shape and texture editing, linear interpolation in texture and shape space, and transferring appearance across different objects and viewpoints.

Image Generation

Learning to Exploit Stability for 3D Scene Parsing

no code implementations NeurIPS 2018 Yilun Du, Zhijian Liu, Hector Basevi, Ales Leonardis, Bill Freeman, Josh Tenenbaum, Jiajun Wu

We first show that applying physics supervision to an existing scene understanding model increases performance, produces more stable predictions, and allows training to an equivalent performance level with fewer annotated training examples.

Scene Understanding · Translation

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

1 code implementation NeurIPS 2018 Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Joshua B. Tenenbaum

Second, the model is more data- and memory-efficient: it performs well after learning on a small amount of training data; it can also encode an image into a compact representation, requiring less storage than existing methods for offline question answering.

Question Answering · Representation Learning +1
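The disentangling can be made concrete: once perception has produced a structured scene table and the question has been parsed into a program, answering is plain program execution over that table. The following minimal sketch in Python uses a made-up scene, attribute names, and a two-operation program for illustration; it is not the paper's parser or executor.

# Hypothetical structured scene, as a perception module might output it.
scene = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "sphere", "color": "red", "size": "small"},
    {"shape": "cylinder", "color": "blue", "size": "small"},
]

# A question like "How many red objects are there?" parsed into a tiny program.
program = [("filter", "color", "red"), ("count",)]

def execute(program, objects):
    # Run each operation in sequence over the current working set of objects.
    result = objects
    for op, *args in program:
        if op == "filter":
            attr, value = args
            result = [o for o in result if o[attr] == value]
        elif op == "count":
            result = len(result)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result

print(execute(program, scene))  # -> 2

Because the scene table is compact and symbolic, it can be stored once and reused to answer many questions offline, which is the storage advantage mentioned above.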

ChainQueen: A Real-Time Differentiable Physical Simulator for Soft Robotics

no code implementations 2 Oct 2018 Yuanming Hu, Jian-Cheng Liu, Andrew Spielberg, Joshua B. Tenenbaum, William T. Freeman, Jiajun Wu, Daniela Rus, Wojciech Matusik

The underlying physical laws of deformable objects are more complex, and the resulting systems have orders of magnitude more degrees of freedom, making them significantly more computationally expensive to simulate.

Motion Planning

Propagation Networks for Model-Based Control Under Partial Observation

1 code implementation 28 Sep 2018 Yunzhu Li, Jiajun Wu, Jun-Yan Zhu, Joshua B. Tenenbaum, Antonio Torralba, Russ Tedrake

There has been an increasing interest in learning dynamics simulators for model-based control.

MoSculp: Interactive Visualization of Shape and Time

no code implementations 14 Sep 2018 Xiuming Zhang, Tali Dekel, Tianfan Xue, Andrew Owens, Qiurui He, Jiajun Wu, Stefanie Mueller, William T. Freeman

We present a system that allows users to visualize complex human motion via 3D motion sculptures -- a representation that conveys the 3D structure swept by a human body as it moves through space.

Physical Primitive Decomposition

no code implementations ECCV 2018 Zhijian Liu, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

As annotated data for object parts and physics are rare, we propose a novel formulation that learns physical primitives by explaining both an object's appearance and its behaviors in physical events.

Learning Shape Priors for Single-View 3D Completion and Reconstruction

no code implementations ECCV 2018 Jiajun Wu, Chengkai Zhang, Xiuming Zhang, Zhoutong Zhang, William T. Freeman, Joshua B. Tenenbaum

The problem of single-view 3D shape completion or reconstruction is challenging, because among the many possible shapes that explain an observation, most are implausible and do not correspond to natural objects.

Seeing Tree Structure from Vibration

no code implementations ECCV 2018 Tianfan Xue, Jiajun Wu, Zhoutong Zhang, Chengkai Zhang, Joshua B. Tenenbaum, William T. Freeman

Humans recognize object structure from both their appearance and motion; often, motion helps to resolve ambiguities in object structure that arise when we observe object appearance only.

Bayesian Inference

3D-Aware Scene Manipulation via Inverse Graphics

1 code implementation NeurIPS 2018 Shunyu Yao, Tzu Ming Harry Hsu, Jun-Yan Zhu, Jiajun Wu, Antonio Torralba, William T. Freeman, Joshua B. Tenenbaum

In this work, we propose 3D scene de-rendering networks (3D-SDN) to address the above issues by integrating disentangled representations for semantics, geometry, and appearance into a deep generative model.

Augmenting Physical Simulators with Stochastic Neural Networks: Case Study of Planar Pushing and Bouncing

no code implementations 9 Aug 2018 Anurag Ajay, Jiajun Wu, Nima Fazeli, Maria Bauza, Leslie P. Kaelbling, Joshua B. Tenenbaum, Alberto Rodriguez

An efficient, generalizable physical simulator with universal uncertainty estimates has wide applications in robot state estimation, planning, and control.

Gaussian Processes

Unsupervised Learning of Latent Physical Properties Using Perception-Prediction Networks

no code implementations 24 Jul 2018 David Zheng, Vinson Luo, Jiajun Wu, Joshua B. Tenenbaum

We propose a framework for the completely unsupervised learning of latent object properties from their interactions: the perception-prediction network (PPN).

Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks

no code implementations 24 Jul 2018 Tianfan Xue, Jiajun Wu, Katherine L. Bouman, William T. Freeman

We study the problem of synthesizing a number of likely future frames from a single input image.

3D Interpreter Networks for Viewer-Centered Wireframe Modeling

no code implementations 3 Apr 2018 Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

3D-INN is trained on real images to estimate 2D keypoint heatmaps from an input image; it then predicts 3D object structure from heatmaps using knowledge learned from synthetic 3D shapes.

Image Retrieval

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning

no code implementations 20 Dec 2017 Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba

The sound of crashing waves, the roar of fast-moving cars -- sound conveys important information about the objects in our surroundings.

Learning to See Physics via Visual De-animation

no code implementations NeurIPS 2017 Jiajun Wu, Erika Lu, Pushmeet Kohli, Bill Freeman, Josh Tenenbaum

At the core of our system is a physical world representation that is first recovered by a perception module and then utilized by physics and graphics engines.

Future prediction

Shape and Material from Sound

no code implementations NeurIPS 2017 Zhoutong Zhang, Qiujia Li, Zhengjia Huang, Jiajun Wu, Josh Tenenbaum, Bill Freeman

Hearing an object falling onto the ground, humans can recover rich information including its rough shape, material, and falling height.

Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

3 code implementations CVPR 2018 Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen

In this paper, however, we show that temporal information, especially longer-term patterns, may not be necessary to achieve competitive results on common video classification datasets.

General Classification · Video Classification

Self-Supervised Intrinsic Image Decomposition

no code implementations NeurIPS 2017 Michael Janner, Jiajun Wu, Tejas D. Kulkarni, Ilker Yildirim, Joshua B. Tenenbaum

Intrinsic decomposition from a single image is a highly challenging task, due to its inherent ambiguity and the scarcity of training data.

Intrinsic Image Decomposition · Transfer Learning

MarrNet: 3D Shape Reconstruction via 2.5D Sketches

no code implementations NeurIPS 2017 Jiajun Wu, Yifan Wang, Tianfan Xue, Xingyuan Sun, William T. Freeman, Joshua B. Tenenbaum

First, compared to a full 3D shape, 2.5D sketches are much easier to recover from a 2D image; models that recover 2.5D sketches are also more likely to transfer from synthetic to real data.

3D Object Reconstruction From A Single Image · 3D Reconstruction +2

Raster-To-Vector: Revisiting Floorplan Transformation

1 code implementation ICCV 2017 Chen Liu, Jiajun Wu, Pushmeet Kohli, Yasutaka Furukawa

A neural architecture first transforms a rasterized image to a set of junctions that represent low-level geometric and semantic information (e.g., wall corners or door end-points).

Vector Graphics

Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes With Deep Generative Networks

no code implementations CVPR 2017 Amir Arsalan Soltani, Haibin Huang, Jiajun Wu, Tejas D. Kulkarni, Joshua B. Tenenbaum

We take an alternative approach: learning a generative model over multi-view depth maps or their corresponding silhouettes, and using a deterministic rendering function to produce 3D shapes from these images.

Neural Scene De-Rendering

no code implementations CVPR 2017 Jiajun Wu, Joshua B. Tenenbaum, Pushmeet Kohli

Our approach employs a deterministic rendering function as the decoder, mapping a naturally structured and disentangled scene description, which we named scene XML, to an image.

Image Captioning · Scene Understanding

Deep Multi-Modal Image Correspondence Learning

no code implementations 5 Dec 2016 Chen Liu, Jiajun Wu, Pushmeet Kohli, Yasutaka Furukawa

Our result implies that neural networks are effective at perceptual tasks that require long periods of reasoning even for humans to solve.

Ambient Sound Provides Supervision for Visual Learning

1 code implementation 25 Aug 2016 Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba

We show that, through this process, the network learns a representation that conveys information about objects and scenes.

Object Recognition

Single Image 3D Interpreter Network

no code implementations 29 Apr 2016 Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

In this work, we propose 3D INterpreter Network (3D-INN), an end-to-end framework which sequentially estimates 2D keypoint heatmaps and 3D object structure, trained on both real 2D-annotated images and synthetic 3D data.

Image Retrieval

Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning

no code implementations NeurIPS 2015 Jiajun Wu, Ilker Yildirim, Joseph J. Lim, Bill Freeman, Josh Tenenbaum

Humans demonstrate remarkable abilities to predict physical events in dynamic scenes, and to infer the physical properties of objects from static images.

Scene Understanding

Deep Multiple Instance Learning for Image Classification and Auto-Annotation

no code implementations CVPR 2015 Jiajun Wu, Yinan Yu, Chang Huang, Kai Yu

Recent developments in learning deep representations have demonstrated wide applicability to traditional vision tasks such as classification and detection.

General Classification · Image Classification +1

MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation

no code implementations CVPR 2014 Jiajun Wu, Yibiao Zhao, Jun-Yan Zhu, Siwei Luo, Zhuowen Tu

Interactive segmentation, in which a user provides a bounding box around an object of interest for image segmentation, has been applied to a variety of applications in image editing, crowdsourcing, computer vision, and medical imaging.

Interactive Segmentation · Multiple Instance Learning +1

Harvesting Mid-level Visual Concepts from Large-Scale Internet Images

no code implementations CVPR 2013 Quannan Li, Jiajun Wu, Zhuowen Tu

Obtaining effective mid-level representations has become an increasingly important task in computer vision.

Image Classification
