Search Results for author: Jiajun Wu

Found 192 papers, 56 papers with code

PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation

no code implementations • 19 Apr 2024 • Tianyuan Zhang, Hong-Xing Yu, Rundi Wu, Brandon Y. Feng, Changxi Zheng, Noah Snavely, Jiajun Wu, William T. Freeman

Unlike unconditional or text-conditioned dynamics generation, action-conditioned dynamics requires perceiving the physical material properties of objects and grounding the 3D motion prediction on these properties, such as object stiffness.

motion prediction Object +1


Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning

no code implementations • 16 Apr 2024 • Kyle Hsu, Jubayer Ibn Hamid, Kaylee Burns, Chelsea Finn, Jiajun Wu

Inductive biases are crucial in disentangled representation learning for narrowing down an underspecified solution set.

Data Compression Disentanglement +1


Text-Based Reasoning About Vector Graphics

no code implementations • 9 Apr 2024 • Zhenhailong Wang, Joy Hsu, Xingyao Wang, Kuan-Hao Huang, Manling Li, Jiajun Wu, Heng Ji

By casting an image to a text-based representation, we can leverage the power of language models to learn alignment from SVG to visual primitives and generalize to unseen question-answering tasks.

Descriptive Language Modelling +2


3D Congealing: 3D-Aware Image Alignment in the Wild

no code implementations • 2 Apr 2024 • Yunzhi Zhang, Zizhang Li, Amit Raj, Andreas Engelhardt, Yuanzhen Li, Tingbo Hou, Jiajun Wu, Varun Jampani

The framework optimizes for the canonical representation together with the pose for each input image, and a per-image coordinate map that warps 2D pixel coordinates to the 3D canonical frame to account for the shape matching.

Pose Estimation


BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

no code implementations • 14 Mar 2024 • Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen Wang, Gabrael Levine, Wensi Ai, Benjamin Martinez, Hang Yin, Michael Lingelbach, Minjune Hwang, Ayano Hiranaka, Sujay Garlanka, Arman Aydin, Sharon Lee, Jiankai Sun, Mona Anvari, Manasi Sharma, Dhruva Bansal, Samuel Hunter, Kyu-Young Kim, Alan Lou, Caleb R Matthews, Ivan Villa-Renteria, Jerry Huayang Tang, Claire Tang, Fei Xia, Yunzhu Li, Silvio Savarese, Hyowon Gweon, C. Karen Liu, Jiajun Wu, Li Fei-Fei

We present BEHAVIOR-1K, a comprehensive simulation benchmark for human-centered robotics.


Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians

no code implementations • 14 Mar 2024 • Licheng Zhong, Hong-Xing Yu, Jiajun Wu, Yunzhu Li

In particular, we develop and integrate a 3D Spring-Mass model into 3D Gaussian kernels, enabling the reconstruction of the visual appearance, shape, and physical dynamics of the object.

Future prediction Object


Unsupervised Discovery of Object-Centric Neural Fields

no code implementations • 12 Feb 2024 • Rundong Luo, Hong-Xing Yu, Jiajun Wu

Extensive experiments show that uOCF enables unsupervised discovery of visually rich objects from a single real image, allowing applications such as 3D object segmentation and scene manipulation.

Object Object Discovery +3


Learning the 3D Fauna of the Web

no code implementations • 4 Jan 2024 • Zizhang Li, Dor Litvak, Ruining Li, Yunzhi Zhang, Tomas Jakab, Christian Rupprecht, Shangzhe Wu, Andrea Vedaldi, Jiajun Wu

We show that prior category-specific attempts fail to generalize to rare species with limited training images.


CityPulse: Fine-Grained Assessment of Urban Change with Street View Time Series

no code implementations • 2 Jan 2024 • Tianyuan Huang, Zejia Wu, Jiajun Wu, Jackelyn Hwang, Ram Rajagopal

Urban transformations have profound societal impact on both individuals and communities at large.

Change Detection Time Series


Fluid Simulation on Neural Flow Maps

no code implementations • 22 Dec 2023 • Yitong Deng, Hong-Xing Yu, Diyang Zhang, Jiajun Wu, Bo Zhu

We introduce Neural Flow Maps, a novel simulation method bridging the emerging paradigm of implicit neural representations with fluid simulation based on the theory of flow maps, to achieve state-of-the-art simulation of inviscid fluid phenomena.


Ponymation: Learning 3D Animal Motions from Unlabeled Online Videos

no code implementations • 21 Dec 2023 • Keqiang Sun, Dor Litvak, Yunzhi Zhang, Hongsheng Li, Jiajun Wu, Shangzhe Wu

We introduce Ponymation, a new method for learning a generative model of articulated 3D animal motions from raw, unlabeled online videos.

Motion Synthesis


Model-Based Control with Sparse Neural Dynamics

no code implementations • NeurIPS 2023 • Ziang Liu, Genggeng Zhou, Jeff He, Tobia Marcucci, Li Fei-Fei, Jiajun Wu, Yunzhu Li

In this paper, we propose a new framework for integrated model learning and predictive control that is amenable to efficient optimization algorithms.


SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing

1 code implementation • 20 Dec 2023 • Zhecheng Wang, Rajanie Prabha, Tianyuan Huang, Jiajun Wu, Ram Rajagopal

Remote sensing imagery, despite its broad applications in helping achieve Sustainable Development Goals and tackle climate change, has not yet benefited from the recent advancements of versatile, task-agnostic vision language models (VLMs).

Attribute Cross-Modal Retrieval +3


Holodeck: Language Guided Generation of 3D Embodied AI Environments

1 code implementation • 14 Dec 2023 • Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, Christopher Clark

3D simulated environments play a critical role in Embodied AI, but their creation requires expertise and extensive manual effort, restricting their diversity and scope.

Common Sense Reasoning Language Modelling +2



Inferring Hybrid Neural Fluid Fields from Videos

no code implementations • NeurIPS 2023 • Hong-Xing Yu, Yang Zheng, Yuan Gao, Yitong Deng, Bo Zhu, Jiajun Wu

Specifically, to deal with visual ambiguities of fluid velocity, we introduce a set of physics-based losses that enforce inferring a physically plausible velocity field, which is divergence-free and drives the transport of density.

Dynamic Reconstruction Future prediction


3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection

1 code implementation • NeurIPS 2023 • Yunhao Ge, Hong-Xing Yu, Cheng Zhao, Yuliang Guo, Xinyu Huang, Liu Ren, Laurent Itti, Jiajun Wu

A major challenge in monocular 3D object detection is the limited diversity and quantity of objects in real datasets.

Data Augmentation Monocular 3D Object Detection +2


Controllable Human-Object Interaction Synthesis

no code implementations • 6 Dec 2023 • Jiaman Li, Alexander Clegg, Roozbeh Mottaghi, Jiajun Wu, Xavier Puig, C. Karen Liu

Naively applying a diffusion model fails to predict object motion aligned with the input waypoints and cannot ensure the realism of interactions that require precise hand-object contact and appropriate contact grounded by the floor.

Human-Object Interaction Detection Object


WonderJourney: Going from Anywhere to Everywhere

no code implementations • 6 Dec 2023 • Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann

We introduce WonderJourney, a modularized framework for perpetual 3D scene generation.

Point Cloud Generation Scene Generation


Language-Informed Visual Concept Learning

no code implementations • 6 Dec 2023 • Sharon Lee, Yunzhi Zhang, Shangzhe Wu, Jiajun Wu

To encourage better disentanglement of different concept encoders, we anchor the concept embeddings to a set of text embeddings obtained from a pre-trained Visual Question Answering (VQA) model.

Disentanglement Novel Concepts +2


Holistic Evaluation of Text-To-Image Models

1 code implementation • NeurIPS 2023 • Tony Lee, Michihiro Yasunaga, Chenlin Meng, Yifan Mai, Joon Sung Park, Agrim Gupta, Yunzhi Zhang, Deepak Narayanan, Hannah Benita Teufel, Marco Bellagente, Minguk Kang, Taesung Park, Jure Leskovec, Jun-Yan Zhu, Li Fei-Fei, Jiajun Wu, Stefano Ermon, Percy Liang

The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption.

Fairness



SoundCam: A Dataset for Finding Humans Using Room Acoustics

no code implementations • NeurIPS 2023 • Mason Wang, Samuel Clarke, Jui-Hsien Wang, Ruohan Gao, Jiajun Wu

A room's acoustic properties are a product of the room's geometry, the objects within the room, and their specific positions.


NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities

no code implementations • 2 Nov 2023 • Ruohan Zhang, Sharon Lee, Minjune Hwang, Ayano Hiranaka, Chen Wang, Wensi Ai, Jin Jie Ryan Tan, Shreya Gupta, Yilun Hao, Gabrael Levine, Ruohan Gao, Anthony Norcia, Li Fei-Fei, Jiajun Wu

We present Neural Signal Operated Intelligent Robots (NOIR), a general-purpose, intelligent brain-robot interface system that enables humans to command robots to perform everyday activities through brain signals.

EEG


Learning to Design and Use Tools for Robotic Manipulation

no code implementations • 1 Nov 2023 • Ziang Liu, Stephen Tian, Michelle Guo, C. Karen Liu, Jiajun Wu

A designer policy is conditioned on task information and outputs a tool design that helps solve the task.


ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image

no code implementations • 27 Oct 2023 • Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu

Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views.

Novel View Synthesis


Stanford-ORB: A Real-World 3D Object Inverse Rendering Benchmark

1 code implementation • NeurIPS 2023 • Zhengfei Kuang, Yunzhi Zhang, Hong-Xing Yu, Samir Agarwala, Shangzhe Wu, Jiajun Wu

We introduce Stanford-ORB, a new real-world 3D Object inverse Rendering Benchmark.

Depth Prediction Image Relighting +4


What's Left? Concept Grounding with Logic-Enhanced Foundation Models

1 code implementation • 24 Oct 2023 • Joy Hsu, Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu

We propose the Logic-Enhanced Foundation Model (LEFT), a unified framework that learns to ground and reason with concepts across domains with a differentiable, domain-independent, first-order logic-based program executor.

Visual Reasoning


CLEVRER-Humans: Describing Physical and Causal Events the Human Way

no code implementations • 5 Oct 2023 • Jiayuan Mao, Xuelin Yang, Xikun Zhang, Noah D. Goodman, Jiajun Wu

First, there is a lack of diversity in both event types and natural language descriptions; second, causal relationships based on manually-defined heuristics are different from human judgments.

Causal Judgment Data Augmentation +1


Mini-BEHAVIOR: A Procedurally Generated Benchmark for Long-horizon Decision-Making in Embodied AI

1 code implementation • 3 Oct 2023 • Emily Jin, Jiaheng Hu, Zhuoyi Huang, Ruohan Zhang, Jiajun Wu, Li Fei-Fei, Roberto Martín-Martín

We present Mini-BEHAVIOR, a novel benchmark for embodied AI that challenges agents to use reasoning and decision-making skills to solve complex activities that resemble everyday human challenges.

Decision Making


D$^3$Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation

no code implementations • 28 Sep 2023 • YiXuan Wang, Zhuoran Li, Mingtong Zhang, Katherine Driggs-Campbell, Jiajun Wu, Li Fei-Fei, Yunzhu Li

These fields capture the dynamics of the underlying 3D environment and encode both semantic features and instance masks.


Object Motion Guided Human Motion Synthesis

no code implementations • 28 Sep 2023 • Jiaman Li, Jiajun Wu, C. Karen Liu

We propose Object MOtion guided human MOtion synthesis (OMOMO), a conditional diffusion framework that can generate full-body manipulation behaviors from only the object motion.

Denoising Human-Object Interaction Detection +2


Tree-Structured Shading Decomposition

no code implementations • ICCV 2023 • Chen Geng, Hong-Xing Yu, Sharon Zhang, Maneesh Agrawala, Jiajun Wu

The shade tree representation enables novice users who are unfamiliar with the physical shading process to edit object shading in an efficient and intuitive manner.

Object


Learning Sequential Acquisition Policies for Robot-Assisted Feeding

no code implementations • 11 Sep 2023 • Priya Sundaresan, Jiajun Wu, Dorsa Sadigh

A robot providing mealtime assistance must perform specialized maneuvers with various utensils in order to pick up and feed a range of food items.


Physically Grounded Vision-Language Models for Robotic Manipulation

no code implementations • 5 Sep 2023 • Jensen Gao, Bidipta Sarkar, Fei Xia, Ted Xiao, Jiajun Wu, Brian Ichter, Anirudha Majumdar, Dorsa Sadigh

We incorporate this physically grounded VLM in an interactive framework with a large language model-based robotic planner, and show improved planning performance on tasks that require reasoning about physical object concepts, compared to baselines that do not leverage physically grounded VLMs.

Image Captioning Language Modelling +4


Compositional Diffusion-Based Continuous Constraint Solvers

no code implementations • 2 Sep 2023 • Zhutian Yang, Jiayuan Mao, Yilun Du, Jiajun Wu, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling

This paper introduces an approach for learning to solve continuous constraint satisfaction problems (CCSP) in robotic reasoning and planning.


Rendering Humans from Object-Occluded Monocular Videos

no code implementations • ICCV 2023 • Tiange Xiang, Adam Sun, Jiajun Wu, Ehsan Adeli, Li Fei-Fei

3D understanding and rendering of moving humans from monocular videos is a challenging task.

Neural Rendering Object


Patched Denoising Diffusion Models For High-Resolution Image Synthesis

1 code implementation • 2 Aug 2023 • Zheng Ding, Mengqi Zhang, Jiajun Wu, Zhuowen Tu

Feature collage systematically crops and combines partial features of the neighboring patches to predict the features of a shifted image patch, allowing the seamless generation of the entire image due to the overlap in the patch feature space.

Denoising Image Generation


Primitive Skill-based Robot Learning from Human Evaluative Feedback

no code implementations • 28 Jul 2023 • Ayano Hiranaka, Minjune Hwang, Sharon Lee, Chen Wang, Li Fei-Fei, Jiajun Wu, Ruohan Zhang

By combining them, SEED reduces the human effort required in RLHF and increases safety in training robot manipulation with RL in real-world settings.

reinforcement-learning Reinforcement Learning (RL) +1


VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

1 code implementation • 12 Jul 2023 • Wenlong Huang, Chen Wang, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Li Fei-Fei

The composed value maps are then used in a model-based planning framework to zero-shot synthesize closed-loop robot trajectories with robustness to dynamic perturbations.

Language Modelling Robot Manipulation



Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations

no code implementations • 12 Jul 2023 • Moo Jin Kim, Jiajun Wu, Chelsea Finn

Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation.

Domain Adaptation


Dynamic-Resolution Model Learning for Object Pile Manipulation

no code implementations • 29 Jun 2023 • YiXuan Wang, Yunzhu Li, Katherine Driggs-Campbell, Li Fei-Fei, Jiajun Wu

Prior works typically assume representation at a fixed dimension or resolution, which may be inefficient for simple tasks and ineffective for more complicated tasks.

Model Predictive Control Object


Task-Driven Graph Attention for Hierarchical Relational Object Navigation

no code implementations • 23 Jun 2023 • Michael Lingelbach, Chengshu Li, Minjune Hwang, Andrey Kurenkov, Alan Lou, Roberto Martín-Martín, Ruohan Zhang, Li Fei-Fei, Jiajun Wu

Embodied AI agents in large scenes often need to navigate to find objects.

Graph Attention Navigate +1


RealImpact: A Dataset of Impact Sound Fields for Real Objects

no code implementations • CVPR 2023 • Samuel Clarke, Ruohan Gao, Mason Wang, Mark Rau, Julia Xu, Jui-Hsien Wang, Doug L. James, Jiajun Wu

Objects make unique sounds under different perturbations, environment conditions, and poses relative to the listener.

audio-visual learning


Multi-Object Manipulation via Object-Centric Neural Scattering Functions

no code implementations • CVPR 2023 • Stephen Tian, Yancheng Cai, Hong-Xing Yu, Sergey Zakharov, Katherine Liu, Adrien Gaidon, Yunzhu Li, Jiajun Wu

Learned visual dynamics models have proven effective for robotic manipulation tasks.

Model Predictive Control Object


HomE: Homography-Equivariant Video Representation Learning

1 code implementation • 2 Jun 2023 • Anirudh Sriram, Adrien Gaidon, Jiajun Wu, Juan Carlos Niebles, Li Fei-Fei, Ehsan Adeli

In this work, we propose a novel method for representation learning of multi-view videos, where we explicitly model the representation space to maintain Homography Equivariance (HomE).

Action Classification Action Recognition +2


Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear

1 code implementation • 1 Jun 2023 • Ruohan Gao, Hao Li, Gokul Dharan, Zhuzhu Wang, Chengshu Li, Fei Xia, Silvio Savarese, Li Fei-Fei, Jiajun Wu

We introduce Sonicverse, a multisensory simulation platform with integrated audio-visual simulation for training household agents that can both see and hear.

Multi-Task Learning Visual Navigation


The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects

no code implementations • CVPR 2023 • Ruohan Gao, Yiming Dou, Hao Li, Tanmay Agarwal, Jeannette Bohg, Yunzhu Li, Li Fei-Fei, Jiajun Wu

We introduce the ObjectFolder Benchmark, a benchmark suite of 10 tasks for multisensory object-centric learning, centered around object recognition, reconstruction, and manipulation with sight, sound, and touch.

Benchmarking Object +1


Disentanglement via Latent Quantization

1 code implementation • NeurIPS 2023 • Kyle Hsu, Will Dorrell, James C. R. Whittington, Jiajun Wu, Chelsea Finn

In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.

Disentanglement Inductive Bias +1


Modeling Dynamic Environments with Scene Graph Memory

no code implementations • 27 May 2023 • Andrey Kurenkov, Michael Lingelbach, Tanmay Agarwal, Emily Jin, Chengshu Li, Ruohan Zhang, Li Fei-Fei, Jiajun Wu, Silvio Savarese, Roberto Martín-Martín

We evaluate our method in the Dynamic House Simulator, a new benchmark that creates diverse dynamic graphs following the semantic patterns typically seen in homes, and show that NEP can be trained to predict the locations of objects in a variety of environments with diverse object movement dynamics, outperforming baselines both in terms of new scene adaptability and overall accuracy.

Link Prediction


Motion Question Answering via Modular Motion Programs

1 code implementation • 15 May 2023 • Mark Endo, Joy Hsu, Jiaman Li, Jiajun Wu

In order to build artificial intelligence systems that can perceive and reason with human behavior in the real world, we must first design models that conduct complex spatio-temporal reasoning over motion sequences.

Attribute Question Answering


ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

1 code implementation • 14 May 2023 • Le Xue, Ning Yu, Shu Zhang, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese

Recent advancements in multimodal pre-training methods have shown promising efficacy in 3D representation learning by aligning multimodal features across 3D shapes, their 2D counterparts, and language descriptions.

Ranked #6 on 3D Point Cloud Classification on ScanObjectNN (using extra training data)

3D Point Cloud Classification Representation Learning +1



An Extensible Multimodal Multi-task Object Dataset with Materials

no code implementations • 29 Apr 2023 • Trevor Standley, Ruohan Gao, Dawn Chen, Jiajun Wu, Silvio Savarese

For example, we can train a model to predict the object category from the listing text, or the mass and price from the product listing image.

Attribute Multi-Task Learning +1


Putting People in Their Place: Affordance-Aware Human Insertion into Scenes

1 code implementation • CVPR 2023 • Sumith Kulal, Tim Brooks, Alex Aiken, Jiajun Wu, Jimei Yang, Jingwan Lu, Alexei A. Efros, Krishna Kumar Singh

Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances.



Programmatically Grounded, Compositionally Generalizable Robotic Manipulation

no code implementations • 26 Apr 2023 • Renhao Wang, Jiayuan Mao, Joy Hsu, Hang Zhao, Jiajun Wu, Yang Gao

Robots operating in the real world require both rich manipulation skills and the ability to reason semantically about when to apply those skills.

Imitation Learning


A Control-Centric Benchmark for Video Prediction

1 code implementation • 26 Apr 2023 • Stephen Tian, Chelsea Finn, Jiajun Wu

Video is a promising source of knowledge for embodied agents to learn models of the world's dynamics.

Video Prediction


Partial-View Object View Synthesis via Filtered Inversion

no code implementations • 3 Apr 2023 • Fan-Yun Sun, Jonathan Tremblay, Valts Blukis, Kevin Lin, Danfei Xu, Boris Ivanovic, Peter Karkus, Stan Birchfield, Dieter Fox, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Marco Pavone, Nick Haber

At inference, given one or more views of a novel real-world object, FINV first finds a set of latent codes for the object by inverting the generative model from multiple initial seeds.

Object


CIRCLE: Capture In Rich Contextual Environments

1 code implementation • CVPR 2023 • Joao Pedro Araujo, Jiaman Li, Karthik Vetrivel, Rishi Agarwal, Deepak Gopinath, Jiajun Wu, Alexander Clegg, C. Karen Liu

Leveraging our dataset, the model learns to use ego-centric scene information to achieve nontrivial reaching tasks in the context of complex 3D scenes.


NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations

no code implementations • CVPR 2023 • Joy Hsu, Jiayuan Mao, Jiajun Wu

Different functional modules in the programs are implemented as neural networks.

Question Answering Referring Expression +2


Learning Object-Centric Neural Scattering Functions for Free-Viewpoint Relighting and Scene Composition

no code implementations • 10 Mar 2023 • Hong-Xing Yu, Michelle Guo, Alireza Fathi, Yen-Yu Chang, Eric Ryan Chan, Ruohan Gao, Thomas Funkhouser, Jiajun Wu

We propose Object-Centric Neural Scattering Functions (OSFs) for learning to reconstruct object appearance from only images.

Inverse Rendering Object


Learning Rational Subgoals from Demonstrations and Instructions

no code implementations • 9 Mar 2023 • Zhezheng Luo, Jiayuan Mao, Jiajun Wu, Tomás Lozano-Pérez, Joshua B. Tenenbaum, Leslie Pack Kaelbling

We present a framework for learning useful subgoals that support efficient long-term planning to achieve novel goals.


Sparse and Local Networks for Hypergraph Reasoning

no code implementations • 9 Mar 2023 • Guangxuan Xiao, Leslie Pack Kaelbling, Jiajun Wu, Jiayuan Mao

Reasoning about the relationships between entities from input facts (e.g., whether Ari is a grandparent of Charlie) generally requires explicit consideration of other entities that are not mentioned in the query (e.g., the parents of Charlie).

Knowledge Graphs World Knowledge


DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference

no code implementations • 24 Feb 2023 • Jiajun Zhou, Jiajun Wu, Yizhao Gao, Yuhao Ding, Chaofan Tao, Boyu Li, Fengbin Tu, Kwang-Ting Cheng, Hayden Kwok-Hay So, Ngai Wong

To accelerate the inference of deep neural networks (DNNs), quantization with low-bitwidth numbers is actively researched.

Quantization


VQ3D: Learning a 3D-Aware Generative Model on ImageNet

no code implementations • ICCV 2023 • Kyle Sargent, Jing Yu Koh, Han Zhang, Huiwen Chang, Charles Herrmann, Pratul Srinivasan, Jiajun Wu, Deqing Sun

Recent work has shown the possibility of training generative models of 3D content from 2D image collections on small datasets corresponding to a single object class, such as human faces, animal faces, or cars.

Position


FedLE: Federated Learning Client Selection with Lifespan Extension for Edge IoT Networks

no code implementations • 14 Feb 2023 • Jiajun Wu, Steve Drew, Jiayu Zhou

One major challenge preventing the wide adoption of FL in IoT is the pervasive power supply constraints of IoT devices due to the intensive energy consumption of battery-powered clients for local training and model updates.

Federated Learning Privacy Preserving


Topology-aware Federated Learning in Edge Computing: A Comprehensive Survey

no code implementations • 6 Feb 2023 • Jiajun Wu, Steve Drew, Fan Dong, Zhuangdi Zhu, Jiayu Zhou

The ultra-low latency requirements of 5G/6G applications and privacy constraints call for distributed machine learning systems to be deployed at the edge.

Edge-computing Federated Learning


IKEA-Manual: Seeing Shape Assembly Step by Step

no code implementations • 3 Feb 2023 • Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Ran Zhang, Chin-Yi Cheng, Jiajun Wu

Human-designed visual manuals are crucial components in shape assembly activities.

3D Assembly Pose Estimation +1


Learning Vortex Dynamics for Fluid Inference and Prediction

no code implementations • 27 Jan 2023 • Yitong Deng, Hong-Xing Yu, Jiajun Wu, Bo Zhu

We propose a novel differentiable vortex particle (DVP) method to infer and predict fluid dynamics from a single video.

Future prediction


Accidental Light Probes

no code implementations • CVPR 2023 • Hong-Xing Yu, Samir Agarwala, Charles Herrmann, Richard Szeliski, Noah Snavely, Jiajun Wu, Deqing Sun

Recovering lighting in a scene from a single image is a fundamental problem in computer vision.

Lighting Estimation


Scene Synthesis from Human Motion

no code implementations • 4 Jan 2023 • Sifan Ye, Yixing Wang, Jiaman Li, Dennis Park, C. Karen Liu, Huazhe Xu, Jiajun Wu

Large-scale capture of human motion with diverse, complex scenes, while immensely useful, is often considered prohibitively costly.

Ranked #3 on 3D Semantic Scene Completion on PRO-teXt

2D Semantic Segmentation task 1 (8 classes) 3D Semantic Scene Completion +1


ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

1 code implementation • CVPR 2023 • Le Xue, Mingfei Gao, Chen Xing, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese

Then, ULIP learns a 3D representation space aligned with the common image-text space, using a small number of automatically synthesized triplets.

Ranked #3 on Training-free 3D Point Cloud Classification on ModelNet40 (using extra training data)

3D Architecture 3D Classification +5



Physically Plausible Animation of Human Upper Body from a Single Image

no code implementations • 9 Dec 2022 • Ziyuan Huang, Zhengping Zhou, Yung-Yu Chuang, Jiajun Wu, C. Karen Liu

We present a new method for generating controllable, dynamically responsive, and photorealistic human animations.


Ego-Body Pose Estimation via Ego-Head Pose Estimation

1 code implementation • CVPR 2023 • Jiaman Li, C. Karen Liu, Jiajun Wu

In addition, collecting large-scale, high-quality datasets with paired egocentric videos and 3D human motions requires accurate motion capture devices, which often limit the variety of scenes in the videos to lab-like environments.

Benchmarking Disentanglement +1



Seeing a Rose in Five Thousand Ways

no code implementations • CVPR 2023 • Yunzhi Zhang, Shangzhe Wu, Noah Snavely, Jiajun Wu

These instances all share the same intrinsics, but appear different due to a combination of variance within these intrinsics and differences in extrinsic factors, such as pose and illumination.

Image Generation Intrinsic Image Decomposition +1


See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation

no code implementations • 7 Dec 2022 • Hao Li, Yizhi Zhang, Junzhe Zhu, Shaoxiong Wang, Michelle A Lee, Huazhe Xu, Edward Adelson, Li Fei-Fei, Ruohan Gao, Jiajun Wu

Humans use all of their senses to accomplish different tasks in everyday activities.

Decision Making


E-MAPP: Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance

no code implementations • 5 Dec 2022 • Can Chang, Ni Mu, Jiajun Wu, Ling Pan, Huazhe Xu

Specifically, we introduce Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance (E-MAPP), a novel framework that leverages parallel programs to guide multiple agents to efficiently accomplish goals that require planning over 10+ stages.

Multi-agent Reinforcement Learning reinforcement-learning +2


Geoclidean: Few-Shot Generalization in Euclidean Geometry

1 code implementation • 30 Nov 2022 • Joy Hsu, Jiajun Wu, Noah D. Goodman

In contrast, low-level and high-level visual features from standard computer vision models pretrained on natural images do not support correct generalization.

Benchmarking


3D Neural Field Generation using Triplane Diffusion

no code implementations • CVPR 2023 • J. Ryan Shue, Eric Ryan Chan, Ryan Po, Zachary Ankner, Jiajun Wu, Gordon Wetzstein

Diffusion models have emerged as the state-of-the-art for image generation, among other tasks.

3D Generation


STAP: Sequencing Task-Agnostic Policies

no code implementations • 21 Oct 2022 • Christopher Agia, Toki Migimatsu, Jiajun Wu, Jeannette Bohg

We further demonstrate how STAP can be used for task and motion planning by estimating the geometric feasibility of skill sequences provided by a task planner.

Motion Planning Task and Motion Planning


Differentiable Physics Simulation of Dynamics-Augmented Neural Objects

no code implementations • 17 Oct 2022 • Simon Le Cleac'h, Hong-Xing Yu, Michelle Guo, Taylor A. Howell, Ruohan Gao, Jiajun Wu, Zachary Manchester, Mac Schwager

A robot can use this simulation to optimize grasps and manipulation trajectories of neural objects, or to improve the neural object models through gradient-based real-to-simulation transfer.

Friction Object


Retrospectives on the Embodied AI Workshop

no code implementations • 13 Oct 2022 • Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, Sonia Raychaudhuri, Mike Roberts, Silvio Savarese, Manolis Savva, Mohit Shridhar, Niko Sünderhauf, Andrew Szot, Ben Talbot, Joshua B. Tenenbaum, Jesse Thomason, Alexander Toshev, Joanne Truong, Luca Weihs, Jiajun Wu

We present a retrospective on the state of Embodied AI research.

Visual Navigation

Paper
Add Code

Interaction Modeling with Multiplex Attention

no code implementations • 23 Aug 2022 • Fan-Yun Sun, Isaac Kauvar, Ruohan Zhang, Jiachen Li, Mykel Kochenderfer, Jiajun Wu, Nick Haber

Modeling multi-agent systems requires understanding how agents interact.

Social Navigation Trajectory Forecasting +1

Paper
Add Code

Translating a Visual LEGO Manual to a Machine-Executable Plan

no code implementations • 25 Jul 2022 • Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Chin-Yi Cheng, Jiajun Wu

We study the problem of translating an image-based, step-by-step assembly manual created by human designers into machine-interpretable instructions.

3D Pose Estimation Keypoint Detection

Paper
Add Code

Programmatic Concept Learning for Human Motion Description and Synthesis

no code implementations • CVPR 2022 • Sumith Kulal, Jiayuan Mao, Alex Aiken, Jiajun Wu

We introduce Programmatic Motion Concepts, a hierarchical motion representation for human actions that captures both low-level motion and high-level description as motion concepts.

Paper
Add Code

MaskViT: Masked Visual Pre-Training for Video Prediction

no code implementations • 23 Jun 2022 • Agrim Gupta, Stephen Tian, Yunzhi Zhang, Jiajun Wu, Roberto Martín-Martín, Li Fei-Fei

This work shows that we can create good video prediction models by pre-training transformers via masked visual modeling.

Scheduling Video Prediction

Paper
Add Code

BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents

no code implementations • 13 Jun 2022 • Ziang Liu, Roberto Martín-Martín, Fei Xia, Jiajun Wu, Li Fei-Fei

Robots excel in performing repetitive and precision-sensitive tasks in controlled environments such as warehouses and factories, but have not yet been extended to embodied AI agents providing assistance in household tasks.

Benchmarking

Paper
Add Code

Revisiting the "Video" in Video-Language Understanding

1 code implementation • CVPR 2022 • Shyamal Buch, Cristóbal Eyzaguirre, Adrien Gaidon, Jiajun Wu, Li Fei-Fei, Juan Carlos Niebles

Building on recent progress in self-supervised image-language models, we revisit this question in the context of video and language tasks.

Ranked #1 on Video Question Answering on MSR-VTT-MC

Benchmarking Question Answering +4

Paper
Code

Unsupervised Segmentation in Real-World Images via Spelke Object Inference

1 code implementation • 17 May 2022 • Honglin Chen, Rahul Venkatesh, Yoni Friedman, Jiajun Wu, Joshua B. Tenenbaum, Daniel L. K. Yamins, Daniel M. Bear

Self-supervised, category-agnostic segmentation of real-world images is a challenging open problem in computer vision.

Image Segmentation Optical Flow Estimation +2

Paper
Code

Unsupervised Discovery and Composition of Object Light Fields

no code implementations • 8 May 2022 • Cameron Smith, Hong-Xing Yu, Sergey Zakharov, Fredo Durand, Joshua B. Tenenbaum, Jiajun Wu, Vincent Sitzmann

Neural scene representations, both continuous and discrete, have recently emerged as a powerful new paradigm for 3D scene understanding.

Novel View Synthesis Object +1

Paper
Add Code

RoboCraft: Learning to See, Simulate, and Shape Elasto-Plastic Objects with Graph Networks

no code implementations • 5 May 2022 • Haochen Shi, Huazhe Xu, Zhiao Huang, Yunzhu Li, Jiajun Wu

Our learned model-based planning framework is comparable to and sometimes better than human subjects on the tested tasks.

Model Predictive Control

Paper
Add Code

Video Extrapolation in Space and Time

no code implementations • 4 May 2022 • Yunzhi Zhang, Jiajun Wu

Novel view synthesis (NVS) and video prediction (VP) are typically considered disjoint tasks in computer vision.

Novel View Synthesis Video Prediction

Paper
Add Code

Rotationally Equivariant 3D Object Detection

no code implementations • CVPR 2022 • Hong-Xing Yu, Jiajun Wu, Li Yi

To incorporate object-level rotation equivariance into 3D object detectors, we need a mechanism to extract equivariant features with local object-level spatial support while being able to model cross-object context information.

3D Object Detection Autonomous Driving +2

Paper
Add Code

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

1 code implementation • CVPR 2022 • Ruohan Gao, Zilin Si, Yen-Yu Chang, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu

We present ObjectFolder 2.0, a large-scale, multisensory dataset of common household objects in the form of implicit neural representations that significantly enhances ObjectFolder 1.0 in three aspects.

Object

146

Paper
Code

Vision-Based Manipulators Need to Also See from Their Hands

no code implementations • ICLR 2022 • Kyle Hsu, Moo Jin Kim, Rafael Rafailov, Jiajun Wu, Chelsea Finn

We study how the choice of visual perspective affects learning and generalization in the context of physical manipulation from raw sensor observations.

Out-of-Distribution Generalization

Paper
Add Code

Grammar-Based Grounded Lexicon Learning

no code implementations • NeurIPS 2021 • Jiayuan Mao, Haoyue Shi, Jiajun Wu, Roger P. Levy, Joshua B. Tenenbaum

We present Grammar-Based Grounded Lexicon Learning (G2L2), a lexicalist approach toward learning a compositional and grounded meaning representation of language from grounded data, such as paired images and texts.

Network Embedding Sentence +1

Paper
Add Code

Universal Controllers with Differentiable Physics for Online System Identification

no code implementations • 29 Sep 2021 • Michelle Guo, Wenhao Yu, Daniel Ho, Jiajun Wu, Yunfei Bai, Karen Liu, Wenlong Lu

In addition, we perform two studies showing that UC-DiffOSI operates well in environments with changing or unknown dynamics.

Domain Adaptation

Paper
Add Code

Efficient Training and Inference of Hypergraph Reasoning Networks

no code implementations • 29 Sep 2021 • Guangxuan Xiao, Leslie Pack Kaelbling, Jiajun Wu, Jiayuan Mao

To leverage the sparsity in hypergraph neural networks, SpaLoc represents the grounding of relationships such as parent and grandparent as sparse tensors and uses neural networks and finite-domain quantification operations to infer new facts based on the input.

Knowledge Graphs Logical Reasoning +1

Paper
Add Code

Learning Rational Skills for Planning from Demonstrations and Instructions

no code implementations • 29 Sep 2021 • Zhezheng Luo, Jiayuan Mao, Jiajun Wu, Tomas Perez, Joshua B. Tenenbaum, Leslie Pack Kaelbling

We present a framework for learning compositional, rational skill models (RatSkills) that support efficient planning and inverse planning for achieving novel goals and recognizing activities.

Paper
Add Code

ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations

no code implementations • 16 Sep 2021 • Ruohan Gao, Yen-Yu Chang, Shivani Mall, Li Fei-Fei, Jiajun Wu

Multisensory object-centric perception, reasoning, and interaction have been a key research topic in recent years.

3D Reconstruction Object +3

Paper
Add Code

On the Opportunities and Risks of Foundation Models

2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

847

Paper
Code

BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments

no code implementations • 6 Aug 2021 • Sanjana Srivastava, Chengshu Li, Michael Lingelbach, Roberto Martín-Martín, Fei Xia, Kent Vainio, Zheng Lian, Cem Gokmen, Shyamal Buch, C. Karen Liu, Silvio Savarese, Hyowon Gweon, Jiajun Wu, Li Fei-Fei

We introduce BEHAVIOR, a benchmark for embodied AI with 100 activities in simulation, spanning a range of everyday household chores such as cleaning, maintenance, and food preparation.

Paper
Add Code

iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks

1 code implementation • 6 Aug 2021 • Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, Andrey Kurenkov, C. Karen Liu, Hyowon Gweon, Jiajun Wu, Li Fei-Fei, Silvio Savarese

We evaluate the new capabilities of iGibson 2.0 to enable robot learning of novel tasks, in the hope of demonstrating the potential of this new simulator to support new research in embodied AI.

Imitation Learning

604

Paper
Code

SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

1 code implementation • ICLR 2022 • Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, Stefano Ermon

The key challenge is balancing faithfulness to the user input (e.g., hand-drawn colored strokes) and realism of the synthesized image.

Denoising Image Generation

909

Paper
Code

Unsupervised Discovery of Object Radiance Fields

1 code implementation • ICLR 2022 • Hong-Xing Yu, Leonidas J. Guibas, Jiajun Wu

We study the problem of inferring an object-centric scene representation from a single image, aiming to derive a representation that explains the image formation process, captures the scene's 3D nature, and is learned without supervision.

Novel View Synthesis Object +1

169

Paper
Code

Temporal and Object Quantification Networks

no code implementations • 10 Jun 2021 • Jiayuan Mao, Zhezheng Luo, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu, Leslie Pack Kaelbling, Tomer D. Ullman

We present Temporal and Object Quantification Networks (TOQ-Nets), a new class of neuro-symbolic networks with a structural bias that enables them to learn to recognize complex relational-temporal events.

Object Temporal Sequences

Paper
Add Code

KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control

no code implementations • CVPR 2021 • Tomas Jakab, Richard Tucker, Ameesh Makadia, Jiajun Wu, Noah Snavely, Angjoo Kanazawa

We cast this as the problem of aligning a source 3D object to a target 3D object from the same object category.

Object

Paper
Add Code

Hierarchical Motion Understanding via Motion Programs

no code implementations • CVPR 2021 • Sumith Kulal, Jiayuan Mao, Alex Aiken, Jiajun Wu

We posit that adding higher-level motion primitives, which can capture natural coarser units of motion such as backswing or follow-through, can be used to improve downstream analysis tasks.

Video Editing Video Prediction

Paper
Add Code

3D Shape Generation and Completion through Point-Voxel Diffusion

1 code implementation • ICCV 2021 • Linqi Zhou, Yilun Du, Jiajun Wu

We propose a novel approach for probabilistic generative modeling of 3D shapes.

3D Shape Generation Denoising

200

Paper
Code

De-rendering the World's Revolutionary Artefacts

1 code implementation • CVPR 2021 • Shangzhe Wu, Ameesh Makadia, Jiajun Wu, Noah Snavely, Richard Tucker, Angjoo Kanazawa

Recent works have shown exciting results in unsupervised image de-rendering -- learning to decompose 3D shape, appearance, and lighting from single-image collections without explicit supervision.

Paper
Code

Repopulating Street Scenes

no code implementations • CVPR 2021 • Yifan Wang, Andrew Liu, Richard Tucker, Jiajun Wu, Brian L. Curless, Steven M. Seitz, Noah Snavely

We present a framework for automatically reconfiguring images of street scenes by populating, depopulating, or repopulating them with objects such as pedestrians or vehicles.

Autonomous Driving

Paper
Add Code

Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning

no code implementations • 30 Mar 2021 • Zhenfang Chen, Jiayuan Mao, Jiajun Wu, Kwan-Yee Kenneth Wong, Joshua B. Tenenbaum, Chuang Gan

We study the problem of dynamic visual reasoning on raw videos.

counterfactual Object +3

Paper
Add Code

Learning Temporal Dynamics from Cycles in Narrated Video

no code implementations • ICCV 2021 • Dave Epstein, Jiajun Wu, Cordelia Schmid, Chen Sun

Learning to model how the world changes as time elapses has proven a challenging problem for the computer vision community.

Paper
Add Code

Grounding Physical Object and Event Concepts Through Dynamic Visual Reasoning

no code implementations • ICLR 2021 • Zhenfang Chen, Jiayuan Mao, Jiajun Wu, Kwan-Yee Kenneth Wong, Joshua B. Tenenbaum, Chuang Gan

We study the problem of dynamic visual reasoning on raw videos.

counterfactual Object +3

Paper
Add Code

Unsupervised Discovery of 3D Physical Objects

no code implementations • ICLR 2021 • Yilun Du, Kevin A. Smith, Tomer Ullman, Joshua B. Tenenbaum, Jiajun Wu

We study the problem of unsupervised physical object discovery.

Object Object Discovery +1

Paper
Add Code

Temporal and Object Quantification Nets

no code implementations • 1 Jan 2021 • Jiayuan Mao, Zhezheng Luo, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu, Leslie Pack Kaelbling, Tomer Ullman

We aim to learn generalizable representations for complex activities by quantifying over both entities and time, as in “the kicker is behind all the other players,” or “the player controls the ball until it moves toward the goal.” Such a structural inductive bias of object relations, object quantification, and temporal orders will enable the learned representation to generalize to situations with varying numbers of agents, objects, and time courses.

Event Detection Inductive Bias +1

Paper
Add Code

Language-Mediated, Object-Centric Representation Learning

no code implementations • Findings (ACL) 2021 • Ruocheng Wang, Jiayuan Mao, Samuel J. Gershman, Jiajun Wu

These object-centric concepts derived from language facilitate the learning of object-centric representations.

Object Object Discovery +5

Paper
Add Code

Augmenting Policy Learning with Routines Discovered from a Single Demonstration

2 code implementations • 23 Dec 2020 • Zelin Zhao, Chuang Gan, Jiajun Wu, Xiaoxiao Guo, Joshua B. Tenenbaum

Humans can abstract prior knowledge from very little data and use it to boost skill learning.

Atari Games Imitation Learning +2

Paper
Code

Object-Centric Diagnosis of Visual Reasoning

no code implementations • 21 Dec 2020 • Jianwei Yang, Jiayuan Mao, Jiajun Wu, Devi Parikh, David D. Cox, Joshua B. Tenenbaum, Chuang Gan

In contrast, symbolic and modular models have a relatively better grounding and robustness, though at the cost of accuracy.

Object Question Answering +2

Paper
Add Code

Neural Radiance Flow for 4D View Synthesis and Video Processing

1 code implementation • ICCV 2021 • Yilun Du, Yinan Zhang, Hong-Xing Yu, Joshua B. Tenenbaum, Jiajun Wu

We present a method, Neural Radiance Flow (NeRFlow), to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.

Image Super-Resolution Temporal View Synthesis

318

Paper
Code

Object-Centric Neural Scene Rendering

no code implementations • 15 Dec 2020 • Michelle Guo, Alireza Fathi, Jiajun Wu, Thomas Funkhouser

We present a method for composing photorealistic scenes from captured images of objects.

Object

Paper
Add Code

pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis

3 code implementations • CVPR 2021 • Eric R. Chan, Marco Monteiro, Petr Kellnhofer, Jiajun Wu, Gordon Wetzstein

We have witnessed rapid progress on 3D-aware image synthesis, leveraging recent advances in generative visual models and neural rendering.

Ranked #3 on Scene Generation on VizDoom

3D-Aware Image Synthesis Neural Rendering +1

409

Paper
Code

Multi-Plane Program Induction with 3D Box Priors

no code implementations • NeurIPS 2020 • Yikai Li, Jiayuan Mao, Xiuming Zhang, William T. Freeman, Joshua B. Tenenbaum, Noah Snavely, Jiajun Wu

We consider two important aspects in understanding and editing images: modeling regular, program-like texture or patterns in 2D planes, and 3D posing of these planes in the scene.

Program induction Program Synthesis

Paper
Add Code

Learning 3D Dynamic Scene Representations for Robot Manipulation

2 code implementations • 3 Nov 2020 • Zhenjia Xu, Zhanpeng He, Jiajun Wu, Shuran Song

3D scene representation for robot manipulation should capture three key object properties: permanency -- objects that become occluded over time continue to exist; amodal completeness -- objects have 3D occupancy, even if only partial observations are available; spatiotemporal continuity -- the movement of each object is continuous over space and time.

Model Predictive Control Robot Manipulation

Paper
Code

Multi-Frame to Single-Frame: Knowledge Distillation for 3D Object Detection

no code implementations • 24 Sep 2020 • Yue Wang, Alireza Fathi, Jiajun Wu, Thomas Funkhouser, Justin Solomon

A common dilemma in 3D object detection for autonomous driving is that high-quality, dense point clouds are only available during training, but not testing.

3D Object Detection Autonomous Driving +3

Paper
Add Code

Unsupervised Discovery of 3D Physical Objects from Video

no code implementations • 24 Jul 2020 • Yilun Du, Kevin Smith, Tomer Ullman, Joshua Tenenbaum, Jiajun Wu

We study the problem of unsupervised physical object discovery.

Object Object Discovery +1

Paper
Add Code

End-to-End Optimization of Scene Layout

1 code implementation • CVPR 2020 • Andrew Luo, Zhoutong Zhang, Jiajun Wu, Joshua B. Tenenbaum

Experiments suggest that our model achieves higher accuracy and diversity in conditional scene synthesis and allows exemplar-based scene generation from various input forms.

Indoor Scene Reconstruction Indoor Scene Synthesis +2

Paper
Code

Perspective Plane Program Induction from a Single Image

no code implementations • CVPR 2020 • Yikai Li, Jiayuan Mao, Xiuming Zhang, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

We study the inverse graphics problem of inferring a holistic representation for natural images.

Image Manipulation Pose Estimation +2

Paper
Add Code

Learning Physical Graph Representations from Visual Scenes

1 code implementation • NeurIPS 2020 • Daniel M. Bear, Chaofei Fan, Damian Mrowca, Yunzhu Li, Seth Alter, Aran Nayebi, Jeremy Schwartz, Li Fei-Fei, Jiajun Wu, Joshua B. Tenenbaum, Daniel L. K. Yamins

To overcome these limitations, we introduce the idea of Physical Scene Graphs (PSGs), which represent scenes as hierarchical graphs, with nodes in the hierarchy corresponding intuitively to object parts at different scales, and edges to physical connections between parts.

Object Object Categorization +1

Paper
Code

When is Particle Filtering Efficient for Planning in Partially Observed Linear Dynamical Systems?

no code implementations • 10 Jun 2020 • Simon S. Du, Wei Hu, Zhiyuan Li, Ruoqi Shen, Zhao Song, Jiajun Wu

Though errors in past actions may affect the future, we are able to bound the number of particles needed so that the long-run reward of the policy based on particle filtering is close to that based on exact inference.

Decision Making

Paper
Add Code

Improving Learning Efficiency for Wireless Resource Allocation with Symmetric Prior

no code implementations • 18 May 2020 • Chengjian Sun, Jiajun Wu, Chenyang Yang

The samples required to train a DNN after ranking can be reduced by $15 \sim 2,400$ folds to achieve the same system performance as the counterpart without using prior.

Paper
Add Code

Deep Audio Priors Emerge From Harmonic Convolutional Networks

no code implementations • ICLR 2020 • Zhoutong Zhang, Yunyun Wang, Chuang Gan, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

We show that networks using Harmonic Convolution can reliably model audio priors and achieve high performance in unsupervised audio restoration tasks.

Paper
Add Code

Visual Grounding of Learned Physical Models

1 code implementation • ICML 2020 • Yunzhu Li, Toru Lin, Kexin Yi, Daniel M. Bear, Daniel L. K. Yamins, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba

The abilities to perform physical reasoning and to adapt to new environments, while intrinsic to humans, remain challenging to state-of-the-art computational models.

Visual Grounding

Paper
Code

Visual Concept-Metaconcept Learning

1 code implementation • NeurIPS 2019 • Chi Han, Jiayuan Mao, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu

Humans reason with concepts and metaconcepts: we recognize red and green from visual input; we also understand that they describe the same property of objects (i.e., the color).

Paper
Code

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

1 code implementation • 25 Dec 2019 • Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum

In this paper, we attempt to approach the problem of Audio-Visual Embodied Navigation, the task of planning the shortest path from a random starting location in a scene to the sound source in an indoor environment, given only raw egocentric visual and audio sensory data.

Navigate

Paper
Code

Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations

1 code implementation • NeurIPS 2019 • Kevin Smith, Lingjie Mei, Shunyu Yao, Jiajun Wu, Elizabeth Spelke, Josh Tenenbaum, Tomer Ullman

We also present a new test set for measuring violations of physical expectations, using a range of scenarios derived from developmental psychology.

Scene Understanding

Paper
Code

Accurate Vision-based Manipulation through Contact Reasoning

no code implementations • 8 Nov 2019 • Alina Kloss, Maria Bauza, Jiajun Wu, Joshua B. Tenenbaum, Alberto Rodriguez, Jeannette Bohg

Planning contact interactions is one of the core challenges of many robotic tasks.

Paper
Add Code

Proactive Optimization with Machine Learning: Femto-caching with Future Content Popularity

no code implementations • 29 Oct 2019 • Jiajun Wu, Chengjian Sun, Chenyang Yang

In this paper, we introduce a proactive optimization framework for anticipatory resource allocation, where the future information is implicitly predicted under the same objective with the policy optimization in a single step.

BIG-bench Machine Learning Stochastic Optimization

Paper
Add Code

Entity Abstraction in Visual Model-Based Reinforcement Learning

1 code implementation • 28 Oct 2019 • Rishi Veerapaneni, John D. Co-Reyes, Michael Chang, Michael Janner, Chelsea Finn, Jiajun Wu, Joshua B. Tenenbaum, Sergey Levine

This paper tests the hypothesis that modeling a scene in terms of entities and their local interactions, as opposed to modeling the scene globally, provides a significant benefit in generalizing to physical tasks in a combinatorial space the learner has not encountered before.

Model-based Reinforcement Learning Object +5

Paper
Code

Learning Compositional Koopman Operators for Model-Based Control

no code implementations • ICLR 2020 • Yunzhu Li, Hao He, Jiajun Wu, Dina Katabi, Antonio Torralba

Finding an embedding space for a linear approximation of a nonlinear dynamical system enables efficient system identification and control synthesis.

Paper
Add Code

CLEVRER: CoLlision Events for Video REpresentation and Reasoning

3 code implementations • ICLR 2020 • Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum

While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations.

counterfactual Descriptive +1

102

Paper
Code

DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs

1 code implementation • 28 Sep 2019 • Yunbo Wang, Bo Liu, Jiajun Wu, Yuke Zhu, Simon S. Du, Li Fei-Fei, Joshua B. Tenenbaum

A major difficulty of solving continuous POMDPs is to infer the multi-modal distribution of the unobserved true states and to make the planning algorithm dependent on the perceived uncertainty.

Continuous Control

Paper
Code

Program-Guided Image Manipulators

no code implementations • ICCV 2019 • Jiayuan Mao, Xiuming Zhang, Yikai Li, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

Humans are capable of building holistic representations for images at various levels, from local objects, to pairwise relations, to global structures.

Image Inpainting

Paper
Add Code

Neurally-Guided Structure Inference

no code implementations • 17 Jun 2019 • Sidi Lu, Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu

In this paper, we propose a hybrid inference algorithm, the Neurally-Guided Structure Inference (NG-SI), keeping the advantages of both search-based and data-driven methods.

Paper
Add Code

DensePhysNet: Learning Dense Physical Object Representations via Multi-step Dynamic Interactions

no code implementations • 10 Jun 2019 • Zhenjia Xu, Jiajun Wu, Andy Zeng, Joshua B. Tenenbaum, Shuran Song

We study the problem of learning physical object representations for robot manipulation.

Friction Object +1

Paper
Add Code

Modeling Parts, Structure, and System Dynamics via Predictive Learning

no code implementations • ICLR 2019 • Zhenjia Xu, Zhijian Liu, Chen Sun, Kevin Murphy, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

Humans easily recognize object parts and their hierarchical structure by watching how they move; they can then predict how each part moves in the future.

Object

Paper
Add Code

Learning to Describe Scenes with Programs

no code implementations • ICLR 2019 • Yunchao Liu, Zheng Wu, Daniel Ritchie, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

We are able to understand the higher-level, abstract regularities within the scene such as symmetry and repetition.

Paper
Add Code

Reasoning About Physical Interactions with Object-Centric Models

no code implementations • ICLR 2019 • Michael Janner, Sergey Levine, William T. Freeman, Joshua B. Tenenbaum, Chelsea Finn, Jiajun Wu

Object-based factorizations provide a useful level of abstraction for interacting with the world.

Object Scene Understanding

Paper
Add Code

Predicting the Present and Future States of Multi-agent Systems from Partially-observed Visual Data

no code implementations • ICLR 2019 • Chen Sun, Per Karlsson, Jiajun Wu, Joshua B. Tenenbaum, Kevin Murphy

We present a method which learns to integrate temporal information, from a learned dynamics model, with ambiguous visual information, from a learned vision model, in the context of interacting agents.

Paper
Add Code

The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision

2 code implementations • ICLR 2019 • Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, Jiajun Wu

To bridge the learning of two modules, we use a neuro-symbolic reasoning module that executes these programs on the latent scene representation.

Ranked #5 on Visual Question Answering (VQA) on CLEVR

Object Question Answering +4

407

Paper
Code

Combining Physical Simulators and Object-Based Networks for Control

no code implementations • 13 Apr 2019 • Anurag Ajay, Maria Bauza, Jiajun Wu, Nima Fazeli, Joshua B. Tenenbaum, Alberto Rodriguez, Leslie P. Kaelbling

Physics engines play an important role in robot planning and control; however, many real-world control problems involve complex contact dynamics that cannot be characterized analytically.

Object

Paper
Add Code

Unsupervised Discovery of Parts, Structure, and Dynamics

no code implementations • 12 Mar 2019 • Zhenjia Xu, Zhijian Liu, Chen Sun, Kevin Murphy, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

Humans easily recognize object parts and their hierarchical structure by watching how they move; they can then predict how each part moves in the future.

Object

Paper
Add Code

Stochastic Prediction of Multi-Agent Interactions from Partial Observations

no code implementations • 25 Feb 2019 • Chen Sun, Per Karlsson, Jiajun Wu, Joshua B. Tenenbaum, Kevin Murphy

We present a method that learns to integrate temporal information, from a learned dynamics model, with ambiguous visual information, from a learned vision model, in the context of interacting agents.

Paper
Add Code

Learning to Infer and Execute 3D Shape Programs

no code implementations • ICLR 2019 • Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

Human perception of 3D shapes goes beyond reconstructing them as a set of points or a composition of geometric primitives: we also effortlessly understand higher-level shape structure such as the repetition and reflective symmetry of object parts.

Paper
Add Code

Reasoning About Physical Interactions with Object-Oriented Prediction and Planning

no code implementations • 28 Dec 2018 • Michael Janner, Sergey Levine, William T. Freeman, Joshua B. Tenenbaum, Chelsea Finn, Jiajun Wu

Object-based factorizations provide a useful level of abstraction for interacting with the world.

Object Scene Understanding

Paper
Add Code

Learning to Reconstruct Shapes from Unseen Classes

no code implementations • NeurIPS 2018 • Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Joshua B. Tenenbaum, William T. Freeman, Jiajun Wu

From a single image, humans are able to perceive the full 3D shape of an object by exploiting learned shape priors from everyday life.

3D Reconstruction

Paper
Add Code

Visual Object Networks: Image Generation with Disentangled 3D Representation

1 code implementation • NeurIPS 2018 • Jun-Yan Zhu, Zhoutong Zhang, Chengkai Zhang, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum, William T. Freeman

Our model first learns to synthesize 3D shapes that are indistinguishable from real shapes.

Image Generation Object

532

Paper
Code

Learning to Exploit Stability for 3D Scene Parsing

no code implementations • NeurIPS 2018 • Yilun Du, Zhijian Liu, Hector Basevi, Ales Leonardis, Bill Freeman, Josh Tenenbaum, Jiajun Wu

We first show that applying physics supervision to an existing scene understanding model increases performance, produces more stable predictions, and allows training to an equivalent performance level with fewer annotated training examples.

Scene Understanding Translation

Paper
Add Code

Visual Object Networks: Image Generation with Disentangled 3D Representations

1 code implementation • NeurIPS 2018 • Jun-Yan Zhu, Zhoutong Zhang, Chengkai Zhang, Jiajun Wu, Antonio Torralba, Josh Tenenbaum, Bill Freeman

The VON not only generates images that are more realistic than the state-of-the-art 2D image synthesis methods but also enables many 3D operations such as changing the viewpoint of a generated image, shape and texture editing, linear interpolation in texture and shape space, and transferring appearance across different objects and viewpoints.

Image Generation Object

532

Paper
Code

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

2 code implementations • NeurIPS 2018 • Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Joshua B. Tenenbaum

Second, the model is more data- and memory-efficient: it performs well after learning on a small number of training data; it can also encode an image into a compact representation, requiring less storage than existing methods for offline question answering.

Ranked #1 on Visual Question Answering (VQA) on CLEVR

Question Answering Representation Learning +1

254

Paper
Code

Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids

no code implementations • ICLR 2019 • Yunzhu Li, Jiajun Wu, Russ Tedrake, Joshua B. Tenenbaum, Antonio Torralba

In this paper, we propose to learn a particle-based simulator for complex control tasks.

Inductive Bias

Paper
Add Code

ChainQueen: A Real-Time Differentiable Physical Simulator for Soft Robotics

no code implementations • 2 Oct 2018 • Yuanming Hu, Jian-Cheng Liu, Andrew Spielberg, Joshua B. Tenenbaum, William T. Freeman, Jiajun Wu, Daniela Rus, Wojciech Matusik

The underlying physical laws of deformable objects are more complex, and the resulting systems have orders of magnitude more degrees of freedom and therefore they are significantly more computationally expensive to simulate.

Motion Planning

Paper
Add Code

Propagation Networks for Model-Based Control Under Partial Observation

1 code implementation • 28 Sep 2018 • Yunzhu Li, Jiajun Wu, Jun-Yan Zhu, Joshua B. Tenenbaum, Antonio Torralba, Russ Tedrake

There has been an increasing interest in learning dynamics simulators for model-based control.

Paper
Code

MoSculp: Interactive Visualization of Shape and Time

no code implementations • 14 Sep 2018 • Xiuming Zhang, Tali Dekel, Tianfan Xue, Andrew Owens, Qiurui He, Jiajun Wu, Stefanie Mueller, William T. Freeman

We present a system that allows users to visualize complex human motion via 3D motion sculptures, a representation that conveys the 3D structure swept by a human body as it moves through space.

Paper
Add Code

Seeing Tree Structure from Vibration

no code implementations • ECCV 2018 • Tianfan Xue, Jiajun Wu, Zhoutong Zhang, Chengkai Zhang, Joshua B. Tenenbaum, William T. Freeman

Humans recognize object structure from both their appearance and motion; often, motion helps to resolve ambiguities in object structure that arise when we observe object appearance only.

Bayesian Inference Object

Paper
Add Code

Learning Shape Priors for Single-View 3D Completion and Reconstruction

no code implementations • ECCV 2018 • Jiajun Wu, Chengkai Zhang, Xiuming Zhang, Zhoutong Zhang, William T. Freeman, Joshua B. Tenenbaum

The problem of single-view 3D shape completion or reconstruction is challenging, because among the many possible shapes that explain an observation, most are implausible and do not correspond to natural objects.

Paper
Add Code

Physical Primitive Decomposition

no code implementations • ECCV 2018 • Zhijian Liu, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

As annotated data for object parts and physics are rare, we propose a novel formulation that learns physical primitives by explaining both an object's appearance and its behaviors in physical events.

Object

Paper
Add Code

3D-Aware Scene Manipulation via Inverse Graphics

1 code implementation • NeurIPS 2018 • Shunyu Yao, Tzu Ming Harry Hsu, Jun-Yan Zhu, Jiajun Wu, Antonio Torralba, William T. Freeman, Joshua B. Tenenbaum

In this work, we propose 3D scene de-rendering networks (3D-SDN) to address the above issues by integrating disentangled representations for semantics, geometry, and appearance into a deep generative model.

Disentanglement Object

266

Paper
Code

3D Shape Perception from Monocular Vision, Touch, and Shape Priors

no code implementations • 9 Aug 2018 • Shaoxiong Wang, Jiajun Wu, Xingyuan Sun, Wenzhen Yuan, William T. Freeman, Joshua B. Tenenbaum, Edward H. Adelson

Perceiving accurate 3D object shape is important for robots to interact with the physical world.

Object

Paper
Add Code

Augmenting Physical Simulators with Stochastic Neural Networks: Case Study of Planar Pushing and Bouncing

no code implementations • 9 Aug 2018 • Anurag Ajay, Jiajun Wu, Nima Fazeli, Maria Bauza, Leslie P. Kaelbling, Joshua B. Tenenbaum, Alberto Rodriguez

An efficient, generalizable physical simulator with universal uncertainty estimates has wide applications in robot state estimation, planning, and control.

Gaussian Processes Object

Paper
Add Code

Unsupervised Learning of Latent Physical Properties Using Perception-Prediction Networks

no code implementations • 24 Jul 2018 • David Zheng, Vinson Luo, Jiajun Wu, Joshua B. Tenenbaum

We propose a framework for the completely unsupervised learning of latent object properties from their interactions: the perception-prediction network (PPN).

Object

Paper
Add Code

Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks

no code implementations • 24 Jul 2018 • Tianfan Xue, Jiajun Wu, Katherine L. Bouman, William T. Freeman

We study the problem of synthesizing a number of likely future frames from a single input image.

Paper
Add Code

Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling

1 code implementation • CVPR 2018 • Xingyuan Sun, Jiajun Wu, Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Tianfan Xue, Joshua B. Tenenbaum, William T. Freeman

We study 3D shape modeling from a single image and make contributions to it in three aspects.

Ranked #1 on 3D Shape Classification on Pix3D

3D Reconstruction 3D Shape Modeling +5

485

Paper
Code

3D Interpreter Networks for Viewer-Centered Wireframe Modeling

no code implementations • 3 Apr 2018 • Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

3D-INN is trained on real images to estimate 2D keypoint heatmaps from an input image; it then predicts 3D object structure from heatmaps using knowledge learned from synthetic 3D shapes.

Image Retrieval Keypoint Estimation +2

Paper
Add Code

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning

no code implementations • 20 Dec 2017 • Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba

The sound of crashing waves, the roar of fast-moving cars: sound conveys important information about the objects in our surroundings.

Paper
Add Code

Learning to See Physics via Visual De-animation

no code implementations • NeurIPS 2017 • Jiajun Wu, Erika Lu, Pushmeet Kohli, Bill Freeman, Josh Tenenbaum

At the core of our system is a physical world representation that is first recovered by a perception module and then utilized by physics and graphics engines.

Future prediction

Paper
Add Code

Shape and Material from Sound

no code implementations • NeurIPS 2017 • Zhoutong Zhang, Qiujia Li, Zhengjia Huang, Jiajun Wu, Josh Tenenbaum, Bill Freeman

Hearing an object falling onto the ground, humans can recover rich information including its rough shape, material, and falling height.

Object

Paper
Add Code

Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

5 code implementations • CVPR 2018 • Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen

In this paper, however, we show that temporal information, especially longer-term patterns, may not be necessary to achieve competitive results on common video classification datasets.

Classification General Classification +1

334

Paper
Code

Video Enhancement with Task-Oriented Flow

4 code implementations • 24 Nov 2017 • Tianfan Xue, Baian Chen, Jiajun Wu, Donglai Wei, William T. Freeman

Many video enhancement algorithms rely on optical flow to register frames in a video sequence.

Ranked #7 on Video Frame Interpolation on Middlebury

Denoising Motion Estimation +5

427

Paper
Code

Self-Supervised Intrinsic Image Decomposition

no code implementations • NeurIPS 2017 • Michael Janner, Jiajun Wu, Tejas D. Kulkarni, Ilker Yildirim, Joshua B. Tenenbaum

Intrinsic decomposition from a single image is a highly challenging task, due to its inherent ambiguity and the scarcity of training data.

Intrinsic Image Decomposition Transfer Learning

Paper
Add Code

MarrNet: 3D Shape Reconstruction via 2.5D Sketches

no code implementations • NeurIPS 2017 • Jiajun Wu, Yifan Wang, Tianfan Xue, Xingyuan Sun, William T. Freeman, Joshua B. Tenenbaum

First, compared to full 3D shape, 2.5D sketches are much easier to be recovered from a 2D image; models that recover 2.5D sketches are also more likely to transfer from synthetic to real data.

Ranked #2 on 3D Shape Classification on Pix3D

3D Object Reconstruction From A Single Image 3D Reconstruction +3

Paper
Add Code

Raster-To-Vector: Revisiting Floorplan Transformation

1 code implementation • ICCV 2017 • Chen Liu, Jiajun Wu, Pushmeet Kohli, Yasutaka Furukawa

A neural architecture first transforms a rasterized image to a set of junctions that represent low-level geometric and semantic information (e.g., wall corners or door end-points).

Vector Graphics

457

Paper
Code

Generative Modeling of Audible Shapes for Object Perception

no code implementations • ICCV 2017 • Zhoutong Zhang, Jiajun Wu, Qiujia Li, Zhengjia Huang, James Traer, Josh H. McDermott, Joshua B. Tenenbaum, William T. Freeman

Humans infer rich knowledge of objects from both auditory and visual cues.

Object

Paper
Add Code

Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes With Deep Generative Networks

no code implementations • CVPR 2017 • Amir Arsalan Soltani, Haibin Huang, Jiajun Wu, Tejas D. Kulkarni, Joshua B. Tenenbaum

We take an alternative approach: learning a generative model over multi-view depth maps or their corresponding silhouettes, and using a deterministic rendering function to produce 3D shapes from these images.

Paper
Add Code

Neural Scene De-Rendering

no code implementations • CVPR 2017 • Jiajun Wu, Joshua B. Tenenbaum, Pushmeet Kohli

Our approach employs a deterministic rendering function as the decoder, mapping a naturally structured and disentangled scene description, which we named scene XML, to an image.

Image Captioning Scene Understanding

Paper
Add Code

Deep Multi-Modal Image Correspondence Learning

no code implementations • 5 Dec 2016 • Chen Liu, Jiajun Wu, Pushmeet Kohli, Yasutaka Furukawa

Our result implies that neural networks are effective at perceptual tasks that require long periods of reasoning even for humans to solve.

Paper
Add Code

Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

3 code implementations • NeurIPS 2016 • Jiajun Wu, Chengkai Zhang, Tianfan Xue, William T. Freeman, Joshua B. Tenenbaum

We study the problem of 3D object generation.

Ranked #3 on 3D Shape Classification on Pix3D

3D Object Recognition 3D Point Cloud Linear Classification +3

813

Paper
Code

Ambient Sound Provides Supervision for Visual Learning

1 code implementation • 25 Aug 2016 • Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba

We show that, through this process, the network learns a representation that conveys information about objects and scenes.

Object Recognition

Paper
Code

Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks

3 code implementations • NeurIPS 2016 • Tianfan Xue, Jiajun Wu, Katherine L. Bouman, William T. Freeman

We study the problem of synthesizing a number of likely future frames from a single input image.

76,594

Paper
Code

A Comparative Evaluation of Approximate Probabilistic Simulation and Deep Neural Networks as Accounts of Human Physical Scene Understanding

no code implementations • 4 May 2016 • Renqiao Zhang, Jiajun Wu, Chengkai Zhang, William T. Freeman, Joshua B. Tenenbaum

Humans demonstrate remarkable abilities to predict physical events in complex scenes.

Scene Understanding

Paper
Add Code

Single Image 3D Interpreter Network

1 code implementation • 29 Apr 2016 • Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

In this work, we propose 3D INterpreter Network (3D-INN), an end-to-end framework which sequentially estimates 2D keypoint heatmaps and 3D object structure, trained on both real 2D-annotated images and synthetic 3D data.

Image Retrieval Keypoint Estimation +2

Paper
Code

Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning

no code implementations • NeurIPS 2015 • Jiajun Wu, Ilker Yildirim, Joseph J. Lim, Bill Freeman, Josh Tenenbaum

Humans demonstrate remarkable abilities to predict physical events in dynamic scenes, and to infer the physical properties of objects from static images.

Friction Scene Understanding

Paper
Add Code

Deep Multiple Instance Learning for Image Classification and Auto-Annotation

no code implementations • CVPR 2015 • Jiajun Wu, Yinan Yu, Chang Huang, Kai Yu

The recent development in learning deep representations has demonstrated its wide applications in traditional vision tasks like classification and detection.

Classification General Classification +3

Paper
Add Code

MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation

no code implementations • CVPR 2014 • Jiajun Wu, Yibiao Zhao, Jun-Yan Zhu, Siwei Luo, Zhuowen Tu

Interactive segmentation, in which a user provides a bounding box to an object of interest for image segmentation, has been applied to a variety of applications in image editing, crowdsourcing, computer vision, and medical imaging.

Image Segmentation Interactive Segmentation +4

Paper
Add Code

Harvesting Mid-level Visual Concepts from Large-Scale Internet Images

no code implementations • CVPR 2013 • Quannan Li, Jiajun Wu, Zhuowen Tu

Obtaining effective mid-level representations has become an increasingly important task in computer vision.

Image Classification

Paper
Add Code
