no code implementations • 10 Mar 2023 • Hong-Xing Yu, Michelle Guo, Alireza Fathi, Yen-Yu Chang, Eric Ryan Chan, Ruohan Gao, Thomas Funkhouser, Jiajun Wu
We propose Object-Centric Neural Scattering Functions (OSFs) for learning to reconstruct object appearance from only images.
no code implementations • 9 Mar 2023 • Zhezheng Luo, Jiayuan Mao, Jiajun Wu, Tomás Lozano-Pérez, Joshua B. Tenenbaum, Leslie Pack Kaelbling
We present a framework for learning useful subgoals that support efficient long-term planning to achieve novel goals.
no code implementations • 9 Mar 2023 • Guangxuan Xiao, Leslie Pack Kaelbling, Jiajun Wu, Jiayuan Mao
Reasoning about the relationships between entities from input facts (e.g., whether Ari is a grandparent of Charlie) generally requires explicit consideration of other entities that are not mentioned in the query (e.g., the parents of Charlie).
no code implementations • 24 Feb 2023 • Jiajun Zhou, Jiajun Wu, Yizhao Gao, Yuhao Ding, Chaofan Tao, Boyu Li, Fengbin Tu, Kwang-Ting Cheng, Hayden Kwok-Hay So, Ngai Wong
To accelerate the inference of deep neural networks (DNNs), quantization with low-bitwidth numbers is actively researched.
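For a concrete point of reference (a generic scheme, not necessarily the quantizer this paper studies), symmetric uniform quantization to low-bitwidth signed integers looks like the following minimal sketch:

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Symmetric uniform quantization: map floats to signed integer codes."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits
    scale = np.abs(x).max() / qmax            # largest magnitude maps to qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)  # stand-in for a weight tensor
q, s = quantize_uniform(w, num_bits=4)
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```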
no code implementations • 14 Feb 2023 • Kyle Sargent, Jing Yu Koh, Han Zhang, Huiwen Chang, Charles Herrmann, Pratul Srinivasan, Jiajun Wu, Deqing Sun
Recent work has shown the possibility of training generative models of 3D content from 2D image collections on small datasets corresponding to a single object class, such as human faces, animal faces, or cars.
no code implementations • 14 Feb 2023 • Jiajun Wu, Steve Drew, Jiayu Zhou
One major challenge preventing the wide adoption of FL in IoT is the pervasive power supply constraints of IoT devices due to the intensive energy consumption of battery-powered clients for local training and model updates.
no code implementations • 6 Feb 2023 • Jiajun Wu, Steve Drew, Fan Dong, Zhuangdi Zhu, Jiayu Zhou
The ultra-low latency requirements of 5G/6G applications and privacy constraints call for distributed machine learning systems to be deployed at the edge.
no code implementations • 3 Feb 2023 • Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Ran Zhang, Chin-Yi Cheng, Jiajun Wu
Human-designed visual manuals are crucial components in shape assembly activities.
no code implementations • 27 Jan 2023 • Yitong Deng, Hong-Xing Yu, Jiajun Wu, Bo Zhu
We propose a novel differentiable vortex particle (DVP) method to infer and predict fluid dynamics from a single video.
no code implementations • 12 Jan 2023 • Hong-Xing Yu, Samir Agarwala, Charles Herrmann, Richard Szeliski, Noah Snavely, Jiajun Wu, Deqing Sun
Recovering lighting in a scene from a single image is a fundamental problem in computer vision.
no code implementations • 4 Jan 2023 • Sifan Ye, Yixing Wang, Jiaman Li, Dennis Park, C. Karen Liu, Huazhe Xu, Jiajun Wu
Large-scale capture of human motion with diverse, complex scenes, while immensely useful, is often considered prohibitively costly.
1 code implementation • 10 Dec 2022 • Le Xue, Mingfei Gao, Chen Xing, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese
Then, ULIP learns a 3D representation space aligned with the common image-text space, using a small number of automatically synthesized triplets.
Ranked #2 on 3D Point Cloud Classification on ModelNet40 (using extra training data)
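For intuition on the alignment objective the ULIP entry above describes, here is a minimal numpy sketch of a CLIP-style contrastive loss over synthesized (point cloud, image, text) triplets; the encoders, dimensions, and loss weighting are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def info_nce(anchor, positives, temperature=0.07):
    """Contrastive loss: row i of `positives` is the match for row i of
    `anchor`; every other row in the batch acts as a negative."""
    logits = anchor @ positives.T / temperature
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
pc_emb = normalize(rng.normal(size=(16, 64)))    # 3D (point cloud) encoder output
img_emb = normalize(rng.normal(size=(16, 64)))   # frozen image encoder output
txt_emb = normalize(rng.normal(size=(16, 64)))   # frozen text encoder output

# Pull each 3D embedding toward the image and text embeddings of its triplet.
loss = info_nce(pc_emb, img_emb) + info_nce(pc_emb, txt_emb)
print(loss)
```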
no code implementations • 9 Dec 2022 • Yunzhi Zhang, Shangzhe Wu, Noah Snavely, Jiajun Wu
These instances all share the same intrinsics, but appear different due to a combination of variance within these intrinsics and differences in extrinsic factors, such as pose and illumination.
no code implementations • 9 Dec 2022 • Jiaman Li, C. Karen Liu, Jiajun Wu
In addition, collecting large-scale, high-quality datasets with paired egocentric videos and 3D human motions requires accurate motion capture devices, which often limit the variety of scenes in the videos to lab-like environments.
no code implementations • 9 Dec 2022 • Ziyuan Huang, Zhengping Zhou, Yung-Yu Chuang, Jiajun Wu, C. Karen Liu
We present a new method for generating controllable, dynamically responsive, and photorealistic human animations.
no code implementations • 7 Dec 2022 • Hao Li, Yizhi Zhang, Junzhe Zhu, Shaoxiong Wang, Michelle A Lee, Huazhe Xu, Edward Adelson, Li Fei-Fei, Ruohan Gao, Jiajun Wu
Humans use all of their senses to accomplish different tasks in everyday activities.
no code implementations • 5 Dec 2022 • Can Chang, Ni Mu, Jiajun Wu, Ling Pan, Huazhe Xu
Specifically, we introduce Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance (E-MAPP), a novel framework that leverages parallel programs to guide multiple agents to efficiently accomplish goals that require planning over $10+$ stages.
Tasks: Multi-agent Reinforcement Learning, reinforcement-learning, +1
1 code implementation • 30 Nov 2022 • Joy Hsu, Jiajun Wu, Noah D. Goodman
In contrast, low-level and high-level visual features from standard computer vision models pretrained on natural images do not support correct generalization.
no code implementations • 30 Nov 2022 • J. Ryan Shue, Eric Ryan Chan, Ryan Po, Zachary Ankner, Jiajun Wu, Gordon Wetzstein
Diffusion models have emerged as the state-of-the-art for image generation, among other tasks.
no code implementations • 21 Oct 2022 • Christopher Agia, Toki Migimatsu, Jiajun Wu, Jeannette Bohg
We further demonstrate how STAP can be used for task and motion planning by estimating the geometric feasibility of skill sequences provided by a task planner.
no code implementations • 17 Oct 2022 • Simon Le Cleac'h, Hong-Xing Yu, Michelle Guo, Taylor A. Howell, Ruohan Gao, Jiajun Wu, Zachary Manchester, Mac Schwager
A robot can use this simulation to optimize grasps and manipulation trajectories of neural objects, or to improve the neural object models through gradient-based real-to-simulation transfer.
no code implementations • 13 Oct 2022 • Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, Sonia Raychaudhuri, Mike Roberts, Silvio Savarese, Manolis Savva, Mohit Shridhar, Niko Sünderhauf, Andrew Szot, Ben Talbot, Joshua B. Tenenbaum, Jesse Thomason, Alexander Toshev, Joanne Truong, Luca Weihs, Jiajun Wu
We present a retrospective on the state of Embodied AI research.
no code implementations • 23 Aug 2022 • Fan-Yun Sun, Isaac Kauvar, Ruohan Zhang, Jiachen Li, Mykel Kochenderfer, Jiajun Wu, Nick Haber
Modeling multi-agent systems requires understanding how agents interact.
no code implementations • 25 Jul 2022 • Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Chin-Yi Cheng, Jiajun Wu
We study the problem of translating an image-based, step-by-step assembly manual created by human designers into machine-interpretable instructions.
no code implementations • CVPR 2022 • Sumith Kulal, Jiayuan Mao, Alex Aiken, Jiajun Wu
We introduce Programmatic Motion Concepts, a hierarchical motion representation for human actions that captures both low-level motion and high-level description as motion concepts.
no code implementations • 23 Jun 2022 • Agrim Gupta, Stephen Tian, Yunzhi Zhang, Jiajun Wu, Roberto Martín-Martín, Li Fei-Fei
This work shows that we can create good video prediction models by pre-training transformers via masked visual modeling.
no code implementations • 13 Jun 2022 • Ziang Liu, Roberto Martín-Martín, Fei Xia, Jiajun Wu, Li Fei-Fei
Robots excel in performing repetitive and precision-sensitive tasks in controlled environments such as warehouses and factories, but have not been yet extended to embodied AI agents providing assistance in household tasks.
1 code implementation • CVPR 2022 • Shyamal Buch, Cristóbal Eyzaguirre, Adrien Gaidon, Jiajun Wu, Li Fei-Fei, Juan Carlos Niebles
Building on recent progress in self-supervised image-language models, we revisit this question in the context of video and language tasks.
Ranked #3 on Video Question Answering on NExT-QA
1 code implementation • 17 May 2022 • Honglin Chen, Rahul Venkatesh, Yoni Friedman, Jiajun Wu, Joshua B. Tenenbaum, Daniel L. K. Yamins, Daniel M. Bear
Self-supervised, category-agnostic segmentation of real-world images is a challenging open problem in computer vision.
no code implementations • 8 May 2022 • Cameron Smith, Hong-Xing Yu, Sergey Zakharov, Fredo Durand, Joshua B. Tenenbaum, Jiajun Wu, Vincent Sitzmann
Neural scene representations, both continuous and discrete, have recently emerged as a powerful new paradigm for 3D scene understanding.
no code implementations • 5 May 2022 • Haochen Shi, Huazhe Xu, Zhiao Huang, Yunzhu Li, Jiajun Wu
Our learned model-based planning framework is comparable to and sometimes better than human subjects on the tested tasks.
no code implementations • 4 May 2022 • Yunzhi Zhang, Jiajun Wu
Novel view synthesis (NVS) and video prediction (VP) are typically considered disjoint tasks in computer vision.
no code implementations • CVPR 2022 • Hong-Xing Yu, Jiajun Wu, Li Yi
To incorporate object-level rotation equivariance into 3D object detectors, we need a mechanism to extract equivariant features with local object-level spatial support while being able to model cross-object context information.
1 code implementation • CVPR 2022 • Ruohan Gao, Zilin Si, Yen-Yu Chang, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu
We present ObjectFolder 2.0, a large-scale, multisensory dataset of common household objects in the form of implicit neural representations that significantly enhances ObjectFolder 1.0 in three aspects.
no code implementations • ICLR 2022 • Kyle Hsu, Moo Jin Kim, Rafael Rafailov, Jiajun Wu, Chelsea Finn
We study how the choice of visual perspective affects learning and generalization in the context of physical manipulation from raw sensor observations.
no code implementations • NeurIPS 2021 • Jiayuan Mao, Haoyue Shi, Jiajun Wu, Roger P. Levy, Joshua B. Tenenbaum
We present Grammar-Based Grounded Lexicon Learning (G2L2), a lexicalist approach toward learning a compositional and grounded meaning representation of language from grounded data, such as paired images and texts.
no code implementations • 29 Sep 2021 • Michelle Guo, Wenhao Yu, Daniel Ho, Jiajun Wu, Yunfei Bai, Karen Liu, Wenlong Lu
In addition, we perform two studies showing that UC-DiffOSI operates well in environments with changing or unknown dynamics.
no code implementations • 29 Sep 2021 • Zhezheng Luo, Jiayuan Mao, Jiajun Wu, Tomás Lozano-Pérez, Joshua B. Tenenbaum, Leslie Pack Kaelbling
We present a framework for learning compositional, rational skill models (RatSkills) that support efficient planning and inverse planning for achieving novel goals and recognizing activities.
no code implementations • 29 Sep 2021 • Guangxuan Xiao, Leslie Pack Kaelbling, Jiajun Wu, Jiayuan Mao
To leverage the sparsity in hypergraph neural networks, SpaLoc represents the grounding of relationships such as parent and grandparent as sparse tensors and uses neural networks and finite-domain quantification operations to infer new facts based on the input.
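SpaLoc's key idea, per the entry above, is storing relation groundings sparsely. A toy sketch of that idea over a hypothetical 4-person domain, with scipy's sparse matrices standing in for the paper's sparse hypergraph tensors (the existential quantification becomes a sparse matrix product):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy domain: 4 people; parent[i, j] = 1 means i is a parent of j (0->1->2->3).
n = 4
parent = csr_matrix(
    ([1, 1, 1], ([0, 1, 2], [1, 2, 3])), shape=(n, n), dtype=np.int8
)

# grandparent(x, z) := exists y. parent(x, y) and parent(y, z)
# With boolean sparse tensors, quantifying out y is a sparse matrix product
# followed by a threshold.
grandparent = (parent @ parent) > 0
print(grandparent.toarray())
```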
no code implementations • 16 Sep 2021 • Ruohan Gao, Yen-Yu Chang, Shivani Mall, Li Fei-Fei, Jiajun Wu
Multisensory object-centric perception, reasoning, and interaction have been a key research topic in recent years.
1 code implementation • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
1 code implementation • 6 Aug 2021 • Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, Andrey Kurenkov, C. Karen Liu, Hyowon Gweon, Jiajun Wu, Li Fei-Fei, Silvio Savarese
We evaluate the new capabilities of iGibson 2.0 to enable robot learning of novel tasks, in the hope of demonstrating the potential of this new simulator to support new research in embodied AI.
no code implementations • 6 Aug 2021 • Sanjana Srivastava, Chengshu Li, Michael Lingelbach, Roberto Martín-Martín, Fei Xia, Kent Vainio, Zheng Lian, Cem Gokmen, Shyamal Buch, C. Karen Liu, Silvio Savarese, Hyowon Gweon, Jiajun Wu, Li Fei-Fei
We introduce BEHAVIOR, a benchmark for embodied AI with 100 activities in simulation, spanning a range of everyday household chores such as cleaning, maintenance, and food preparation.
1 code implementation • ICLR 2022 • Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, Stefano Ermon
The key challenge is balancing faithfulness to the user input (e.g., hand-drawn colored strokes) and realism of the synthesized image.
1 code implementation • ICLR 2022 • Hong-Xing Yu, Leonidas J. Guibas, Jiajun Wu
We study the problem of inferring an object-centric scene representation from a single image, aiming to derive a representation that explains the image formation process, captures the scene's 3D nature, and is learned without supervision.
no code implementations • 10 Jun 2021 • Jiayuan Mao, Zhezheng Luo, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu, Leslie Pack Kaelbling, Tomer D. Ullman
We present Temporal and Object Quantification Networks (TOQ-Nets), a new class of neuro-symbolic networks with a structural bias that enables them to learn to recognize complex relational-temporal events.
no code implementations • CVPR 2021 • Tomas Jakab, Richard Tucker, Ameesh Makadia, Jiajun Wu, Noah Snavely, Angjoo Kanazawa
We cast this as the problem of aligning a source 3D object to a target 3D object from the same object category.
no code implementations • CVPR 2021 • Sumith Kulal, Jiayuan Mao, Alex Aiken, Jiajun Wu
We posit that adding higher-level motion primitives, which can capture natural coarser units of motion such as backswing or follow-through, can be used to improve downstream analysis tasks.
1 code implementation • CVPR 2021 • Shangzhe Wu, Ameesh Makadia, Jiajun Wu, Noah Snavely, Richard Tucker, Angjoo Kanazawa
Recent works have shown exciting results in unsupervised image de-rendering -- learning to decompose 3D shape, appearance, and lighting from single-image collections without explicit supervision.
1 code implementation • ICCV 2021 • Linqi Zhou, Yilun Du, Jiajun Wu
We propose a novel approach for probabilistic generative modeling of 3D shapes.
no code implementations • 30 Mar 2021 • Zhenfang Chen, Jiayuan Mao, Jiajun Wu, Kwan-Yee Kenneth Wong, Joshua B. Tenenbaum, Chuang Gan
We study the problem of dynamic visual reasoning on raw videos.
no code implementations • CVPR 2021 • Yifan Wang, Andrew Liu, Richard Tucker, Jiajun Wu, Brian L. Curless, Steven M. Seitz, Noah Snavely
We present a framework for automatically reconfiguring images of street scenes by populating, depopulating, or repopulating them with objects such as pedestrians or vehicles.
no code implementations • ICCV 2021 • Dave Epstein, Jiajun Wu, Cordelia Schmid, Chen Sun
Learning to model how the world changes as time elapses has proven a challenging problem for the computer vision community.
no code implementations • ICLR 2021 • Zhenfang Chen, Jiayuan Mao, Jiajun Wu, Kwan-Yee Kenneth Wong, Joshua B. Tenenbaum, Chuang Gan
We study the problem of dynamic visual reasoning on raw videos.
no code implementations • ICLR 2021 • Yilun Du, Kevin A. Smith, Tomer Ullman, Joshua B. Tenenbaum, Jiajun Wu
We study the problem of unsupervised physical object discovery.
no code implementations • 1 Jan 2021 • Jiayuan Mao, Zhezheng Luo, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu, Leslie Pack Kaelbling, Tomer Ullman
We aim to learn generalizable representations for complex activities by quantifying over both entities and time, as in “the kicker is behind all the other players,” or “the player controls the ball until it moves toward the goal.” Such a structural inductive bias of object relations, object quantification, and temporal orders will enable the learned representation to generalize to situations with varying numbers of agents, objects, and time courses.
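As a worked illustration of quantifying over both entities and time as in the quoted examples above (a hand-written toy, not the TOQ-Net architecture itself; the predicates below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
positions = rng.uniform(0, 50, size=(20, 5))  # positions[t, i]: x-coord of player i at time t
kicker = 0

# Object quantification: "the kicker is behind all the other players" at each time.
others = np.delete(positions, kicker, axis=1)
behind_all = positions[:, kicker] < others.min(axis=1)     # shape (T,)

# Temporal quantification: "P holds until Q holds" -- some time t satisfies Q,
# and P holds at every earlier time.
def until(P, Q):
    for t in range(len(Q)):
        if Q[t] and P[:t].all():
            return True
    return False

controls_ball = positions[:, kicker] < 25    # hypothetical predicate
moves_to_goal = positions[:, kicker] >= 25   # hypothetical predicate
print(behind_all.any(), until(controls_ball, moves_to_goal))
```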
no code implementations • Findings (ACL) 2021 • Ruocheng Wang, Jiayuan Mao, Samuel J. Gershman, Jiajun Wu
These object-centric concepts derived from language facilitate the learning of object-centric representations.
2 code implementations • 23 Dec 2020 • Zelin Zhao, Chuang Gan, Jiajun Wu, Xiaoxiao Guo, Joshua B. Tenenbaum
Humans can abstract prior knowledge from very little data and use it to boost skill learning.
no code implementations • 21 Dec 2020 • Jianwei Yang, Jiayuan Mao, Jiajun Wu, Devi Parikh, David D. Cox, Joshua B. Tenenbaum, Chuang Gan
In contrast, symbolic and modular models have a relatively better grounding and robustness, though at the cost of accuracy.
1 code implementation • ICCV 2021 • Yilun Du, Yinan Zhang, Hong-Xing Yu, Joshua B. Tenenbaum, Jiajun Wu
We present a method, Neural Radiance Flow (NeRFlow), to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.
no code implementations • 15 Dec 2020 • Michelle Guo, Alireza Fathi, Jiajun Wu, Thomas Funkhouser
We present a method for composing photorealistic scenes from captured images of objects.
3 code implementations • CVPR 2021 • Eric R. Chan, Marco Monteiro, Petr Kellnhofer, Jiajun Wu, Gordon Wetzstein
We have witnessed rapid progress on 3D-aware image synthesis, leveraging recent advances in generative visual models and neural rendering.
Ranked #3 on Scene Generation on VizDoom
no code implementations • NeurIPS 2020 • Yikai Li, Jiayuan Mao, Xiuming Zhang, William T. Freeman, Joshua B. Tenenbaum, Noah Snavely, Jiajun Wu
We consider two important aspects in understanding and editing images: modeling regular, program-like texture or patterns in 2D planes, and 3D posing of these planes in the scene.
2 code implementations • 3 Nov 2020 • Zhenjia Xu, Zhanpeng He, Jiajun Wu, Shuran Song
3D scene representation for robot manipulation should capture three key object properties: permanency -- objects that become occluded over time continue to exist; amodal completeness -- objects have 3D occupancy, even if only partial observations are available; spatiotemporal continuity -- the movement of each object is continuous over space and time.
no code implementations • 24 Sep 2020 • Yue Wang, Alireza Fathi, Jiajun Wu, Thomas Funkhouser, Justin Solomon
A common dilemma in 3D object detection for autonomous driving is that high-quality, dense point clouds are only available during training, but not testing.
no code implementations • 24 Jul 2020 • Yilun Du, Kevin Smith, Tomer Ullman, Joshua Tenenbaum, Jiajun Wu
We study the problem of unsupervised physical object discovery.
1 code implementation • CVPR 2020 • Andrew Luo, Zhoutong Zhang, Jiajun Wu, Joshua B. Tenenbaum
Experiments suggest that our model achieves higher accuracy and diversity in conditional scene synthesis and allows exemplar-based scene generation from various input forms.
no code implementations • CVPR 2020 • Yikai Li, Jiayuan Mao, Xiuming Zhang, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu
We study the inverse graphics problem of inferring a holistic representation for natural images.
1 code implementation • NeurIPS 2020 • Daniel M. Bear, Chaofei Fan, Damian Mrowca, Yunzhu Li, Seth Alter, Aran Nayebi, Jeremy Schwartz, Li Fei-Fei, Jiajun Wu, Joshua B. Tenenbaum, Daniel L. K. Yamins
To overcome these limitations, we introduce the idea of Physical Scene Graphs (PSGs), which represent scenes as hierarchical graphs, with nodes in the hierarchy corresponding intuitively to object parts at different scales, and edges to physical connections between parts.
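A minimal sketch of the hierarchical structure described above; the class names and fields are illustrative assumptions, not the paper's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class PSGNode:
    """An object part at some scale of the hierarchy."""
    name: str
    scale: int                                     # 0 = finest parts
    children: list = field(default_factory=list)   # finer parts grouped here

@dataclass
class PSGEdge:
    """A physical connection between two parts."""
    a: PSGNode
    b: PSGNode

# Two-level toy graph: wheels and a body grouped into one "car" node,
# with edges for the wheel-body attachments.
wheel_l, wheel_r, body = PSGNode("wheel_l", 0), PSGNode("wheel_r", 0), PSGNode("body", 0)
car = PSGNode("car", 1, children=[wheel_l, wheel_r, body])
edges = [PSGEdge(wheel_l, body), PSGEdge(wheel_r, body)]
print(car.name, [c.name for c in car.children])
```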
no code implementations • 10 Jun 2020 • Simon S. Du, Wei Hu, Zhiyuan Li, Ruoqi Shen, Zhao Song, Jiajun Wu
Though errors in past actions may affect the future, we are able to bound the number of particles needed so that the long-run reward of the policy based on particle filtering is close to that based on exact inference.
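For reference, the policy analyzed above builds on the standard bootstrap particle filter; a minimal numpy sketch on a toy 1D random walk (the transition and likelihood models here are placeholders):

```python
import numpy as np

def particle_filter_step(particles, weights, transition, likelihood, obs, rng):
    """One bootstrap step: propagate, reweight by the observation, resample."""
    particles = transition(particles, rng)             # x_t ~ p(x_t | x_{t-1})
    weights = weights * likelihood(obs, particles)     # w_t ∝ w_{t-1} p(y_t | x_t)
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy 1D random-walk state with Gaussian observations.
rng = np.random.default_rng(0)
transition = lambda x, rng: x + rng.normal(0.0, 0.5, size=x.shape)
likelihood = lambda y, x: np.exp(-0.5 * (y - x) ** 2)

particles = rng.normal(0.0, 1.0, size=1000)
weights = np.full(1000, 1.0 / 1000)
for y in [0.1, 0.4, 0.2]:                              # stand-in observation stream
    particles, weights = particle_filter_step(
        particles, weights, transition, likelihood, y, rng)
print(particles.mean())                                # posterior mean estimate
```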
no code implementations • 18 May 2020 • Chengjian Sun, Jiajun Wu, Chenyang Yang
The samples required to train a DNN after ranking can be reduced by a factor of $15$ to $2{,}400$ while achieving the same system performance as the counterpart that does not use the prior.
no code implementations • ICLR 2020 • Zhoutong Zhang, Yunyun Wang, Chuang Gan, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman
We show that networks using Harmonic Convolution can reliably model audio priors and achieve high performance in unsupervised audio restoration tasks.
1 code implementation • ICML 2020 • Yunzhu Li, Toru Lin, Kexin Yi, Daniel M. Bear, Daniel L. K. Yamins, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba
The abilities to perform physical reasoning and to adapt to new environments, while intrinsic to humans, remain challenging to state-of-the-art computational models.
1 code implementation • NeurIPS 2019 • Chi Han, Jiayuan Mao, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu
Humans reason with concepts and metaconcepts: we recognize red and green from visual input; we also understand that they describe the same property of objects (i.e., the color).
1 code implementation • 25 Dec 2019 • Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum
In this paper, we attempt to approach the problem of Audio-Visual Embodied Navigation, the task of planning the shortest path from a random starting location in a scene to the sound source in an indoor environment, given only raw egocentric visual and audio sensory data.
1 code implementation • NeurIPS 2019 • Kevin Smith, Lingjie Mei, Shunyu Yao, Jiajun Wu, Elizabeth Spelke, Josh Tenenbaum, Tomer Ullman
We also present a new test set for measuring violations of physical expectations, using a range of scenarios derived from developmental psychology.
no code implementations • 8 Nov 2019 • Alina Kloss, Maria Bauza, Jiajun Wu, Joshua B. Tenenbaum, Alberto Rodriguez, Jeannette Bohg
Planning contact interactions is one of the core challenges of many robotic tasks.
no code implementations • 29 Oct 2019 • Jiajun Wu, Chengjian Sun, Chenyang Yang
In this paper, we introduce a proactive optimization framework for anticipatory resource allocation, where the future information is implicitly predicted under the same objective with the policy optimization in a single step.
1 code implementation • 28 Oct 2019 • Rishi Veerapaneni, John D. Co-Reyes, Michael Chang, Michael Janner, Chelsea Finn, Jiajun Wu, Joshua B. Tenenbaum, Sergey Levine
This paper tests the hypothesis that modeling a scene in terms of entities and their local interactions, as opposed to modeling the scene globally, provides a significant benefit in generalizing to physical tasks in a combinatorial space the learner has not encountered before.
no code implementations • ICLR 2020 • Yunzhu Li, Hao He, Jiajun Wu, Dina Katabi, Antonio Torralba
Finding an embedding space for a linear approximation of a nonlinear dynamical system enables efficient system identification and control synthesis.
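The entry above learns such an embedding; with a hand-picked embedding, the idea reduces to a least-squares fit of a linear operator, as this toy sketch shows (the system and lifting function are invented for illustration):

```python
import numpy as np

# Toy nonlinear system that becomes linear in a hand-picked embedding.
def step(x):
    return np.array([0.9 * x[0], 0.8 * x[1] + 0.1 * x[0] ** 2])

def lift(x):
    return np.array([x[0], x[1], x[0] ** 2])    # embedding coordinates

rng = np.random.default_rng(0)
X0 = rng.normal(size=(200, 2))                  # random states
Z0 = np.stack([lift(x) for x in X0])            # embedded states
Z1 = np.stack([lift(step(x)) for x in X0])      # embedded successors

# Fit the linear operator K with least squares: Z0 @ K ≈ Z1.
K, *_ = np.linalg.lstsq(Z0, Z1, rcond=None)
print("residual:", np.abs(Z0 @ K - Z1).max())   # ~0: dynamics are linear here
```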
3 code implementations • ICLR 2020 • Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum
While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations.
1 code implementation • 28 Sep 2019 • Yunbo Wang, Bo Liu, Jiajun Wu, Yuke Zhu, Simon S. Du, Li Fei-Fei, Joshua B. Tenenbaum
A major difficulty of solving continuous POMDPs is to infer the multi-modal distribution of the unobserved true states and to make the planning algorithm dependent on the perceived uncertainty.
no code implementations • ICCV 2019 • Jiayuan Mao, Xiuming Zhang, Yikai Li, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu
Humans are capable of building holistic representations for images at various levels, from local objects, to pairwise relations, to global structures.
no code implementations • 17 Jun 2019 • Sidi Lu, Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu
In this paper, we propose a hybrid inference algorithm, the Neurally-Guided Structure Inference (NG-SI), keeping the advantages of both search-based and data-driven methods.
no code implementations • 10 Jun 2019 • Zhenjia Xu, Jiajun Wu, Andy Zeng, Joshua B. Tenenbaum, Shuran Song
We study the problem of learning physical object representations for robot manipulation.
no code implementations • ICLR 2019 • Chen Sun, Per Karlsson, Jiajun Wu, Joshua B. Tenenbaum, Kevin Murphy
We present a method which learns to integrate temporal information, from a learned dynamics model, with ambiguous visual information, from a learned vision model, in the context of interacting agents.
no code implementations • ICLR 2019 • Yunchao Liu, Zheng Wu, Daniel Ritchie, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu
We are able to understand the higher-level, abstract regularities within the scene such as symmetry and repetition.
no code implementations • ICLR 2019 • Zhenjia Xu, Zhijian Liu, Chen Sun, Kevin Murphy, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu
Humans easily recognize object parts and their hierarchical structure by watching how they move; they can then predict how each part moves in the future.
no code implementations • ICLR 2019 • Michael Janner, Sergey Levine, William T. Freeman, Joshua B. Tenenbaum, Chelsea Finn, Jiajun Wu
Object-based factorizations provide a useful level of abstraction for interacting with the world.
2 code implementations • ICLR 2019 • Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, Jiajun Wu
To bridge the learning of two modules, we use a neuro-symbolic reasoning module that executes these programs on the latent scene representation.
Ranked #5 on Visual Question Answering on CLEVR
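The program execution this entry describes can be illustrated with a toy executor over hard object labels (the actual model executes on latent, probabilistic scene representations; the operators and scene below are hypothetical):

```python
# Hypothetical object-centric scene: hard attribute labels stand in for the
# model's latent scene representation.
scene = [
    {"color": "red", "shape": "cube"},
    {"color": "green", "shape": "sphere"},
    {"color": "red", "shape": "sphere"},
]

def run(program, objects):
    """Execute a nested (op, ...) program against the object set."""
    op = program[0]
    if op == "scene":
        return objects
    if op == "filter":
        _, attr, value, inner = program
        return [o for o in run(inner, objects) if o[attr] == value]
    if op == "count":
        return len(run(program[1], objects))
    raise ValueError(f"unknown op: {op}")

# "How many red spheres are there?"
program = ("count", ("filter", "shape", "sphere",
                     ("filter", "color", "red", ("scene",))))
print(run(program, scene))   # -> 1
```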
no code implementations • 13 Apr 2019 • Anurag Ajay, Maria Bauza, Jiajun Wu, Nima Fazeli, Joshua B. Tenenbaum, Alberto Rodriguez, Leslie P. Kaelbling
Physics engines play an important role in robot planning and control; however, many real-world control problems involve complex contact dynamics that cannot be characterized analytically.
no code implementations • 12 Mar 2019 • Zhenjia Xu, Zhijian Liu, Chen Sun, Kevin Murphy, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu
Humans easily recognize object parts and their hierarchical structure by watching how they move; they can then predict how each part moves in the future.
no code implementations • 25 Feb 2019 • Chen Sun, Per Karlsson, Jiajun Wu, Joshua B. Tenenbaum, Kevin Murphy
We present a method that learns to integrate temporal information, from a learned dynamics model, with ambiguous visual information, from a learned vision model, in the context of interacting agents.
no code implementations • ICLR 2019 • Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu
Human perception of 3D shapes goes beyond reconstructing them as a set of points or a composition of geometric primitives: we also effortlessly understand higher-level shape structure such as the repetition and reflective symmetry of object parts.
no code implementations • NeurIPS 2018 • Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Joshua B. Tenenbaum, William T. Freeman, Jiajun Wu
From a single image, humans are able to perceive the full 3D shape of an object by exploiting learned shape priors from everyday life.
no code implementations • 28 Dec 2018 • Michael Janner, Sergey Levine, William T. Freeman, Joshua B. Tenenbaum, Chelsea Finn, Jiajun Wu
Object-based factorizations provide a useful level of abstraction for interacting with the world.
1 code implementation • NeurIPS 2018 • Jun-Yan Zhu, Zhoutong Zhang, Chengkai Zhang, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum, William T. Freeman
Our model first learns to synthesize 3D shapes that are indistinguishable from real shapes.
1 code implementation • NeurIPS 2018 • Jun-Yan Zhu, Zhoutong Zhang, Chengkai Zhang, Jiajun Wu, Antonio Torralba, Josh Tenenbaum, Bill Freeman
The VON not only generates images that are more realistic than the state-of-the-art 2D image synthesis methods but also enables many 3D operations such as changing the viewpoint of a generated image, shape and texture editing, linear interpolation in texture and shape space, and transferring appearance across different objects and viewpoints.
no code implementations • NeurIPS 2018 • Yilun Du, Zhijian Liu, Hector Basevi, Ales Leonardis, Bill Freeman, Josh Tenenbaum, Jiajun Wu
We first show that applying physics supervision to an existing scene understanding model increases performance, produces more stable predictions, and allows training to an equivalent performance level with fewer annotated training examples.
2 code implementations • NeurIPS 2018 • Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Joshua B. Tenenbaum
Second, the model is more data- and memory-efficient: it performs well after learning on a small number of training examples; it can also encode an image into a compact representation, requiring less storage than existing methods for offline question answering.
Ranked #1 on Visual Question Answering on CLEVR
no code implementations • ICLR 2019 • Yunzhu Li, Jiajun Wu, Russ Tedrake, Joshua B. Tenenbaum, Antonio Torralba
In this paper, we propose to learn a particle-based simulator for complex control tasks.
no code implementations • 2 Oct 2018 • Yuanming Hu, Jian-Cheng Liu, Andrew Spielberg, Joshua B. Tenenbaum, William T. Freeman, Jiajun Wu, Daniela Rus, Wojciech Matusik
The underlying physical laws of deformable objects are more complex, and the resulting systems have orders of magnitude more degrees of freedom; they are therefore significantly more computationally expensive to simulate.
1 code implementation • 28 Sep 2018 • Yunzhu Li, Jiajun Wu, Jun-Yan Zhu, Joshua B. Tenenbaum, Antonio Torralba, Russ Tedrake
There has been an increasing interest in learning dynamics simulators for model-based control.
no code implementations • 14 Sep 2018 • Xiuming Zhang, Tali Dekel, Tianfan Xue, Andrew Owens, Qiurui He, Jiajun Wu, Stefanie Mueller, William T. Freeman
We present a system that allows users to visualize complex human motion via 3D motion sculptures---a representation that conveys the 3D structure swept by a human body as it moves through space.
no code implementations • ECCV 2018 • Zhijian Liu, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu
As annotated data for object parts and physics are rare, we propose a novel formulation that learns physical primitives by explaining both an object's appearance and its behaviors in physical events.
no code implementations • ECCV 2018 • Jiajun Wu, Chengkai Zhang, Xiuming Zhang, Zhoutong Zhang, William T. Freeman, Joshua B. Tenenbaum
The problem of single-view 3D shape completion or reconstruction is challenging, because among the many possible shapes that explain an observation, most are implausible and do not correspond to natural objects.
no code implementations • ECCV 2018 • Tianfan Xue, Jiajun Wu, Zhoutong Zhang, Chengkai Zhang, Joshua B. Tenenbaum, William T. Freeman
Humans recognize object structure from both their appearance and motion; often, motion helps to resolve ambiguities in object structure that arise when we observe object appearance only.
1 code implementation • NeurIPS 2018 • Shunyu Yao, Tzu Ming Harry Hsu, Jun-Yan Zhu, Jiajun Wu, Antonio Torralba, William T. Freeman, Joshua B. Tenenbaum
In this work, we propose 3D scene de-rendering networks (3D-SDN) to address the above issues by integrating disentangled representations for semantics, geometry, and appearance into a deep generative model.
no code implementations • 9 Aug 2018 • Shaoxiong Wang, Jiajun Wu, Xingyuan Sun, Wenzhen Yuan, William T. Freeman, Joshua B. Tenenbaum, Edward H. Adelson
Perceiving accurate 3D object shape is important for robots to interact with the physical world.
no code implementations • 9 Aug 2018 • Anurag Ajay, Jiajun Wu, Nima Fazeli, Maria Bauza, Leslie P. Kaelbling, Joshua B. Tenenbaum, Alberto Rodriguez
An efficient, generalizable physical simulator with universal uncertainty estimates has wide applications in robot state estimation, planning, and control.
no code implementations • 24 Jul 2018 • David Zheng, Vinson Luo, Jiajun Wu, Joshua B. Tenenbaum
We propose a framework for the completely unsupervised learning of latent object properties from their interactions: the perception-prediction network (PPN).
no code implementations • 24 Jul 2018 • Tianfan Xue, Jiajun Wu, Katherine L. Bouman, William T. Freeman
We study the problem of synthesizing a number of likely future frames from a single input image.
1 code implementation • CVPR 2018 • Xingyuan Sun, Jiajun Wu, Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Tianfan Xue, Joshua B. Tenenbaum, William T. Freeman
We study 3D shape modeling from a single image and make contributions to it in three aspects.
Ranked #1 on 3D Shape Classification on Pix3D
no code implementations • 3 Apr 2018 • Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman
3D-INN is trained on real images to estimate 2D keypoint heatmaps from an input image; it then predicts 3D object structure from heatmaps using knowledge learned from synthetic 3D shapes.
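A standard way to read a keypoint estimate off a 2D heatmap (one common choice, not necessarily 3D-INN's exact decoding step) is a soft-argmax:

```python
import numpy as np

def soft_argmax(heatmap):
    """Keypoint estimate as the expected pixel coordinate under the
    softmax of the heatmap (differentiable, unlike a hard argmax)."""
    h, w = heatmap.shape
    p = np.exp(heatmap - heatmap.max())
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return (p * xs).sum(), (p * ys).sum()

# Synthetic heatmap peaked at (x=12, y=5).
ys, xs = np.mgrid[0:32, 0:32]
heatmap = -0.5 * ((xs - 12) ** 2 + (ys - 5) ** 2)
print(soft_argmax(heatmap))   # approximately (12.0, 5.0)
```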
no code implementations • 20 Dec 2017 • Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba
The sound of crashing waves, the roar of fast-moving cars -- sound conveys important information about the objects in our surroundings.
no code implementations • NeurIPS 2017 • Jiajun Wu, Erika Lu, Pushmeet Kohli, Bill Freeman, Josh Tenenbaum
At the core of our system is a physical world representation that is first recovered by a perception module and then utilized by physics and graphics engines.
no code implementations • NeurIPS 2017 • Zhoutong Zhang, Qiujia Li, Zhengjia Huang, Jiajun Wu, Josh Tenenbaum, Bill Freeman
Hearing an object falling onto the ground, humans can recover rich information including its rough shape, material, and falling height.
3 code implementations • CVPR 2018 • Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen
In this paper, however, we show that temporal information, especially longer-term patterns, may not be necessary to achieve competitive results on common video classification datasets.
4 code implementations • 24 Nov 2017 • Tianfan Xue, Baian Chen, Jiajun Wu, Donglai Wei, William T. Freeman
Many video enhancement algorithms rely on optical flow to register frames in a video sequence.
Ranked #7 on Video Frame Interpolation on Middlebury
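Registering frames with optical flow, as the entry above describes, amounts to backward warping; a minimal numpy sketch with nearest-neighbor sampling (real pipelines typically use bilinear interpolation):

```python
import numpy as np

def warp(image, flow):
    """Backward-warp `image` (H, W) by `flow` (H, W, 2): each output pixel
    samples the input at its flow target."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]), 0, w - 1).astype(int)
    src_y = np.clip(np.round(ys + flow[..., 1]), 0, h - 1).astype(int)
    return image[src_y, src_x]

frame = np.arange(25, dtype=float).reshape(5, 5)
flow = np.zeros((5, 5, 2))
flow[..., 0] = 2.0            # every pixel samples 2 columns to its right
print(warp(frame, flow))      # content shifts 2 pixels to the left
```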
no code implementations • NeurIPS 2017 • Michael Janner, Jiajun Wu, Tejas D. Kulkarni, Ilker Yildirim, Joshua B. Tenenbaum
Intrinsic decomposition from a single image is a highly challenging task, due to its inherent ambiguity and the scarcity of training data.
no code implementations • NeurIPS 2017 • Jiajun Wu, Yifan Wang, Tianfan Xue, Xingyuan Sun, William T. Freeman, Joshua B. Tenenbaum
First, compared to the full 3D shape, 2.5D sketches are much easier to recover from a 2D image; models that recover 2.5D sketches are also more likely to transfer from synthetic to real data.
Ranked #2 on 3D Shape Classification on Pix3D
Tasks: 3D Object Reconstruction From A Single Image, 3D Reconstruction, +2
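The 2.5D sketches above are intermediate depth and surface-normal maps; as a toy illustration of how the two relate (assuming an orthographic camera, which is a simplification):

```python
import numpy as np

def normals_from_depth(depth):
    """Per-pixel surface normals from a depth map via finite differences
    (orthographic camera assumed for simplicity)."""
    dz_dy, dz_dx = np.gradient(depth)                     # rows = y, cols = x
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

# A plane tilted along x: normals lean in the -x direction.
depth = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))
print(normals_from_depth(depth)[4, 4])
```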
1 code implementation • ICCV 2017 • Chen Liu, Jiajun Wu, Pushmeet Kohli, Yasutaka Furukawa
A neural architecture first transforms a rasterized image to a set of junctions that represent low-level geometric and semantic information (e.g., wall corners or door end-points).
no code implementations • ICCV 2017 • Zhoutong Zhang, Jiajun Wu, Qiujia Li, Zhengjia Huang, James Traer, Josh H. McDermott, Joshua B. Tenenbaum, William T. Freeman
Humans infer rich knowledge of objects from both auditory and visual cues.
no code implementations • CVPR 2017 • Amir Arsalan Soltani, Haibin Huang, Jiajun Wu, Tejas D. Kulkarni, Joshua B. Tenenbaum
We take an alternative approach: learning a generative model over multi-view depth maps or their corresponding silhouettes, and using a deterministic rendering function to produce 3D shapes from these images.
no code implementations • CVPR 2017 • Jiajun Wu, Joshua B. Tenenbaum, Pushmeet Kohli
Our approach employs a deterministic rendering function as the decoder, mapping a naturally structured and disentangled scene description, which we named scene XML, to an image.
no code implementations • 5 Dec 2016 • Chen Liu, Jiajun Wu, Pushmeet Kohli, Yasutaka Furukawa
Our result implies that neural networks are effective at perceptual tasks that require long periods of reasoning even for humans to solve.
3 code implementations • NeurIPS 2016 • Jiajun Wu, Chengkai Zhang, Tianfan Xue, William T. Freeman, Joshua B. Tenenbaum
We study the problem of 3D object generation.
Ranked #3 on 3D Shape Classification on Pix3D
Tasks: 3D Object Recognition, 3D Point Cloud Linear Classification, +1
1 code implementation • 25 Aug 2016 • Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba
We show that, through this process, the network learns a representation that conveys information about objects and scenes.
3 code implementations • NeurIPS 2016 • Tianfan Xue, Jiajun Wu, Katherine L. Bouman, William T. Freeman
We study the problem of synthesizing a number of likely future frames from a single input image.
no code implementations • 4 May 2016 • Renqiao Zhang, Jiajun Wu, Chengkai Zhang, William T. Freeman, Joshua B. Tenenbaum
Humans demonstrate remarkable abilities to predict physical events in complex scenes.
1 code implementation • 29 Apr 2016 • Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman
In this work, we propose 3D INterpreter Network (3D-INN), an end-to-end framework which sequentially estimates 2D keypoint heatmaps and 3D object structure, trained on both real 2D-annotated images and synthetic 3D data.
no code implementations • NeurIPS 2015 • Jiajun Wu, Ilker Yildirim, Joseph J. Lim, Bill Freeman, Josh Tenenbaum
Humans demonstrate remarkable abilities to predict physical events in dynamic scenes, and to infer the physical properties of objects from static images.
no code implementations • CVPR 2015 • Jiajun Wu, Yinan Yu, Chang Huang, Kai Yu
The recent development in learning deep representations has demonstrated its wide applications in traditional vision tasks like classification and detection.
no code implementations • CVPR 2014 • Jiajun Wu, Yibiao Zhao, Jun-Yan Zhu, Siwei Luo, Zhuowen Tu
Interactive segmentation, in which a user provides a bounding box around an object of interest for image segmentation, has been applied to a variety of applications in image editing, crowdsourcing, computer vision, and medical imaging.
no code implementations • CVPR 2013 • Quannan Li, Jiajun Wu, Zhuowen Tu
Obtaining effective mid-level representations has become an increasingly important task in computer vision.