no code implementations • 15 Sep 2023 • Anurag Ajay, Seungwook Han, Yilun Du, Shuang Li, Abhi Gupta, Tommi Jaakkola, Josh Tenenbaum, Leslie Kaelbling, Akash Srivastava, Pulkit Agrawal
We use a large language model to construct symbolic plans that are grounded in the environment through a large video diffusion model.
no code implementations • 2 Sep 2023 • Zhutian Yang, Jiayuan Mao, Yilun Du, Jiajun Wu, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling
This paper introduces an approach for learning to solve continuous constraint satisfaction problems (CCSP) in robotic reasoning and planning.
no code implementations • 24 Jul 2023 • Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan
Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs.
no code implementations • 5 Jul 2023 • Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan
Large Language Models (LLMs) have demonstrated impressive planning abilities in single-agent embodied tasks across various domains.
no code implementations • 8 Jun 2023 • Nan Liu, Yilun Du, Shuang Li, Joshua B. Tenenbaum, Antonio Torralba
Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate.
no code implementations • 2 Jun 2023 • Mengjiao Yang, Yilun Du, Bo Dai, Dale Schuurmans, Joshua B. Tenenbaum, Pieter Abbeel
Large text-to-video models trained on internet-scale data have demonstrated exceptional capabilities in generating high-fidelity videos from arbitrary textual descriptions.
no code implementations • 31 May 2023 • Cameron Smith, Yilun Du, Ayush Tewari, Vincent Sitzmann
SE(3) camera pose estimation is then performed via a weighted least-squares fit to the scene flow field.
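A rough sketch of the weighted least-squares step described above, assuming the flow field yields 3D point correspondences with per-point confidence weights (an illustrative Kabsch-style solver, not the authors' implementation):

```python
import numpy as np

def weighted_rigid_fit(src, dst, w):
    """Weighted least-squares SE(3) fit: find R, t minimizing
    sum_i w_i * ||R @ src_i + t - dst_i||^2 (Kabsch/Umeyama style)."""
    w = w / w.sum()
    mu_s = (w[:, None] * src).sum(axis=0)              # weighted centroids
    mu_d = (w[:, None] * dst).sum(axis=0)
    H = (w[:, None] * (src - mu_s)).T @ (dst - mu_d)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))             # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

# toy check: recover a known rigid motion from noiseless correspondences
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 3))
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
dst = src @ R_true.T + np.array([0.5, -0.2, 1.0])
R, t = weighted_rigid_fit(src, dst, np.ones(100))
assert np.allclose(R, R_true) and np.allclose(t, [0.5, -0.2, 1.0])
```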
1 code implementation • 23 May 2023 • Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, Igor Mordatch
Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.
no code implementations • 22 May 2023 • Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine
Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective.
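For context, a minimal sketch of the standard denoising objective such models are trained with, where `model` is a placeholder noise-prediction network; the loss is the usual simplified, reweighted variational bound on the data log-likelihood:

```python
import torch

def ddpm_loss(model, x0, alphas_cumprod):
    """Simplified denoising objective: corrupt x0 at a random timestep and
    train the (placeholder) noise-prediction network to recover the noise;
    this optimizes a reweighted variational bound on the log-likelihood."""
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    a_bar = alphas_cumprod[t].view(b, *([1] * (x0.dim() - 1)))
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise   # forward diffusion
    return torch.mean((model(x_t, t) - noise) ** 2)
```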
1 code implementation • CVPR 2023 • Yilun Du, Cameron Smith, Ayush Tewari, Vincent Sitzmann
We conduct extensive comparisons on held-out test scenes across two real-world datasets, significantly outperforming prior work on novel view synthesis from sparse image observations and achieving multi-view-consistent novel view synthesis.
no code implementations • 28 Mar 2023 • Hongyi Chen, Yilun Du, Yiye Chen, Joshua Tenenbaum, Patricio A. Vela
In this paper, we suggest an approach towards integrating planning with sequence models based on the idea of iterative energy minimization, and illustrate how such a procedure leads to improved RL performance across different tasks.
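A hedged sketch of the general idea of planning by iterative energy minimization, assuming a placeholder differentiable `energy_model` that scores a candidate action sequence (not the paper's exact procedure):

```python
import torch

def plan_by_energy_minimization(energy_model, state, horizon, action_dim,
                                steps=100, lr=0.05):
    """Refine an action sequence by gradient descent on a learned energy
    E(state, actions); `energy_model` is a placeholder differentiable net."""
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        energy_model(state, actions).backward()   # scalar energy of the plan
        opt.step()
    return actions.detach()
```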
no code implementations • CVPR 2023 • Yining Hong, Chunru Lin, Yilun Du, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan
We suggest that a principled approach for 3D reasoning from multi-view images should be to infer a compact 3D representation of the world from the multi-view images, which is further grounded on open-vocabulary semantic concepts, and then to execute reasoning on these 3D representations.
no code implementations • 13 Mar 2023 • Jiahui Fu, Yilun Du, Kurran Singh, Joshua B. Tenenbaum, John J. Leonard
We present NeuSE, a novel Neural SE(3)-Equivariant Embedding for objects, and illustrate how it supports object SLAM for consistent spatial understanding with long-term scene changes.
no code implementations • 7 Mar 2023 • Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans
In response to these developments, new paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.
2 code implementations • 22 Feb 2023 • Yilun Du, Conor Durkan, Robin Strudel, Joshua B. Tenenbaum, Sander Dieleman, Rob Fergus, Jascha Sohl-Dickstein, Arnaud Doucet, Will Grathwohl
In this work, we build upon these ideas using the score-based interpretation of diffusion models, and explore alternative ways to condition, modify, and reuse diffusion models for tasks involving compositional generation and guidance.
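An illustrative sketch (not the paper's full method) of one such compositional operator: adding weighted scores from several models, a product-of-experts-style combination, inside an annealed Langevin sampling step; the `score_fns` are placeholder networks approximating each model's score:

```python
import torch

def composed_score(score_fns, weights, x, t):
    """Add weighted scores from several diffusion models (a product-of-
    experts-style composition); each score_fn approximates grad_x log p_i(x|t)."""
    return sum(w * f(x, t) for w, f in zip(weights, score_fns))

def langevin_step(x, t, score_fns, weights, step_size=1e-2):
    """One annealed Langevin update using the composed score."""
    noise = torch.randn_like(x)
    return x + step_size * composed_score(score_fns, weights, x, t) \
             + (2.0 * step_size) ** 0.5 * noise
```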
no code implementations • 7 Feb 2023 • Ethan Chun, Yilun Du, Anthony Simeonov, Tomas Lozano-Perez, Leslie Kaelbling
A robot operating in a household environment will see a wide range of unique and unfamiliar objects.
no code implementations • 31 Jan 2023 • Yilun Du, Mengjiao Yang, Bo Dai, Hanjun Dai, Ofir Nachum, Joshua B. Tenenbaum, Dale Schuurmans, Pieter Abbeel
The proposed policy-as-video formulation can further represent environments with different state and action spaces in a unified space of images, which, for example, enables learning and generalization across a variety of robot manipulation tasks.
no code implementations • 28 Nov 2022 • Anurag Ajay, Yilun Du, Abhi Gupta, Joshua Tenenbaum, Tommi Jaakkola, Pulkit Agrawal
We further demonstrate the advantages of modeling policies as conditional diffusion models by considering two other conditioning variables: constraints and skills.
1 code implementation • 17 Nov 2022 • Anthony Simeonov, Yilun Du, Lin Yen-Chen, Alberto Rodriguez, Leslie Pack Kaelbling, Tomas Lozano-Perez, Pulkit Agrawal
This formalism is implemented in three steps: assigning a consistent local coordinate frame to the task-relevant object parts, determining the location and orientation of this coordinate frame on unseen object instances, and executing an action that brings these frames into the desired alignment.
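As a small illustration of the final alignment step, assuming the two frames are given as 4x4 homogeneous matrices in a common world frame (a sketch, not the paper's implementation):

```python
import numpy as np

def alignment_transform(T_current, T_target):
    """Rigid transform (4x4 homogeneous) that carries the current task frame
    onto the target frame: T_action @ T_current == T_target."""
    return T_target @ np.linalg.inv(T_current)

# example with pure translations between frames
T_current = np.eye(4); T_current[:3, 3] = [0.2, 0.0, 0.1]
T_target = np.eye(4);  T_target[:3, 3] = [0.5, 0.1, 0.3]
T_action = alignment_transform(T_current, T_target)
assert np.allclose(T_action @ T_current, T_target)
```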
no code implementations • 8 Nov 2022 • Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond
Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation?
no code implementations • 8 Nov 2022 • Weiyu Liu, Yilun Du, Tucker Hermans, Sonia Chernova, Chris Paxton
StructDiffusion even improves the success rate of assembling physically valid structures out of unseen objects by an average of 16% over an existing multi-modal transformer model trained on specific structures.
no code implementations • 20 Oct 2022 • Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Igor Mordatch
Such closed-loop communication enables models to correct errors caused by other models, significantly boosting performance on downstream tasks, e.g. improving accuracy on grade school math problems by 7.5% without requiring any model finetuning (a minimal sketch of such a communication loop follows below).
Ranked #1 on Video Question Answering on ActivityNet-QA (Vocabulary Size metric)
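A minimal sketch of such a closed-loop communication round between models, with `query_model(name, prompt)` standing in as a hypothetical helper for whatever LLM call is used (illustrative only, not the paper's API):

```python
def closed_loop_round(query_model, models, question, rounds=2):
    """Each model answers, then revises its answer after reading the other
    models' latest responses; `query_model(name, prompt)` is a hypothetical
    stand-in for an LLM call."""
    answers = {m: query_model(m, question) for m in models}
    for _ in range(rounds):
        for m in models:
            others = "\n".join(a for k, a in answers.items() if k != m)
            prompt = (f"{question}\n\nOther agents answered:\n{others}\n\n"
                      "Considering these answers, give your revised answer.")
            answers[m] = query_model(m, prompt)
    return answers
```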
no code implementations • 1 Aug 2022 • Jiahui Fu, Yilun Du, Kurran Singh, Joshua B. Tenenbaum, John J. Leonard
The ability to reason about changes in the environment is crucial for robots operating over extended periods of time.
no code implementations • 22 Jul 2022 • Prafull Sharma, Ayush Tewari, Yilun Du, Sergey Zakharov, Rares Ambrus, Adrien Gaidon, William T. Freeman, Fredo Durand, Joshua B. Tenenbaum, Vincent Sitzmann
We present a method to map 2D image observations of a scene to a persistent 3D scene representation, enabling novel view synthesis and disentangled representation of the movable and immovable components of the scene.
no code implementations • 13 Jul 2022 • Yining Hong, Yilun Du, Chunru Lin, Joshua B. Tenenbaum, Chuang Gan
Experimental results show that our proposed framework outperforms unsupervised and language-mediated segmentation models on semantic and instance segmentation tasks, and outperforms existing models on challenging 3D-aware visual reasoning tasks.
1 code implementation • 30 Jun 2022 • Yilun Du, Shuang Li, Joshua B. Tenenbaum, Igor Mordatch
Finally, we illustrate that our approach can recursively solve algorithmic problems requiring nested reasoning.
1 code implementation • 3 Jun 2022 • Nan Liu, Shuang Li, Yilun Du, Antonio Torralba, Joshua B. Tenenbaum
Large text-guided diffusion models, such as DALL-E 2, are able to generate stunning photorealistic images given natural language descriptions.
2 code implementations • 20 May 2022 • Michael Janner, Yilun Du, Joshua B. Tenenbaum, Sergey Levine
Model-based reinforcement learning methods often use learning only for the purpose of estimating an approximate dynamics model, offloading the rest of the decision-making work to classical trajectory optimizers.
no code implementations • 2 May 2022 • Rylan Schaeffer, Gabrielle Kaili-May Liu, Yilun Du, Scott Linderman, Ila Rani Fiete
Learning from a continuous stream of non-stationary data in an unsupervised manner is arguably one of the most common and most challenging settings facing intelligent agents.
no code implementations • 4 Apr 2022 • Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan
By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can then be applied to arbitrary sounds.
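The linear time-invariant assumption means that, once an impulse response for an emitter-listener pair is predicted, rendering reduces to convolution; a toy sketch with a made-up exponentially decaying impulse response (illustrative, not the paper's code):

```python
import numpy as np

def render_audio(impulse_response, dry_sound):
    """Linear time-invariant rendering: spatialize any source sound by
    convolving it with the predicted impulse response."""
    return np.convolve(dry_sound, impulse_response)[: len(dry_sound)]

# toy example with a made-up exponentially decaying impulse response
sr = 16000
rng = np.random.default_rng(0)
ir = np.exp(-np.linspace(0.0, 8.0, sr // 4)) * rng.normal(size=sr // 4)
dry = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)   # 1 s of a 440 Hz tone
wet = render_audio(ir, dry)
```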
1 code implementation • CVPR 2022 • Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, Matan Sela, Vincent Sitzmann, Austin Stone, Deqing Sun, Suhani Vora, Ziyu Wang, Tianhao Wu, Kwang Moo Yi, Fangcheng Zhong, Andrea Tagliasacchi
Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details.
1 code implementation • 3 Feb 2022 • Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu
Together, these results suggest that language modeling induces representations that are useful for modeling not just language, but also goals and plans; these representations can aid learning and generalization even outside of language processing.
1 code implementation • 9 Dec 2021 • Anthony Simeonov, Yilun Du, Andrea Tagliasacchi, Joshua B. Tenenbaum, Alberto Rodriguez, Pulkit Agrawal, Vincent Sitzmann
Our performance generalizes across both object instances and 6-DoF object poses, and significantly outperforms a recent baseline that relies on 2D descriptors.
no code implementations • NeurIPS 2021 • Nan Liu, Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba
The visual world around us can be described as a structured set of objects and their associated relations.
no code implementations • NeurIPS 2021 • Yilun Du, Katherine M. Collins, Joshua B. Tenenbaum, Vincent Sitzmann
We leverage neural fields to capture the underlying structure in image, shape, audio and cross-modal audiovisual domains in a modality-independent manner.
1 code implementation • NeurIPS 2021 • Yilun Du, Shuang Li, Yash Sharma, Joshua B. Tenenbaum, Igor Mordatch
In this work, we propose COMET, which discovers and represents concepts as separate energy functions, enabling us to represent both global concepts as well as objects under a unified framework.
no code implementations • 14 Oct 2021 • Joseph Suarez, Yilun Du, Clare Zhu, Igor Mordatch, Phillip Isola
Neural MMO is a computationally accessible research platform that combines large agent populations, long time horizons, open-ended tasks, and modular game systems.
1 code implementation • ICCV 2021 • Shuang Li, Yilun Du, Antonio Torralba, Josef Sivic, Bryan Russell
Our task poses unique challenges as a system does not know what types of human-object interactions are present in a video or the actual spatiotemporal location of the human and the object.
Human-Object Interaction Detection • Weakly-supervised Learning
no code implementations • 29 Sep 2021 • Shuang Li, Xavier Puig, Yilun Du, Ekin Akyürek, Antonio Torralba, Jacob Andreas, Igor Mordatch
Additional experiments explore the role of language-based encodings in these results; we find that it is possible to train a simple adapter layer that maps from observations and action histories to LM embeddings, and thus that language modeling provides an effective initializer even for tasks with no language as input or output.
1 code implementation • ICCV 2021 • Yilun Du, Chuang Gan, Phillip Isola
Instead, it must explore its environment to acquire the data it will learn from.
1 code implementation • ICCV 2021 • Linqi Zhou, Yilun Du, Jiajun Wu
We propose a novel approach for probabilistic generative modeling of 3D shapes.
no code implementations • ICLR 2021 • Yilun Du, Kevin A. Smith, Tomer Ullman, Joshua B. Tenenbaum, Jiajun Wu
We study the problem of unsupervised physical object discovery.
1 code implementation • ICCV 2021 • Yilun Du, Yinan Zhang, Hong-Xing Yu, Joshua B. Tenenbaum, Jiajun Wu
We present a method, Neural Radiance Flow (NeRFlow), to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.
4 code implementations • 2 Dec 2020 • Yilun Du, Shuang Li, Joshua Tenenbaum, Igor Mordatch
Contrastive divergence is a popular method of training energy-based models, but is known to have difficulties with training stability.
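For reference, a minimal sketch of one contrastive-divergence update with short-run Langevin sampling of negatives, where `ebm` is a placeholder energy network (illustrative only, not the paper's improved procedure):

```python
import torch

def contrastive_divergence_step(ebm, optimizer, x_real, steps=20, step_size=0.01):
    """One CD update: draw negatives by short-run Langevin dynamics on the
    energy, then lower energy on data and raise it on the samples."""
    x_neg = torch.randn_like(x_real)
    for _ in range(steps):                              # Langevin sampling
        x_neg = x_neg.detach().requires_grad_(True)
        grad, = torch.autograd.grad(ebm(x_neg).sum(), x_neg)
        x_neg = x_neg - step_size * grad \
                + (2.0 * step_size) ** 0.5 * torch.randn_like(x_neg)
    loss = ebm(x_real).mean() - ebm(x_neg.detach()).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```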
no code implementations • NeurIPS 2020 • Yilun Du, Shuang Li, Igor Mordatch
A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge.
1 code implementation • 24 Nov 2020 • Shuang Li, Yilun Du, Gido M. van de Ven, Igor Mordatch
We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems.
no code implementations • 16 Nov 2020 • Anthony Simeonov, Yilun Du, Beomjoon Kim, Francois R. Hogan, Joshua Tenenbaum, Pulkit Agrawal, Alberto Rodriguez
We present a framework for solving long-horizon planning problems involving manipulation of rigid objects that operates directly from a point-cloud observation, i.e. without prior object models.
no code implementations • 6 Nov 2020 • Yilun Du, Tomas Lozano-Perez, Leslie Kaelbling
The robot may be called upon later to retrieve objects and will need a long-term object-based memory in order to know how to find them.
no code implementations • 28 Sep 2020 • Yilun Du, Joshua B. Tenenbaum, Tomas Lozano-Perez, Leslie Pack Kaelbling
When an agent interacts with a complex environment, it receives a stream of percepts in which it may detect entities, such as objects or people.
no code implementations • 24 Jul 2020 • Yilun Du, Kevin Smith, Tomer Ullman, Joshua Tenenbaum, Jiajun Wu
We study the problem of unsupervised physical object discovery.
1 code implementation • ICLR 2020 • Yilun Du, Joshua Meier, Jerry Ma, Rob Fergus, Alexander Rives
We propose an energy-based model (EBM) of protein conformations that operates at atomic scale.
1 code implementation • 13 Apr 2020 • Yilun Du, Shuang Li, Igor Mordatch
A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge.
no code implementations • 31 Jan 2020 • Joseph Suarez, Yilun Du, Igor Mordatch, Phillip Isola
We present Neural MMO, a massively multiagent game environment inspired by MMOs and discuss our progress on two more general challenges in multiagent systems engineering for AI research: distributed infrastructure and game IO.
no code implementations • ICLR 2020 • Xingyou Song, Yiding Jiang, Stephen Tu, Yilun Du, Behnam Neyshabur
A major component of overfitting in model-free reinforcement learning (RL) involves the case where the agent may mistakenly correlate reward with certain spurious features from the observations generated by the Markov Decision Process (MDP).
1 code implementation • NeurIPS 2019 • Yilun Du, Igor Mordatch
Energy based models (EBMs) are appealing due to their generality and simplicity in likelihood modeling, but have been traditionally difficult to train.
Ranked #3 on Image Generation on Stacked MNIST
no code implementations • 25 Sep 2019 • Yilun Du, Phillip Isola
Reinforcement learning (RL) has led to increasingly complex looking behavior in recent years.
no code implementations • 15 Sep 2019 • Yilun Du, Toru Lin, Igor Mordatch
We provide an online algorithm to train EBMs while interacting with the environment, and show that EBMs allow for significantly better online learning than corresponding feed-forward networks.
no code implementations • 2 Jun 2019 • Xingyou Song, Yilun Du, Jacob Jackson
Recent results in Reinforcement Learning (RL) have shown that agents with limited training environments are susceptible to a large amount of overfitting across many domains.
1 code implementation • 13 May 2019 • Yilun Du, Karthik Narasimhan
While model-based deep reinforcement learning (RL) holds great promise for sample efficiency and generalization, learning an accurate dynamics model is often challenging and requires substantial interaction with the environment.
no code implementations • ICLR 2019 • Joseph Suarez, Yilun Du, Phillip Isola, Igor Mordatch
We demonstrate how this platform can be used to study behavior and learning in large populations of neural agents.
3 code implementations • 20 Mar 2019 • Yilun Du, Igor Mordatch
Energy based models (EBMs) are appealing due to their generality and simplicity in likelihood modeling, but have been traditionally difficult to train.
1 code implementation • 2 Mar 2019 • Joseph Suarez, Yilun Du, Phillip Isola, Igor Mordatch
The emergence of complex life on Earth is often attributed to the arms race that ensued from a huge number of organisms all competing for finite resources.
no code implementations • NeurIPS 2018 • Yilun Du, Zhijian Liu, Hector Basevi, Ales Leonardis, Bill Freeman, Josh Tenenbaum, Jiajun Wu
We first show that applying physics supervision to an existing scene understanding model increases performance, produces more stable predictions, and allows training to an equivalent performance level with fewer annotated training examples.
no code implementations • 27 Sep 2018 • Yilun Du, Karthik Narasimhan
While model-based deep reinforcement learning (RL) holds great promise for sample efficiency and generalization, learning an accurate dynamics model is challenging and often requires substantial interactions with the environment.