no code implementations • 27 Sep 2018 • Yilun Du, Karthik Narasimhan
While model-based deep reinforcement learning (RL) holds great promise for sample efficiency and generalization, learning an accurate dynamics model is challenging and often requires substantial interactions with the environment.
no code implementations • NeurIPS 2018 • Yilun Du, Zhijian Liu, Hector Basevi, Ales Leonardis, Bill Freeman, Josh Tenenbaum, Jiajun Wu
We first show that applying physics supervision to an existing scene understanding model increases performance, produces more stable predictions, and allows training to an equivalent performance level with fewer annotated training examples.
1 code implementation • 2 Mar 2019 • Joseph Suarez, Yilun Du, Phillip Isola, Igor Mordatch
The emergence of complex life on Earth is often attributed to the arms race that ensued from a huge number of organisms all competing for finite resources.
3 code implementations • 20 Mar 2019 • Yilun Du, Igor Mordatch
Energy based models (EBMs) are appealing due to their generality and simplicity in likelihood modeling, but have been traditionally difficult to train.
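As a toy illustration of the Langevin-dynamics sampling such EBM training relies on (an assumption-laden sketch, not the authors' code; the energy, step size, and names here are invented for the example), one can sample from a hand-written energy whose Gibbs distribution is the standard normal:

```python
import numpy as np

# Illustrative sketch: Langevin dynamics on E(x) = x^2 / 2, whose
# Gibbs distribution exp(-E(x)) is the standard normal N(0, 1).

def grad_energy(x):
    # gradient of E(x) = 0.5 * x**2
    return x

def langevin_sample(n_samples=2000, n_steps=500, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=3.0, size=n_samples)  # broad initialization
    for _ in range(n_steps):
        noise = rng.normal(size=n_samples)
        x = x - 0.5 * step * grad_energy(x) + np.sqrt(step) * noise
    return x

samples = langevin_sample()
```

After enough steps the empirical mean and standard deviation of `samples` approach 0 and 1, the moments of the target distribution.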
no code implementations • ICLR 2019 • Joseph Suarez, Yilun Du, Phillip Isola, Igor Mordatch
We demonstrate how this platform can be used to study behavior and learning in large populations of neural agents.
1 code implementation • 13 May 2019 • Yilun Du, Karthik Narasimhan
While model-based deep reinforcement learning (RL) holds great promise for sample efficiency and generalization, learning an accurate dynamics model is often challenging and requires substantial interaction with the environment.
no code implementations • 2 Jun 2019 • Xingyou Song, Yilun Du, Jacob Jackson
Recent results in Reinforcement Learning (RL) have shown that agents trained on limited sets of environments are susceptible to substantial overfitting across many domains.
no code implementations • 15 Sep 2019 • Yilun Du, Toru Lin, Igor Mordatch
We provide an online algorithm to train EBMs while interacting with the environment, and show that EBMs allow for significantly better online learning than corresponding feed-forward networks.
no code implementations • 25 Sep 2019 • Yilun Du, Phillip Isola
Reinforcement learning (RL) has led to increasingly complex looking behavior in recent years.
1 code implementation • NeurIPS 2019 • Yilun Du, Igor Mordatch
Energy based models (EBMs) are appealing due to their generality and simplicity in likelihood modeling, but have been traditionally difficult to train.
Ranked #3 on Image Generation on Stacked MNIST
no code implementations • ICLR 2020 • Xingyou Song, Yiding Jiang, Stephen Tu, Yilun Du, Behnam Neyshabur
A major component of overfitting in model-free reinforcement learning (RL) involves the case where the agent may mistakenly correlate reward with certain spurious features from the observations generated by the Markov Decision Process (MDP).
no code implementations • 31 Jan 2020 • Joseph Suarez, Yilun Du, Igor Mordatch, Phillip Isola
We present Neural MMO, a massively multiagent game environment inspired by MMOs and discuss our progress on two more general challenges in multiagent systems engineering for AI research: distributed infrastructure and game IO.
1 code implementation • 13 Apr 2020 • Yilun Du, Shuang Li, Igor Mordatch
A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge.
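The core compositionality idea can be sketched in one dimension (an illustrative analogue, not the paper's model): conjunction of concepts sums their energies, while disjunction takes a soft minimum via a negative log-sum-exp. All function names below are invented for the example:

```python
import numpy as np

# Toy 1-D "concepts" expressed as energy functions.
def e_near_1(x):  # concept A: low energy near x = 1
    return (x - 1.0) ** 2

def e_near_3(x):  # concept B: low energy near x = 3
    return (x - 3.0) ** 2

x = np.linspace(-2, 6, 801)
e_and = e_near_1(x) + e_near_3(x)                             # A AND B
e_or = -np.log(np.exp(-e_near_1(x)) + np.exp(-e_near_3(x)))   # A OR B

x_and = x[np.argmin(e_and)]  # lowest energy where both concepts hold
```

The conjunction's minimum sits between the two concepts (near x = 2), while the disjunction stays low near both x = 1 and x = 3.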
1 code implementation • ICLR 2020 • Yilun Du, Joshua Meier, Jerry Ma, Rob Fergus, Alexander Rives
We propose an energy-based model (EBM) of protein conformations that operates at atomic scale.
no code implementations • 24 Jul 2020 • Yilun Du, Kevin Smith, Tomer Ullman, Joshua Tenenbaum, Jiajun Wu
We study the problem of unsupervised physical object discovery.
no code implementations • 28 Sep 2020 • Yilun Du, Joshua B. Tenenbaum, Tomas Lozano-Perez, Leslie Pack Kaelbling
When an agent interacts with a complex environment, it receives a stream of percepts in which it may detect entities, such as objects or people.
no code implementations • 6 Nov 2020 • Yilun Du, Tomas Lozano-Perez, Leslie Kaelbling
The robot may be called upon later to retrieve objects and will need a long-term object-based memory in order to know how to find them.
no code implementations • 16 Nov 2020 • Anthony Simeonov, Yilun Du, Beomjoon Kim, Francois R. Hogan, Joshua Tenenbaum, Pulkit Agrawal, Alberto Rodriguez
We present a framework for solving long-horizon planning problems involving manipulation of rigid objects that operates directly from a point-cloud observation, i.e., without prior object models.
1 code implementation • 24 Nov 2020 • Shuang Li, Yilun Du, Gido M. van de Ven, Igor Mordatch
We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems.
no code implementations • NeurIPS 2020 • Yilun Du, Shuang Li, Igor Mordatch
A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge.
4 code implementations • 2 Dec 2020 • Yilun Du, Shuang Li, Joshua Tenenbaum, Igor Mordatch
Contrastive divergence is a popular method of training energy-based models, but is known to have difficulties with training stability.
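A hedged sketch of contrastive divergence on a toy 1-D energy E_mu(x) = (x - mu)^2 / 2 (not the paper's model; names and constants are illustrative). Negative samples come from k Langevin steps initialized at the data, the classic CD-k shortcut:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=1024)  # true mean is 2.0

mu, lr, step, k = 0.0, 0.5, 0.1, 10
for _ in range(200):
    x = data.copy()                     # start MCMC chains at the data
    for _ in range(k):
        x = x - 0.5 * step * (x - mu) + np.sqrt(step) * rng.normal(size=x.size)
    # CD gradient: E_data[dE/dmu] - E_model[dE/dmu] = x.mean() - data.mean()
    mu -= lr * (x.mean() - data.mean())
```

Here `mu` converges toward the data mean, the maximum-likelihood solution for this energy; the training instabilities discussed in the paper arise when the short MCMC chains fail to approximate the model distribution.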
1 code implementation • ICCV 2021 • Yilun Du, Yinan Zhang, Hong-Xing Yu, Joshua B. Tenenbaum, Jiajun Wu
We present a method, Neural Radiance Flow (NeRFlow), to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.
no code implementations • ICLR 2021 • Yilun Du, Kevin A. Smith, Tomer Ullman, Joshua B. Tenenbaum, Jiajun Wu
We study the problem of unsupervised physical object discovery.
1 code implementation • ICCV 2021 • Linqi Zhou, Yilun Du, Jiajun Wu
We propose a novel approach for probabilistic generative modeling of 3D shapes.
1 code implementation • ICCV 2021 • Yilun Du, Chuang Gan, Phillip Isola
Instead, it must explore its environment to acquire the data it will learn from.
no code implementations • 29 Sep 2021 • Shuang Li, Xavier Puig, Yilun Du, Ekin Akyürek, Antonio Torralba, Jacob Andreas, Igor Mordatch
Additional experiments explore the role of language-based encodings in these results; we find that it is possible to train a simple adapter layer that maps from observations and action histories to LM embeddings, and thus that language modeling provides an effective initializer even for tasks with no language as input or output.
1 code implementation • ICCV 2021 • Shuang Li, Yilun Du, Antonio Torralba, Josef Sivic, Bryan Russell
Our task poses unique challenges as a system does not know what types of human-object interactions are present in a video or the actual spatiotemporal location of the human and the object.
no code implementations • 14 Oct 2021 • Joseph Suarez, Yilun Du, Clare Zhu, Igor Mordatch, Phillip Isola
Neural MMO is a computationally accessible research platform that combines large agent populations, long time horizons, open-ended tasks, and modular game systems.
1 code implementation • NeurIPS 2021 • Yilun Du, Shuang Li, Yash Sharma, Joshua B. Tenenbaum, Igor Mordatch
In this work, we propose COMET, which discovers and represents concepts as separate energy functions, enabling us to represent both global concepts and objects under a unified framework.
no code implementations • NeurIPS 2021 • Yilun Du, Katherine M. Collins, Joshua B. Tenenbaum, Vincent Sitzmann
We leverage neural fields to capture the underlying structure in image, shape, audio and cross-modal audiovisual domains in a modality-independent manner.
no code implementations • NeurIPS 2021 • Nan Liu, Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba
The visual world around us can be described as a structured set of objects and their associated relations.
1 code implementation • 9 Dec 2021 • Anthony Simeonov, Yilun Du, Andrea Tagliasacchi, Joshua B. Tenenbaum, Alberto Rodriguez, Pulkit Agrawal, Vincent Sitzmann
Our performance generalizes across both object instances and 6-DoF object poses, and significantly outperforms a recent baseline that relies on 2D descriptors.
1 code implementation • 3 Feb 2022 • Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu
Together, these results suggest that language modeling induces representations that are useful for modeling not just language, but also goals and plans; these representations can aid learning and generalization even outside of language processing.
1 code implementation • CVPR 2022 • Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, Matan Sela, Vincent Sitzmann, Austin Stone, Deqing Sun, Suhani Vora, Ziyu Wang, Tianhao Wu, Kwang Moo Yi, Fangcheng Zhong, Andrea Tagliasacchi
Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details.
1 code implementation • 4 Apr 2022 • Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan
By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can then be applied to arbitrary sounds.
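The linear-time-invariant view can be illustrated with a toy example (not the NAF model itself): once an impulse response is known for an emitter/listener pair, any source sound is rendered by convolution. The impulse response below is a made-up single echo:

```python
import numpy as np

rng = np.random.default_rng(0)
sound = rng.normal(size=100)   # arbitrary dry source signal

impulse_response = np.zeros(32)
impulse_response[5] = 0.5      # toy room: 5-sample delay, half amplitude

# LTI rendering: the received signal is source (*) impulse response.
rendered = np.convolve(sound, impulse_response)
```

Because the system is LTI, the output is exactly the input delayed by 5 samples and scaled by 0.5; a learned impulse response simply replaces this hand-written one.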
no code implementations • 2 May 2022 • Rylan Schaeffer, Gabrielle Kaili-May Liu, Yilun Du, Scott Linderman, Ila Rani Fiete
Learning from a continuous stream of non-stationary data in an unsupervised manner is arguably one of the most common and most challenging settings facing intelligent agents.
2 code implementations • 20 May 2022 • Michael Janner, Yilun Du, Joshua B. Tenenbaum, Sergey Levine
Model-based reinforcement learning methods often use learning only for the purpose of estimating an approximate dynamics model, offloading the rest of the decision-making work to classical trajectory optimizers.
1 code implementation • 3 Jun 2022 • Nan Liu, Shuang Li, Yilun Du, Antonio Torralba, Joshua B. Tenenbaum
Large text-guided diffusion models, such as DALLE-2, are able to generate stunning photorealistic images given natural language descriptions.
1 code implementation • 30 Jun 2022 • Yilun Du, Shuang Li, Joshua B. Tenenbaum, Igor Mordatch
Finally, we illustrate that our approach can recursively solve algorithmic problems requiring nested reasoning.
no code implementations • 13 Jul 2022 • Yining Hong, Yilun Du, Chunru Lin, Joshua B. Tenenbaum, Chuang Gan
Experimental results show that our proposed framework outperforms unsupervised/language-mediated segmentation models on semantic and instance segmentation tasks, and outperforms existing models on the challenging 3D-aware visual reasoning tasks.
no code implementations • 22 Jul 2022 • Prafull Sharma, Ayush Tewari, Yilun Du, Sergey Zakharov, Rares Ambrus, Adrien Gaidon, William T. Freeman, Fredo Durand, Joshua B. Tenenbaum, Vincent Sitzmann
We present a method to map 2D image observations of a scene to a persistent 3D scene representation, enabling novel view synthesis and disentangled representation of the movable and immovable components of the scene.
no code implementations • 1 Aug 2022 • Jiahui Fu, Yilun Du, Kurran Singh, Joshua B. Tenenbaum, John J. Leonard
The ability to reason about changes in the environment is crucial for robots operating over extended periods of time.
no code implementations • 20 Oct 2022 • Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Igor Mordatch
Such closed-loop communication enables models to correct errors caused by other models, significantly boosting performance on downstream tasks, e.g., improving accuracy on grade school math problems by 7.5%, without requiring any model finetuning.
Ranked #1 on Video Question Answering on ActivityNet-QA
no code implementations • 8 Nov 2022 • Weiyu Liu, Yilun Du, Tucker Hermans, Sonia Chernova, Chris Paxton
StructDiffusion even improves the success rate of assembling physically valid structures out of unseen objects by 16% on average over an existing multi-modal transformer model trained on specific structures.
no code implementations • 8 Nov 2022 • Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond
Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation?
1 code implementation • 17 Nov 2022 • Anthony Simeonov, Yilun Du, Lin Yen-Chen, Alberto Rodriguez, Leslie Pack Kaelbling, Tomas Lozano-Perez, Pulkit Agrawal
This formalism is implemented in three steps: assigning a consistent local coordinate frame to the task-relevant object parts, determining the location and orientation of this coordinate frame on unseen object instances, and executing an action that brings these frames into the desired alignment.
no code implementations • 28 Nov 2022 • Anurag Ajay, Yilun Du, Abhi Gupta, Joshua Tenenbaum, Tommi Jaakkola, Pulkit Agrawal
We further demonstrate the advantages of modeling policies as conditional diffusion models by considering two other conditioning variables: constraints and skills.
no code implementations • 7 Feb 2023 • Ethan Chun, Yilun Du, Anthony Simeonov, Tomas Lozano-Perez, Leslie Kaelbling
A robot operating in a household environment will see a wide range of unique and unfamiliar objects.
2 code implementations • 22 Feb 2023 • Yilun Du, Conor Durkan, Robin Strudel, Joshua B. Tenenbaum, Sander Dieleman, Rob Fergus, Jascha Sohl-Dickstein, Arnaud Doucet, Will Grathwohl
In this work, we build upon these ideas using the score-based interpretation of diffusion models, and explore alternative ways to condition, modify, and reuse diffusion models for tasks involving compositional generation and guidance.
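The score-based composition idea admits a 1-D toy illustration (an assumed sketch, not the paper's models): summing the scores of two Gaussians and running Langevin dynamics samples from their product, itself a Gaussian with mean (a + b) / 2 and variance 1/2:

```python
import numpy as np

a, b = 0.0, 2.0

def score_1(x):
    return -(x - a)   # score of N(a, 1)

def score_2(x):
    return -(x - b)   # score of N(b, 1)

rng = np.random.default_rng(0)
x = rng.normal(size=4000)
step = 0.05
for _ in range(800):
    s = score_1(x) + score_2(x)   # composed (product-of-experts) score
    x = x + 0.5 * step * s + np.sqrt(step) * rng.normal(size=x.size)
```

The samples concentrate around 1.0 with standard deviation near sqrt(1/2), the product distribution of the two components.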
no code implementations • 7 Mar 2023 • Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans
In response to these developments, new paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.
no code implementations • 13 Mar 2023 • Jiahui Fu, Yilun Du, Kurran Singh, Joshua B. Tenenbaum, John J. Leonard
We present NeuSE, a novel Neural SE(3)-Equivariant Embedding for objects, and illustrate how it supports object SLAM for consistent spatial understanding with long-term scene changes.
no code implementations • CVPR 2023 • Yining Hong, Chunru Lin, Yilun Du, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan
We suggest that a principled approach for 3D reasoning from multi-view images should be to infer a compact 3D representation of the world from the multi-view images, which is further grounded on open-vocabulary semantic concepts, and then to execute reasoning on these 3D representations.
no code implementations • 28 Mar 2023 • Hongyi Chen, Yilun Du, Yiye Chen, Joshua Tenenbaum, Patricio A. Vela
In this paper, we suggest an approach towards integrating planning with sequence models based on the idea of iterative energy minimization, and illustrate how such a procedure leads to improved RL performance across different tasks.
1 code implementation • CVPR 2023 • Yilun Du, Cameron Smith, Ayush Tewari, Vincent Sitzmann
We conduct extensive comparisons on held-out test scenes across two real-world datasets, significantly outperforming prior work on novel view synthesis from sparse image observations and achieving multi-view-consistent novel view synthesis.
2 code implementations • 22 May 2023 • Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine
However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives such as human-perceived image quality or drug effectiveness.
1 code implementation • 23 May 2023 • Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, Igor Mordatch
Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.
no code implementations • 2 Jun 2023 • Mengjiao Yang, Yilun Du, Bo Dai, Dale Schuurmans, Joshua B. Tenenbaum, Pieter Abbeel
Large text-to-video models trained on internet-scale data have demonstrated exceptional capabilities in generating high-fidelity videos from arbitrary textual descriptions.
no code implementations • ICCV 2023 • Nan Liu, Yilun Du, Shuang Li, Joshua B. Tenenbaum, Antonio Torralba
Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate.
1 code implementation • 5 Jul 2023 • Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan
In this work, we address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments.
5 code implementations • NeurIPS 2023 • Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan
Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs.
no code implementations • 2 Sep 2023 • Zhutian Yang, Jiayuan Mao, Yilun Du, Jiajun Wu, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling
This paper introduces an approach for learning to solve continuous constraint satisfaction problems (CCSPs) in robotic reasoning and planning.
no code implementations • 9 Oct 2023 • Mengjiao Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Leslie Kaelbling, Dale Schuurmans, Pieter Abbeel
Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied agents purely in simulation that can be directly deployed in the real world.
no code implementations • 12 Oct 2023 • Po-Chen Ko, Jiayuan Mao, Yilun Du, Shao-Hua Sun, Joshua B. Tenenbaum
In this work, we present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments from few video demonstrations without using any action annotations.
no code implementations • 16 Oct 2023 • Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson
We are interested in enabling visual planning for complex long-horizon tasks in the space of generated videos and language, leveraging recent advances in large generative models pretrained on Internet-scale data.
no code implementations • 23 Oct 2023 • Armand Comas-Massagué, Yilun Du, Christian Fernandez, Sandesh Ghimire, Mario Sznaier, Joshua B. Tenenbaum, Octavia Camps
In this work, we propose Neural Interaction Inference with Potentials (NIIP) as an alternative approach to discover such interactions that enables greater flexibility in trajectory modeling: it discovers a set of relational potentials, represented as energy functions, which when minimized reconstruct the original trajectory.
no code implementations • 20 Jan 2024 • Yinan Zhang, Eric Tzeng, Yilun Du, Dmitry Kislyuk
Text-to-image diffusion models are a class of deep generative models that have demonstrated an impressive capacity for high-quality image generation.
1 code implementation • 23 Jan 2024 • Qinhong Zhou, Sunli Chen, Yisong Wang, Haozhe Xu, Weihua Du, Hongxin Zhang, Yilun Du, Joshua B. Tenenbaum, Chuang Gan
Recent advances in high-fidelity virtual environments serve as one of the major driving forces for building intelligent embodied agents to perceive, reason and interact with the physical world.
1 code implementation • 24 Jan 2024 • Tailin Wu, Takashi Maruyama, Long Wei, Tao Zhang, Yilun Du, Gianluca Iaccarino, Jure Leskovec
In an N-body interaction task and a challenging 2D multi-airfoil design task, we demonstrate that by composing the learned diffusion model at test time, our method allows us to design initial states and boundary shapes that are more complex than those in the training data.
no code implementations • 2 Feb 2024 • Yilun Du, Leslie Kaelbling
Large monolithic generative models trained on massive amounts of data have become an increasingly dominant approach in AI research.
no code implementations • 4 Feb 2024 • Lirui Wang, Jialiang Zhao, Yilun Du, Edward H. Adelson, Russ Tedrake
Training general robotic policies from heterogeneous data for different tasks is a significant challenge.
no code implementations • 15 Feb 2024 • Rylan Schaeffer, Nika Zahedi, Mikail Khona, Dhruv Pai, Sang Truong, Yilun Du, Mitchell Ostrow, Sarthak Chandra, Andres Carranza, Ila Rani Fiete, Andrey Gromov, Sanmi Koyejo
Based on the observation that associative memory's energy functions can be seen as probabilistic modeling's negative log likelihoods, we build a bridge between the two that enables useful flow of ideas in both directions.
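The energy-as-negative-log-likelihood reading can be made concrete with a classical Hopfield network (an illustrative sketch, not the paper's construction): its energy E(x) = -0.5 x^T W x plays the role of an unnormalized negative log likelihood over binary states, and recall is energy descent:

```python
import numpy as np

# Two orthogonal binary patterns stored via the Hebbian outer-product rule.
patterns = np.array([
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, 1, 1, -1, -1, -1, -1],
])
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0.0)

def energy(x):
    return -0.5 * x @ W @ x

def recall(x, n_sweeps=5):
    x = x.copy()
    for _ in range(n_sweeps):
        for i in range(len(x)):  # asynchronous sign updates lower the energy
            x[i] = 1 if W[i] @ x >= 0 else -1
    return x

probe = patterns[0].copy()
probe[0] *= -1               # corrupt one bit
restored = recall(probe)
```

Recall restores the stored pattern, and the restored state has strictly lower energy (higher likelihood) than the corrupted probe.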
no code implementations • 27 Feb 2024 • Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans
Moreover, we demonstrate how, like language models, video generators can serve as planners, agents, compute engines, and environment simulators through techniques such as in-context learning, planning, and reinforcement learning.
no code implementations • 14 Mar 2024 • Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan
Recent vision-language-action (VLA) models rely on 2D inputs, lacking integration with the broader realm of the 3D physical world.
no code implementations • 16 Apr 2024 • Hongxin Zhang, Zeyuan Wang, Qiushi Lyu, Zheyuan Zhang, Sunli Chen, Tianmin Shu, Yilun Du, Chuang Gan
In this paper, we investigate the problem of embodied multi-agent cooperation, where decentralized agents must cooperate given only partial egocentric views of the world.
no code implementations • 30 Apr 2024 • Robert McCarthy, Daniel C. H. Tan, Dominik Schmidt, Fernando Acero, Nathan Herr, Yilun Du, Thomas G. Thuruthel, Zhibin Li
This includes a discussion of the exciting benefits LfV methods can offer (e.g., improved generalization beyond the available robot data) and commentary on key LfV challenges (e.g., challenges related to missing information in video and LfV distribution shifts).