Search Results for author: Yilun Du

Found 74 papers, 29 papers with code

Planning with Diffusion for Flexible Behavior Synthesis

2 code implementations 20 May 2022 Michael Janner, Yilun Du, Joshua B. Tenenbaum, Sergey Levine

Model-based reinforcement learning methods often use learning only for the purpose of estimating an approximate dynamics model, offloading the rest of the decision-making work to classical trajectory optimizers.

Decision Making Denoising +2

Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents

1 code implementation 2 Mar 2019 Joseph Suarez, Yilun Du, Phillip Isola, Igor Mordatch

The emergence of complex life on Earth is often attributed to the arms race that ensued from a huge number of organisms all competing for finite resources.

Compositional Visual Generation with Composable Diffusion Models

1 code implementation 3 Jun 2022 Nan Liu, Shuang Li, Yilun Du, Antonio Torralba, Joshua B. Tenenbaum

Large text-guided diffusion models, such as DALLE-2, are able to generate stunning photorealistic images given natural language descriptions.

Sentence

Pre-Trained Language Models for Interactive Decision-Making

1 code implementation 3 Feb 2022 Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu

Together, these results suggest that language modeling induces representations that are useful for modeling not just language, but also goals and plans; these representations can aid learning and generalization even outside of language processing.

Imitation Learning Language Modelling

Implicit Generation and Generalization in Energy-Based Models

3 code implementations 20 Mar 2019 Yilun Du, Igor Mordatch

Energy based models (EBMs) are appealing due to their generality and simplicity in likelihood modeling, but have been traditionally difficult to train.

General Classification Image Reconstruction +1

Implicit Generation and Modeling with Energy Based Models

1 code implementation NeurIPS 2019 Yilun Du, Igor Mordatch

Energy based models (EBMs) are appealing due to their generality and simplicity in likelihood modeling, but have been traditionally difficult to train.

General Classification Image Generation +2

Neural Radiance Flow for 4D View Synthesis and Video Processing

1 code implementation ICCV 2021 Yilun Du, Yinan Zhang, Hong-Xing Yu, Joshua B. Tenenbaum, Jiajun Wu

We present a method, Neural Radiance Flow (NeRFlow), to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.

Image Super-Resolution Temporal View Synthesis

Training Diffusion Models with Reinforcement Learning

2 code implementations 22 May 2023 Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine

However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives such as human-perceived image quality or drug effectiveness.

Decision Making Denoising +2

Improving Factuality and Reasoning in Language Models through Multiagent Debate

1 code implementation 23 May 2023 Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, Igor Mordatch

Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.

Few-Shot Learning Language Modelling +1

Building Cooperative Embodied Agents Modularly with Large Language Models

1 code implementation 5 Jul 2023 Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan

In this work, we address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments.

Text Generation

Learning to Render Novel Views from Wide-Baseline Stereo Pairs

1 code implementation CVPR 2023 Yilun Du, Cameron Smith, Ayush Tewari, Vincent Sitzmann

We conduct extensive comparisons on held-out test scenes across two real-world datasets, significantly outperforming prior work on novel view synthesis from sparse image observations and achieving multi-view-consistent novel view synthesis.

Novel View Synthesis

Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC

2 code implementations 22 Feb 2023 Yilun Du, Conor Durkan, Robin Strudel, Joshua B. Tenenbaum, Sander Dieleman, Rob Fergus, Jascha Sohl-Dickstein, Arnaud Doucet, Will Grathwohl

In this work, we build upon these ideas using the score-based interpretation of diffusion models, and explore alternative ways to condition, modify, and reuse diffusion models for tasks involving compositional generation and guidance.

Text-to-Image Generation

Learning Neural Acoustic Fields

1 code implementation 4 Apr 2022 Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can then be applied to arbitrary sounds.

Improved Contrastive Divergence Training of Energy Based Models

4 code implementations 2 Dec 2020 Yilun Du, Shuang Li, Joshua Tenenbaum, Igor Mordatch

Contrastive divergence is a popular method of training energy-based models, but is known to have difficulties with training stability.

Data Augmentation Image Generation +1

Unsupervised Learning of Compositional Energy Concepts

1 code implementation NeurIPS 2021 Yilun Du, Shuang Li, Yash Sharma, Joshua B. Tenenbaum, Igor Mordatch

In this work, we propose COMET, which discovers and represents concepts as separate energy functions, enabling us to represent both global concepts as well as objects under a unified framework.

Disentanglement Unsupervised Image Decomposition

Compositional Visual Generation and Inference with Energy Based Models

1 code implementation 13 Apr 2020 Yilun Du, Shuang Li, Igor Mordatch

A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge.

Learning Iterative Reasoning through Energy Minimization

1 code implementation 30 Jun 2022 Yilun Du, Shuang Li, Joshua B. Tenenbaum, Igor Mordatch

Finally, we illustrate that our approach can recursively solve algorithmic problems requiring nested reasoning.

Image Classification Object Recognition

Energy-Based Models for Continual Learning

1 code implementation 24 Nov 2020 Shuang Li, Yilun Du, Gido M. van de Ven, Igor Mordatch

We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems.

Continual Learning

SE(3)-Equivariant Relational Rearrangement with Neural Descriptor Fields

1 code implementation 17 Nov 2022 Anthony Simeonov, Yilun Du, Lin Yen-Chen, Alberto Rodriguez, Leslie Pack Kaelbling, Tomas Lozano-Perez, Pulkit Agrawal

This formalism is implemented in three steps: assigning a consistent local coordinate frame to the task-relevant object parts, determining the location and orientation of this coordinate frame on unseen object instances, and executing an action that brings these frames into the desired alignment.

Object

Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions

1 code implementation ICCV 2021 Shuang Li, Yilun Du, Antonio Torralba, Josef Sivic, Bryan Russell

Our task poses unique challenges as a system does not know what types of human-object interactions are present in a video or the actual spatiotemporal location of the human and the object.

Human-Object Interaction Detection Object +2

HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments

1 code implementation 23 Jan 2024 Qinhong Zhou, Sunli Chen, Yisong Wang, Haozhe Xu, Weihua Du, Hongxin Zhang, Yilun Du, Joshua B. Tenenbaum, Chuang Gan

Recent advances in high-fidelity virtual environments serve as one of the major driving forces for building intelligent embodied agents to perceive, reason and interact with the physical world.

Common Sense Reasoning Decision Making +1

Task-Agnostic Dynamics Priors for Deep Reinforcement Learning

1 code implementation 13 May 2019 Yilun Du, Karthik Narasimhan

While model-based deep reinforcement learning (RL) holds great promise for sample efficiency and generalization, learning an accurate dynamics model is often challenging and requires substantial interaction with the environment.

reinforcement-learning Reinforcement Learning (RL)

Compositional Generative Inverse Design

1 code implementation 24 Jan 2024 Tailin Wu, Takashi Maruyama, Long Wei, Tao Zhang, Yilun Du, Gianluca Iaccarino, Jure Leskovec

In an N-body interaction task and a challenging 2D multi-airfoil design task, we demonstrate that by composing the learned diffusion model at test time, our method allows us to design initial states and boundary shapes that are more complex than those in the training data.

Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation

1 code implementation 9 Dec 2021 Anthony Simeonov, Yilun Du, Andrea Tagliasacchi, Joshua B. Tenenbaum, Alberto Rodriguez, Pulkit Agrawal, Vincent Sitzmann

Our performance generalizes across both object instances and 6-DoF object poses, and significantly outperforms a recent baseline that relies on 2D descriptors.

Object

Learning to Exploit Stability for 3D Scene Parsing

no code implementations NeurIPS 2018 Yilun Du, Zhijian Liu, Hector Basevi, Ales Leonardis, Bill Freeman, Josh Tenenbaum, Jiajun Wu

We first show that applying physics supervision to an existing scene understanding model increases performance, produces more stable predictions, and allows training to an equivalent performance level with fewer annotated training examples.

Scene Understanding Translation

Neural MMO: A massively multiplayer game environment for intelligent agents

no code implementations ICLR 2019 Joseph Suarez, Yilun Du, Phillip Isola, Igor Mordatch

We demonstrate how this platform can be used to study behavior and learning in large populations of neural agents.

An Empirical Study on Hyperparameters and their Interdependence for RL Generalization

no code implementations 2 Jun 2019 Xingyou Song, Yilun Du, Jacob Jackson

Recent results in Reinforcement Learning (RL) have shown that agents with limited training environments are susceptible to a large amount of overfitting across many domains.

reinforcement-learning Reinforcement Learning (RL)

Model Based Planning with Energy Based Models

no code implementations 15 Sep 2019 Yilun Du, Toru Lin, Igor Mordatch

We provide an online algorithm to train EBMs while interacting with the environment, and show that EBMs allow for significantly better online learning than corresponding feed-forward networks.

Reinforcement Learning (RL)

Observational Overfitting in Reinforcement Learning

no code implementations ICLR 2020 Xingyou Song, Yiding Jiang, Stephen Tu, Yilun Du, Behnam Neyshabur

A major component of overfitting in model-free reinforcement learning (RL) involves the case where the agent may mistakenly correlate reward with certain spurious features from the observations generated by the Markov Decision Process (MDP).

reinforcement-learning Reinforcement Learning (RL)

Neural MMO v1.3: A Massively Multiagent Game Environment for Training and Evaluating Neural Networks

no code implementations 31 Jan 2020 Joseph Suarez, Yilun Du, Igor Mordatch, Phillip Isola

We present Neural MMO, a massively multiagent game environment inspired by MMOs and discuss our progress on two more general challenges in multiagent systems engineering for AI research: distributed infrastructure and game IO.

Policy Gradient Methods

Learning Object-Based State Estimators for Household Robots

no code implementations 6 Nov 2020 Yilun Du, Tomas Lozano-Perez, Leslie Kaelbling

The robot may be called upon later to retrieve objects and will need a long-term object-based memory in order to know how to find them.

Clustering Object +2

Compositional Visual Generation with Energy Based Models

no code implementations NeurIPS 2020 Yilun Du, Shuang Li, Igor Mordatch

A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge.

A Long Horizon Planning Framework for Manipulating Rigid Pointcloud Objects

no code implementations 16 Nov 2020 Anthony Simeonov, Yilun Du, Beomjoon Kim, Francois R. Hogan, Joshua Tenenbaum, Pulkit Agrawal, Alberto Rodriguez

We present a framework for solving long-horizon planning problems involving manipulation of rigid objects that operates directly from a point-cloud observation, i.e., without prior object models.

Graph Attention Motion Planning +2

Language Model Pre-training Improves Generalization in Policy Learning

no code implementations 29 Sep 2021 Shuang Li, Xavier Puig, Yilun Du, Ekin Akyürek, Antonio Torralba, Jacob Andreas, Igor Mordatch

Additional experiments explore the role of language-based encodings in these results; we find that it is possible to train a simple adapter layer that maps from observations and action histories to LM embeddings, and thus that language modeling provides an effective initializer even for tasks with no language as input or output.

Imitation Learning Language Modelling

The Neural MMO Platform for Massively Multiagent Research

no code implementations 14 Oct 2021 Joseph Suarez, Yilun Du, Clare Zhu, Igor Mordatch, Phillip Isola

Neural MMO is a computationally accessible research platform that combines large agent populations, long time horizons, open-ended tasks, and modular game systems.

Learning Signal-Agnostic Manifolds of Neural Fields

no code implementations NeurIPS 2021 Yilun Du, Katherine M. Collins, Joshua B. Tenenbaum, Vincent Sitzmann

We leverage neural fields to capture the underlying structure in image, shape, audio and cross-modal audiovisual domains in a modality-independent manner.

Learning to Compose Visual Relations

no code implementations NeurIPS 2021 Nan Liu, Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba

The visual world around us can be described as a structured set of objects and their associated relations.

Learning Physics Priors for Deep Reinforcement Learing

no code implementations 27 Sep 2018 Yilun Du, Karthik Narasimhan

While model-based deep reinforcement learning (RL) holds great promise for sample efficiency and generalization, learning an accurate dynamics model is challenging and often requires substantial interactions with the environment.

Reinforcement Learning (RL) Transfer Learning

Learning Online Data Association

no code implementations 28 Sep 2020 Yilun Du, Joshua B. Tenenbaum, Tomas Perez, Leslie Pack Kaelbling

When an agent interacts with a complex environment, it receives a stream of percepts in which it may detect entities, such as objects or people.

Representation Learning

Streaming Inference for Infinite Non-Stationary Clustering

no code implementations 2 May 2022 Rylan Schaeffer, Gabrielle Kaili-May Liu, Yilun Du, Scott Linderman, Ila Rani Fiete

Learning from a continuous stream of non-stationary data in an unsupervised manner is arguably one of the most common and most challenging settings facing intelligent agents.

Clustering Variational Inference

3D Concept Grounding on Neural Fields

no code implementations 13 Jul 2022 Yining Hong, Yilun Du, Chunru Lin, Joshua B. Tenenbaum, Chuang Gan

Experimental results show that our proposed framework outperforms unsupervised/language-mediated segmentation models on semantic and instance segmentation tasks, as well as outperforms existing models on the challenging 3D aware visual reasoning tasks.

Instance Segmentation Question Answering +3

Neural Groundplans: Persistent Neural Scene Representations from a Single Image

no code implementations 22 Jul 2022 Prafull Sharma, Ayush Tewari, Yilun Du, Sergey Zakharov, Rares Ambrus, Adrien Gaidon, William T. Freeman, Fredo Durand, Joshua B. Tenenbaum, Vincent Sitzmann

We present a method to map 2D image observations of a scene to a persistent 3D scene representation, enabling novel view synthesis and disentangled representation of the movable and immovable components of the scene.

Disentanglement Instance Segmentation +4

Robust Change Detection Based on Neural Descriptor Fields

no code implementations 1 Aug 2022 Jiahui Fu, Yilun Du, Kurran Singh, Joshua B. Tenenbaum, John J. Leonard

The ability to reason about changes in the environment is crucial for robots operating over extended periods of time.

Change Detection Object

Composing Ensembles of Pre-trained Models via Iterative Consensus

no code implementations 20 Oct 2022 Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Igor Mordatch

Such closed-loop communication enables models to correct errors caused by other models, significantly boosting performance on downstream tasks, e.g., improving accuracy on grade school math problems by 7.5%, without requiring any model finetuning.

Arithmetic Reasoning Image Generation +4

StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects

no code implementations 8 Nov 2022 Weiyu Liu, Yilun Du, Tucker Hermans, Sonia Chernova, Chris Paxton

StructDiffusion even improves the success rate of assembling physically-valid structures out of unseen objects by on average 16% over an existing multi-modal transformer model trained on specific structures.

valid

Is Conditional Generative Modeling all you need for Decision-Making?

no code implementations 28 Nov 2022 Anurag Ajay, Yilun Du, Abhi Gupta, Joshua Tenenbaum, Tommi Jaakkola, Pulkit Agrawal

We further demonstrate the advantages of modeling policies as conditional diffusion models by considering two other conditioning variables: constraints and skills.

Decision Making Offline RL +1

Foundation Models for Decision Making: Problems, Methods, and Opportunities

no code implementations 7 Mar 2023 Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans

In response to these developments, new paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.

Autonomous Driving Decision Making +1

NeuSE: Neural SE(3)-Equivariant Embedding for Consistent Spatial Understanding with Objects

no code implementations 13 Mar 2023 Jiahui Fu, Yilun Du, Kurran Singh, Joshua B. Tenenbaum, John J. Leonard

We present NeuSE, a novel Neural SE(3)-Equivariant Embedding for objects, and illustrate how it supports object SLAM for consistent spatial understanding with long-term scene changes.

Object Object SLAM

3D Concept Learning and Reasoning from Multi-View Images

no code implementations CVPR 2023 Yining Hong, Chunru Lin, Yilun Du, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan

We suggest that a principled approach for 3D reasoning from multi-view images should be to infer a compact 3D representation of the world from the multi-view images, which is further grounded on open-vocabulary semantic concepts, and then to execute reasoning on these 3D representations.

Question Answering Visual Question Answering +1

Planning with Sequence Models through Iterative Energy Minimization

no code implementations 28 Mar 2023 Hongyi Chen, Yilun Du, Yiye Chen, Joshua Tenenbaum, Patricio A. Vela

In this paper, we suggest an approach towards integrating planning with sequence models based on the idea of iterative energy minimization, and illustrate how such a procedure leads to improved RL performance across different tasks.

Language Modelling Reinforcement Learning (RL)

Probabilistic Adaptation of Text-to-Video Models

no code implementations 2 Jun 2023 Mengjiao Yang, Yilun Du, Bo Dai, Dale Schuurmans, Joshua B. Tenenbaum, Pieter Abbeel

Large text-to-video models trained on internet-scale data have demonstrated exceptional capabilities in generating high-fidelity videos from arbitrary textual descriptions.

Language Modelling Large Language Model

Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models

no code implementations ICCV 2023 Nan Liu, Yilun Du, Shuang Li, Joshua B. Tenenbaum, Antonio Torralba

Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate.

Image Generation

Compositional Diffusion-Based Continuous Constraint Solvers

no code implementations 2 Sep 2023 Zhutian Yang, Jiayuan Mao, Yilun Du, Jiajun Wu, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling

This paper introduces an approach for learning to solve continuous constraint satisfaction problems (CCSP) in robotic reasoning and planning.

Learning Interactive Real-World Simulators

no code implementations 9 Oct 2023 Mengjiao Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Leslie Kaelbling, Dale Schuurmans, Pieter Abbeel

Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied agents purely in simulation that can be directly deployed in the real world.

Video Captioning

Learning to Act from Actionless Videos through Dense Correspondences

no code implementations 12 Oct 2023 Po-Chen Ko, Jiayuan Mao, Yilun Du, Shao-Hua Sun, Joshua B. Tenenbaum

In this work, we present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments from few video demonstrations without using any action annotations.

Video Language Planning

no code implementations 16 Oct 2023 Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson

We are interested in enabling visual planning for complex long-horizon tasks in the space of generated videos and language, leveraging recent advances in large generative models pretrained on Internet-scale data.

Inferring Relational Potentials in Interacting Systems

no code implementations 23 Oct 2023 Armand Comas-Massagué, Yilun Du, Christian Fernandez, Sandesh Ghimire, Mario Sznaier, Joshua B. Tenenbaum, Octavia Camps

In this work, we propose Neural Interaction Inference with Potentials (NIIP) as an alternative approach to discover such interactions that enables greater flexibility in trajectory modeling: it discovers a set of relational potentials, represented as energy functions, which when minimized reconstruct the original trajectory.

Trajectory Forecasting Trajectory Modeling

Large-scale Reinforcement Learning for Diffusion Models

no code implementations 20 Jan 2024 Yinan Zhang, Eric Tzeng, Yilun Du, Dmitry Kislyuk

Text-to-image diffusion models are a class of deep generative models that have demonstrated an impressive capacity for high-quality image generation.

Ethics Fairness +3

Compositional Generative Modeling: A Single Model is Not All You Need

no code implementations 2 Feb 2024 Yilun Du, Leslie Kaelbling

Large monolithic generative models trained on massive amounts of data have become an increasingly dominant approach in AI research.

PoCo: Policy Composition from and for Heterogeneous Robot Learning

no code implementations 4 Feb 2024 Lirui Wang, Jialiang Zhao, Yilun Du, Edward H. Adelson, Russ Tedrake

Training general robotic policies from heterogeneous data for different tasks is a significant challenge.

Bridging Associative Memory and Probabilistic Modeling

no code implementations 15 Feb 2024 Rylan Schaeffer, Nika Zahedi, Mikail Khona, Dhruv Pai, Sang Truong, Yilun Du, Mitchell Ostrow, Sarthak Chandra, Andres Carranza, Ila Rani Fiete, Andrey Gromov, Sanmi Koyejo

Based on the observation that associative memory's energy functions can be seen as probabilistic modeling's negative log likelihoods, we build a bridge between the two that enables useful flow of ideas in both directions.

In-Context Learning

Video as the New Language for Real-World Decision Making

no code implementations 27 Feb 2024 Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans

Moreover, we demonstrate how, like language models, video generation can serve as planners, agents, compute engines, and environment simulators through techniques such as in-context learning, planning and reinforcement learning.

Decision Making In-Context Learning +2

3D-VLA: A 3D Vision-Language-Action Generative World Model

no code implementations 14 Mar 2024 Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan

Recent vision-language-action (VLA) models rely on 2D inputs, lacking integration with the broader realm of the 3D physical world.

Language Modelling Large Language Model +1

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

no code implementations 16 Apr 2024 Hongxin Zhang, Zeyuan Wang, Qiushi Lyu, Zheyuan Zhang, Sunli Chen, Tianmin Shu, Yilun Du, Chuang Gan

In this paper, we investigate the problem of embodied multi-agent cooperation, where decentralized agents must cooperate given only partial egocentric views of the world.
