Search Results for author: Yilun Du

Found 64 papers, 23 papers with code

Compositional Diffusion-Based Continuous Constraint Solvers

no code implementations • 2 Sep 2023 • Zhutian Yang, Jiayuan Mao, Yilun Du, Jiajun Wu, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling

This paper introduces an approach for learning to solve continuous constraint satisfaction problems (CCSP) in robotic reasoning and planning.

3D-LLM: Injecting the 3D World into Large Language Models

no code implementations • 24 Jul 2023 • Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan

Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs.

Dense Captioning • Question Answering

Building Cooperative Embodied Agents Modularly with Large Language Models

no code implementations • 5 Jul 2023 • Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan

Large Language Models (LLMs) have demonstrated impressive planning abilities in single-agent embodied tasks across various domains.

Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models

no code implementations • 8 Jun 2023 • Nan Liu, Yilun Du, Shuang Li, Joshua B. Tenenbaum, Antonio Torralba

Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate.

Image Generation

Probabilistic Adaptation of Text-to-Video Models

no code implementations • 2 Jun 2023 • Mengjiao Yang, Yilun Du, Bo Dai, Dale Schuurmans, Joshua B. Tenenbaum, Pieter Abbeel

Large text-to-video models trained on internet-scale data have demonstrated exceptional capabilities in generating high-fidelity videos from arbitrary textual descriptions.

Language Modelling • Large Language Model

Improving Factuality and Reasoning in Language Models through Multiagent Debate

1 code implementation • 23 May 2023 • Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, Igor Mordatch

Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.

Few-Shot Learning • Language Modelling • +1

Training Diffusion Models with Reinforcement Learning

no code implementations • 22 May 2023 • Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine

Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective.

Decision Making • Denoising • +2

Learning to Render Novel Views from Wide-Baseline Stereo Pairs

1 code implementation • CVPR 2023 • Yilun Du, Cameron Smith, Ayush Tewari, Vincent Sitzmann

We conduct extensive comparisons on held-out test scenes across two real-world datasets, significantly outperforming prior work on novel view synthesis from sparse image observations and achieving multi-view-consistent novel view synthesis.

Novel View Synthesis

Planning with Sequence Models through Iterative Energy Minimization

no code implementations • 28 Mar 2023 • Hongyi Chen, Yilun Du, Yiye Chen, Joshua Tenenbaum, Patricio A. Vela

In this paper, we suggest an approach towards integrating planning with sequence models based on the idea of iterative energy minimization, and illustrate how such a procedure leads to improved RL performance across different tasks.

Language Modelling • Reinforcement Learning (RL)

3D Concept Learning and Reasoning from Multi-View Images

no code implementations • CVPR 2023 • Yining Hong, Chunru Lin, Yilun Du, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan

We suggest that a principled approach for 3D reasoning from multi-view images should be to infer a compact 3D representation of the world from the multi-view images, which is further grounded on open-vocabulary semantic concepts, and then to execute reasoning on these 3D representations.

Question Answering • Visual Question Answering • +1

NeuSE: Neural SE(3)-Equivariant Embedding for Consistent Spatial Understanding with Objects

no code implementations • 13 Mar 2023 • Jiahui Fu, Yilun Du, Kurran Singh, Joshua B. Tenenbaum, John J. Leonard

We present NeuSE, a novel Neural SE(3)-Equivariant Embedding for objects, and illustrate how it supports object SLAM for consistent spatial understanding with long-term scene changes.

Object SLAM

Foundation Models for Decision Making: Problems, Methods, and Opportunities

no code implementations • 7 Mar 2023 • Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans

In response to these developments, new paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.

Autonomous Driving • Decision Making • +1

Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC

2 code implementations • 22 Feb 2023 • Yilun Du, Conor Durkan, Robin Strudel, Joshua B. Tenenbaum, Sander Dieleman, Rob Fergus, Jascha Sohl-Dickstein, Arnaud Doucet, Will Grathwohl

In this work, we build upon these ideas using the score-based interpretation of diffusion models, and explore alternative ways to condition, modify, and reuse diffusion models for tasks involving compositional generation and guidance.
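Under the score-based interpretation, a conjunction of two models can be sampled by summing their scores (a product of experts) and running a sampler such as Langevin dynamics. A minimal 1-D sketch of that general idea, using two hand-written Gaussian scores rather than the paper's learned diffusion models:

```python
import numpy as np

def score_a(x):
    # Score (gradient of log-density) of a unit Gaussian centered at -1.0.
    return -(x + 1.0)

def score_b(x):
    # Score of a unit Gaussian centered at +1.0.
    return -(x - 1.0)

def composed_score(x):
    # Product of experts: log p_a + log p_b, so the scores simply add.
    return score_a(x) + score_b(x)

def langevin_sample(score, x0=0.0, steps=500, eta=0.01, seed=0):
    # Unadjusted Langevin dynamics: x += eta * score(x) + sqrt(2*eta) * noise.
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(steps):
        x += eta * score(x) + np.sqrt(2 * eta) * rng.standard_normal()
    return x

samples = [langevin_sample(composed_score, seed=s) for s in range(200)]
# The product of the two Gaussians is itself Gaussian, centered at 0.
print(f"mean of composed samples: {np.mean(samples):.2f}")
```

The samples concentrate where both component densities are high, which is the behavior the MCMC-based compositional samplers in the paper generalize to high-dimensional images.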

Learning Universal Policies via Text-Guided Video Generation

no code implementations • 31 Jan 2023 • Yilun Du, Mengjiao Yang, Bo Dai, Hanjun Dai, Ofir Nachum, Joshua B. Tenenbaum, Dale Schuurmans, Pieter Abbeel

The proposed policy-as-video formulation can further represent environments with different state and action spaces in a unified space of images, which, for example, enables learning and generalization across a variety of robot manipulation tasks.

Decision Making • Image Generation • +3

Is Conditional Generative Modeling all you need for Decision-Making?

no code implementations • 28 Nov 2022 • Anurag Ajay, Yilun Du, Abhi Gupta, Joshua Tenenbaum, Tommi Jaakkola, Pulkit Agrawal

We further demonstrate the advantages of modeling policies as conditional diffusion models by considering two other conditioning variables: constraints and skills.

Decision Making • Offline RL • +1

SE(3)-Equivariant Relational Rearrangement with Neural Descriptor Fields

1 code implementation • 17 Nov 2022 • Anthony Simeonov, Yilun Du, Lin Yen-Chen, Alberto Rodriguez, Leslie Pack Kaelbling, Tomas Lozano-Perez, Pulkit Agrawal

This formalism is implemented in three steps: assigning a consistent local coordinate frame to the task-relevant object parts, determining the location and orientation of this coordinate frame on unseen object instances, and executing an action that brings these frames into the desired alignment.
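The third step, bringing two coordinate frames into a desired alignment, reduces to computing the rigid motion between homogeneous transforms. A small sketch with hypothetical 4x4 frames (not the paper's neural descriptor machinery):

```python
import numpy as np

def make_frame(R, t):
    # Build a 4x4 homogeneous transform from rotation R and translation t.
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def aligning_transform(T_current, T_desired):
    # The rigid motion that maps the current frame onto the desired one;
    # applying it to T_current yields T_desired exactly.
    return T_desired @ np.linalg.inv(T_current)

# Hypothetical example: a part frame at (1, 0, 0) must end up at (0, 2, 0)
# rotated 90 degrees about z.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
T_cur = make_frame(np.eye(3), [1.0, 0.0, 0.0])
T_des = make_frame(Rz, [0.0, 2.0, 0.0])

T_act = aligning_transform(T_cur, T_des)
assert np.allclose(T_act @ T_cur, T_des)
```

The hard part the paper addresses is the second step, locating such frames on unseen object instances; the alignment itself is this one matrix product.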

StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects

no code implementations • 8 Nov 2022 • Weiyu Liu, Yilun Du, Tucker Hermans, Sonia Chernova, Chris Paxton

StructDiffusion even improves the success rate of assembling physically-valid structures out of unseen objects by 16% on average over an existing multi-modal transformer model trained on specific structures.

Composing Ensembles of Pre-trained Models via Iterative Consensus

no code implementations • 20 Oct 2022 • Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Igor Mordatch

Such closed-loop communication enables models to correct errors caused by other models, significantly boosting performance on downstream tasks, e.g. improving accuracy on grade school math problems by 7.5%, without requiring any model finetuning.

Ranked #1 on Video Question Answering on ActivityNet-QA (Vocabulary Size metric)

Image Generation • Mathematical Reasoning • +2

Robust Change Detection Based on Neural Descriptor Fields

no code implementations • 1 Aug 2022 • Jiahui Fu, Yilun Du, Kurran Singh, Joshua B. Tenenbaum, John J. Leonard

The ability to reason about changes in the environment is crucial for robots operating over extended periods of time.

Change Detection

Neural Groundplans: Persistent Neural Scene Representations from a Single Image

no code implementations • 22 Jul 2022 • Prafull Sharma, Ayush Tewari, Yilun Du, Sergey Zakharov, Rares Ambrus, Adrien Gaidon, William T. Freeman, Fredo Durand, Joshua B. Tenenbaum, Vincent Sitzmann

We present a method to map 2D image observations of a scene to a persistent 3D scene representation, enabling novel view synthesis and disentangled representation of the movable and immovable components of the scene.

Disentanglement • Instance Segmentation • +4

3D Concept Grounding on Neural Fields

no code implementations • 13 Jul 2022 • Yining Hong, Yilun Du, Chunru Lin, Joshua B. Tenenbaum, Chuang Gan

Experimental results show that our proposed framework outperforms unsupervised/language-mediated segmentation models on semantic and instance segmentation tasks, and outperforms existing models on the challenging 3D-aware visual reasoning tasks.

Instance Segmentation • Question Answering • +2

Learning Iterative Reasoning through Energy Minimization

1 code implementation • 30 Jun 2022 • Yilun Du, Shuang Li, Joshua B. Tenenbaum, Igor Mordatch

Finally, we illustrate that our approach can recursively solve algorithmic problems requiring nested reasoning.

Image Classification • Object Recognition

Compositional Visual Generation with Composable Diffusion Models

1 code implementation • 3 Jun 2022 • Nan Liu, Shuang Li, Yilun Du, Antonio Torralba, Joshua B. Tenenbaum

Large text-guided diffusion models, such as DALLE-2, are able to generate stunning photorealistic images given natural language descriptions.

Planning with Diffusion for Flexible Behavior Synthesis

2 code implementations • 20 May 2022 • Michael Janner, Yilun Du, Joshua B. Tenenbaum, Sergey Levine

Model-based reinforcement learning methods often use learning only for the purpose of estimating an approximate dynamics model, offloading the rest of the decision-making work to classical trajectory optimizers.

Decision Making • Denoising • +2

Streaming Inference for Infinite Non-Stationary Clustering

no code implementations • 2 May 2022 • Rylan Schaeffer, Gabrielle Kaili-May Liu, Yilun Du, Scott Linderman, Ila Rani Fiete

Learning from a continuous stream of non-stationary data in an unsupervised manner is arguably one of the most common and most challenging settings facing intelligent agents.

Clustering • Variational Inference

Learning Neural Acoustic Fields

no code implementations • 4 Apr 2022 • Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can then be applied to arbitrary sounds.
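The LTI assumption is what makes this work: once an impulse response is predicted for an emitter-listener pair, rendering any sound at that pair is just a convolution. A minimal sketch with a hand-made impulse response (the paper's NAF would predict `ir` from the two locations):

```python
import numpy as np

def apply_impulse_response(sound, ir):
    # For a linear time-invariant system, the sound heard at a listener
    # location is the dry sound convolved with that location's impulse response.
    return np.convolve(sound, ir)

# Hypothetical impulse response: a direct path plus one delayed, attenuated echo.
ir = np.zeros(8)
ir[0] = 1.0   # direct sound
ir[5] = 0.5   # echo arriving 5 samples later at half amplitude

sound = np.array([1.0, 0.0, 0.0, 0.0])  # a unit click
wet = apply_impulse_response(sound, ir)
print(wet[0], wet[5])  # 1.0 0.5
```

Because convolution is linear, the same predicted impulse response can be reused for arbitrary source sounds, which is exactly the property the paper exploits.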

Pre-Trained Language Models for Interactive Decision-Making

1 code implementation • 3 Feb 2022 • Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu

Together, these results suggest that language modeling induces representations that are useful for modeling not just language, but also goals and plans; these representations can aid learning and generalization even outside of language processing.

Imitation Learning • Language Modelling

Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation

1 code implementation • 9 Dec 2021 • Anthony Simeonov, Yilun Du, Andrea Tagliasacchi, Joshua B. Tenenbaum, Alberto Rodriguez, Pulkit Agrawal, Vincent Sitzmann

Our performance generalizes across both object instances and 6-DoF object poses, and significantly outperforms a recent baseline that relies on 2D descriptors.

Learning to Compose Visual Relations

no code implementations • NeurIPS 2021 • Nan Liu, Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba

The visual world around us can be described as a structured set of objects and their associated relations.

Learning Signal-Agnostic Manifolds of Neural Fields

no code implementations • NeurIPS 2021 • Yilun Du, Katherine M. Collins, Joshua B. Tenenbaum, Vincent Sitzmann

We leverage neural fields to capture the underlying structure in image, shape, audio and cross-modal audiovisual domains in a modality-independent manner.

Unsupervised Learning of Compositional Energy Concepts

1 code implementation • NeurIPS 2021 • Yilun Du, Shuang Li, Yash Sharma, Joshua B. Tenenbaum, Igor Mordatch

In this work, we propose COMET, which discovers and represents concepts as separate energy functions, enabling us to represent both global concepts as well as objects under a unified framework.

Disentanglement • Unsupervised Image Decomposition

The Neural MMO Platform for Massively Multiagent Research

no code implementations • 14 Oct 2021 • Joseph Suarez, Yilun Du, Clare Zhu, Igor Mordatch, Phillip Isola

Neural MMO is a computationally accessible research platform that combines large agent populations, long time horizons, open-ended tasks, and modular game systems.

Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions

1 code implementation • ICCV 2021 • Shuang Li, Yilun Du, Antonio Torralba, Josef Sivic, Bryan Russell

Our task poses unique challenges as a system does not know what types of human-object interactions are present in a video or the actual spatiotemporal location of the human and the object.

Human-Object Interaction Detection • Weakly-supervised Learning

Language Model Pre-training Improves Generalization in Policy Learning

no code implementations • 29 Sep 2021 • Shuang Li, Xavier Puig, Yilun Du, Ekin Akyürek, Antonio Torralba, Jacob Andreas, Igor Mordatch

Additional experiments explore the role of language-based encodings in these results; we find that it is possible to train a simple adapter layer that maps from observations and action histories to LM embeddings, and thus that language modeling provides an effective initializer even for tasks with no language as input or output.

Imitation Learning • Language Modelling

Neural Radiance Flow for 4D View Synthesis and Video Processing

1 code implementation • ICCV 2021 • Yilun Du, Yinan Zhang, Hong-Xing Yu, Joshua B. Tenenbaum, Jiajun Wu

We present a method, Neural Radiance Flow (NeRFlow), to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.

Image Super-Resolution • Temporal View Synthesis

Improved Contrastive Divergence Training of Energy Based Models

4 code implementations • 2 Dec 2020 • Yilun Du, Shuang Li, Joshua Tenenbaum, Igor Mordatch

Contrastive divergence is a popular method of training energy-based models, but is known to have difficulties with training stability.

Data Augmentation • Image Generation • +1

Compositional Visual Generation with Energy Based Models

no code implementations • NeurIPS 2020 • Yilun Du, Shuang Li, Igor Mordatch

A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge.

Energy-Based Models for Continual Learning

1 code implementation • 24 Nov 2020 • Shuang Li, Yilun Du, Gido M. van de Ven, Igor Mordatch

We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems.

Continual Learning

A Long Horizon Planning Framework for Manipulating Rigid Pointcloud Objects

no code implementations • 16 Nov 2020 • Anthony Simeonov, Yilun Du, Beomjoon Kim, Francois R. Hogan, Joshua Tenenbaum, Pulkit Agrawal, Alberto Rodriguez

We present a framework for solving long-horizon planning problems involving manipulation of rigid objects that operates directly from a point-cloud observation, i.e. without prior object models.

Graph Attention • Motion Planning

Learning Object-Based State Estimators for Household Robots

no code implementations • 6 Nov 2020 • Yilun Du, Tomas Lozano-Perez, Leslie Kaelbling

The robot may be called upon later to retrieve objects and will need a long-term object-based memory in order to know how to find them.

Clustering • Representation Learning • +1

Learning Online Data Association

no code implementations • 28 Sep 2020 • Yilun Du, Joshua B. Tenenbaum, Tomas Lozano-Perez, Leslie Pack Kaelbling

When an agent interacts with a complex environment, it receives a stream of percepts in which it may detect entities, such as objects or people.

Representation Learning

Compositional Visual Generation and Inference with Energy Based Models

1 code implementation • 13 Apr 2020 • Yilun Du, Shuang Li, Igor Mordatch

A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge.

Neural MMO v1.3: A Massively Multiagent Game Environment for Training and Evaluating Neural Networks

no code implementations • 31 Jan 2020 • Joseph Suarez, Yilun Du, Igor Mordatch, Phillip Isola

We present Neural MMO, a massively multiagent game environment inspired by MMOs and discuss our progress on two more general challenges in multiagent systems engineering for AI research: distributed infrastructure and game IO.

Policy Gradient Methods

Observational Overfitting in Reinforcement Learning

no code implementations • ICLR 2020 • Xingyou Song, Yiding Jiang, Stephen Tu, Yilun Du, Behnam Neyshabur

A major component of overfitting in model-free reinforcement learning (RL) involves the case where the agent may mistakenly correlate reward with certain spurious features from the observations generated by the Markov Decision Process (MDP).

reinforcement-learning • Reinforcement Learning (RL)

Implicit Generation and Modeling with Energy Based Models

1 code implementation • NeurIPS 2019 • Yilun Du, Igor Mordatch

Energy based models (EBMs) are appealing due to their generality and simplicity in likelihood modeling, but have been traditionally difficult to train.

General Classification • Image Generation • +2

Model Based Planning with Energy Based Models

no code implementations • 15 Sep 2019 • Yilun Du, Toru Lin, Igor Mordatch

We provide an online algorithm to train EBMs while interacting with the environment, and show that EBMs allow for significantly better online learning than corresponding feed-forward networks.

Reinforcement Learning (RL)

An Empirical Study on Hyperparameters and their Interdependence for RL Generalization

no code implementations • 2 Jun 2019 • Xingyou Song, Yilun Du, Jacob Jackson

Recent results in Reinforcement Learning (RL) have shown that agents with limited training environments are susceptible to a large amount of overfitting across many domains.

reinforcement-learning • Reinforcement Learning (RL)

Task-Agnostic Dynamics Priors for Deep Reinforcement Learning

1 code implementation • 13 May 2019 • Yilun Du, Karthik Narasimhan

While model-based deep reinforcement learning (RL) holds great promise for sample efficiency and generalization, learning an accurate dynamics model is often challenging and requires substantial interaction with the environment.

reinforcement-learning • Reinforcement Learning (RL)

Neural MMO: A massively multiplayer game environment for intelligent agents

no code implementations • ICLR 2019 • Joseph Suarez, Yilun Du, Phillip Isola, Igor Mordatch

We demonstrate how this platform can be used to study behavior and learning in large populations of neural agents.

Implicit Generation and Generalization in Energy-Based Models

3 code implementations • 20 Mar 2019 • Yilun Du, Igor Mordatch

Energy based models (EBMs) are appealing due to their generality and simplicity in likelihood modeling, but have been traditionally difficult to train.

General Classification • Image Reconstruction • +1

Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents

1 code implementation • 2 Mar 2019 • Joseph Suarez, Yilun Du, Phillip Isola, Igor Mordatch

The emergence of complex life on Earth is often attributed to the arms race that ensued from a huge number of organisms all competing for finite resources.

Learning to Exploit Stability for 3D Scene Parsing

no code implementations • NeurIPS 2018 • Yilun Du, Zhijian Liu, Hector Basevi, Ales Leonardis, Bill Freeman, Josh Tenenbaum, Jiajun Wu

We first show that applying physics supervision to an existing scene understanding model increases performance, produces more stable predictions, and allows training to an equivalent performance level with fewer annotated training examples.

Scene Understanding • Translation

Learning Physics Priors for Deep Reinforcement Learning

no code implementations • 27 Sep 2018 • Yilun Du, Karthik Narasimhan

While model-based deep reinforcement learning (RL) holds great promise for sample efficiency and generalization, learning an accurate dynamics model is challenging and often requires substantial interactions with the environment.

Reinforcement Learning (RL) • Transfer Learning
