Search Results for author: Yilun Du

Found 74 papers, 29 papers with code

Planning with Diffusion for Flexible Behavior Synthesis

2 code implementations • 20 May 2022 • Michael Janner, Yilun Du, Joshua B. Tenenbaum, Sergey Levine

Model-based reinforcement learning methods often use learning only for the purpose of estimating an approximate dynamics model, offloading the rest of the decision-making work to classical trajectory optimizers.

Decision Making Denoising +2

2,505

Kubric: A scalable dataset generator

1 code implementation • CVPR 2022 • Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, Matan Sela, Vincent Sitzmann, Austin Stone, Deqing Sun, Suhani Vora, Ziyu Wang, Tianhao Wu, Kwang Moo Yi, Fangcheng Zhong, Andrea Tagliasacchi

Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details.

Fairness Optical Flow Estimation

2,169

Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents

1 code implementation • 2 Mar 2019 • Joseph Suarez, Yilun Du, Phillip Isola, Igor Mordatch

The emergence of complex life on Earth is often attributed to the arms race that ensued from a huge number of organisms all competing for finite resources.

1,527

3D-LLM: Injecting the 3D World into Large Language Models

5 code implementations • NeurIPS 2023 • Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan

Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs.

Ranked #4 on 3D Question Answering (3D-QA) on ScanQA Test w/ objects

3D Question Answering (3D-QA) Dense Captioning +1

747

Compositional Visual Generation with Composable Diffusion Models

1 code implementation • 3 Jun 2022 • Nan Liu, Shuang Li, Yilun Du, Antonio Torralba, Joshua B. Tenenbaum

Large text-guided diffusion models, such as DALLE-2, are able to generate stunning photorealistic images given natural language descriptions.

Sentence

429

Pre-Trained Language Models for Interactive Decision-Making

1 code implementation • 3 Feb 2022 • Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu

Together, these results suggest that language modeling induces representations that are useful for modeling not just language, but also goals and plans; these representations can aid learning and generalization even outside of language processing.

Imitation Learning Language Modelling

384

Implicit Generation and Generalization in Energy-Based Models

3 code implementations • 20 Mar 2019 • Yilun Du, Igor Mordatch

Energy based models (EBMs) are appealing due to their generality and simplicity in likelihood modeling, but have been traditionally difficult to train.

General Classification Image Reconstruction +1

331

Implicit Generation and Modeling with Energy Based Models

1 code implementation • NeurIPS 2019 • Yilun Du, Igor Mordatch

Energy based models (EBMs) are appealing due to their generality and simplicity in likelihood modeling, but have been traditionally difficult to train.

Ranked #3 on Image Generation on Stacked MNIST

General Classification Image Generation +2

331

Neural Radiance Flow for 4D View Synthesis and Video Processing

1 code implementation • ICCV 2021 • Yilun Du, Yinan Zhang, Hong-Xing Yu, Joshua B. Tenenbaum, Jiajun Wu

We present a method, Neural Radiance Flow (NeRFlow), to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.

Image Super-Resolution Temporal View Synthesis

318

Training Diffusion Models with Reinforcement Learning

2 code implementations • 22 May 2023 • Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine

However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives such as human-perceived image quality or drug effectiveness.

Decision Making Denoising +2

316

Improving Factuality and Reasoning in Language Models through Multiagent Debate

1 code implementation • 23 May 2023 • Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, Igor Mordatch

Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.

Few-Shot Learning Language Modelling +1

260

3D Shape Generation and Completion through Point-Voxel Diffusion

1 code implementation • ICCV 2021 • Linqi Zhou, Yilun Du, Jiajun Wu

We propose a novel approach for probabilistic generative modeling of 3D shapes.

3D Shape Generation Denoising

198

Building Cooperative Embodied Agents Modularly with Large Language Models

1 code implementation • 5 Jul 2023 • Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan

In this work, we address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments.

Text Generation

164

Learning to Render Novel Views from Wide-Baseline Stereo Pairs

1 code implementation • CVPR 2023 • Yilun Du, Cameron Smith, Ayush Tewari, Vincent Sitzmann

We conduct extensive comparisons on held-out test scenes across two real-world datasets, significantly outperforming prior work on novel view synthesis from sparse image observations and achieving multi-view-consistent novel view synthesis.

Novel View Synthesis

124

Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC

2 code implementations • 22 Feb 2023 • Yilun Du, Conor Durkan, Robin Strudel, Joshua B. Tenenbaum, Sander Dieleman, Rob Fergus, Jascha Sohl-Dickstein, Arnaud Doucet, Will Grathwohl

In this work, we build upon these ideas using the score-based interpretation of diffusion models, and explore alternative ways to condition, modify, and reuse diffusion models for tasks involving compositional generation and guidance.

Text-to-Image Generation

119

Learning Neural Acoustic Fields

1 code implementation • 4 Apr 2022 • Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can then be applied to arbitrary sounds.

113

Energy-based models for atomic-resolution protein conformations

1 code implementation • ICLR 2020 • Yilun Du, Joshua Meier, Jerry Ma, Rob Fergus, Alexander Rives

We propose an energy-based model (EBM) of protein conformations that operates at atomic scale.

Protein Design Protein Structure Prediction

Improved Contrastive Divergence Training of Energy Based Models

4 code implementations • 2 Dec 2020 • Yilun Du, Shuang Li, Joshua Tenenbaum, Igor Mordatch

Contrastive divergence is a popular method of training energy-based models, but is known to have difficulties with training stability.

Data Augmentation Image Generation +1

Unsupervised Learning of Compositional Energy Concepts

1 code implementation • NeurIPS 2021 • Yilun Du, Shuang Li, Yash Sharma, Joshua B. Tenenbaum, Igor Mordatch

In this work, we propose COMET, which discovers and represents concepts as separate energy functions, enabling us to represent both global concepts as well as objects under a unified framework.

Disentanglement Unsupervised Image Decomposition

Compositional Visual Generation and Inference with Energy Based Models

1 code implementation • 13 Apr 2020 • Yilun Du, Shuang Li, Igor Mordatch

A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge.

Learning Iterative Reasoning through Energy Minimization

1 code implementation • 30 Jun 2022 • Yilun Du, Shuang Li, Joshua B. Tenenbaum, Igor Mordatch

Finally, we illustrate that our approach can recursively solve algorithmic problems requiring nested reasoning.

Image Classification Object Recognition

Energy-Based Models for Continual Learning

1 code implementation • 24 Nov 2020 • Shuang Li, Yilun Du, Gido M. van de Ven, Igor Mordatch

We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems.

Continual Learning

SE(3)-Equivariant Relational Rearrangement with Neural Descriptor Fields

1 code implementation • 17 Nov 2022 • Anthony Simeonov, Yilun Du, Lin Yen-Chen, Alberto Rodriguez, Leslie Pack Kaelbling, Tomas Lozano-Perez, Pulkit Agrawal

This formalism is implemented in three steps: assigning a consistent local coordinate frame to the task-relevant object parts, determining the location and orientation of this coordinate frame on unseen object instances, and executing an action that brings these frames into the desired alignment.

Object

Curious Representation Learning for Embodied Intelligence

1 code implementation • ICCV 2021 • Yilun Du, Chuang Gan, Phillip Isola

Instead, it must explore its environment to acquire the data it will learn from.

Representation Learning

Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions

1 code implementation • ICCV 2021 • Shuang Li, Yilun Du, Antonio Torralba, Josef Sivic, Bryan Russell

Our task poses unique challenges as a system does not know what types of human-object interactions are present in a video or the actual spatiotemporal location of the human and the object.

Human-Object Interaction Detection Object +2

HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments

1 code implementation • 23 Jan 2024 • Qinhong Zhou, Sunli Chen, Yisong Wang, Haozhe Xu, Weihua Du, Hongxin Zhang, Yilun Du, Joshua B. Tenenbaum, Chuang Gan

Recent advances in high-fidelity virtual environments serve as one of the major driving forces for building intelligent embodied agents to perceive, reason and interact with the physical world.

Common Sense Reasoning Decision Making +1

Task-Agnostic Dynamics Priors for Deep Reinforcement Learning

1 code implementation • 13 May 2019 • Yilun Du, Karthik Narasimhan

While model-based deep reinforcement learning (RL) holds great promise for sample efficiency and generalization, learning an accurate dynamics model is often challenging and requires substantial interaction with the environment.

reinforcement-learning Reinforcement Learning (RL)

Compositional Generative Inverse Design

1 code implementation • 24 Jan 2024 • Tailin Wu, Takashi Maruyama, Long Wei, Tao Zhang, Yilun Du, Gianluca Iaccarino, Jure Leskovec

In an N-body interaction task and a challenging 2D multi-airfoil design task, we demonstrate that by composing the learned diffusion model at test time, our method allows us to design initial states and boundary shapes that are more complex than those in the training data.

Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation

1 code implementation • 9 Dec 2021 • Anthony Simeonov, Yilun Du, Andrea Tagliasacchi, Joshua B. Tenenbaum, Alberto Rodriguez, Pulkit Agrawal, Vincent Sitzmann

Our performance generalizes across both object instances and 6-DoF object poses, and significantly outperforms a recent baseline that relies on 2D descriptors.

Object

Learning to Exploit Stability for 3D Scene Parsing

no code implementations • NeurIPS 2018 • Yilun Du, Zhijian Liu, Hector Basevi, Ales Leonardis, Bill Freeman, Josh Tenenbaum, Jiajun Wu

We first show that applying physics supervision to an existing scene understanding model increases performance, produces more stable predictions, and allows training to an equivalent performance level with fewer annotated training examples.

Scene Understanding Translation

Neural MMO: A massively multiplayer game environment for intelligent agents

no code implementations • ICLR 2019 • Joseph Suarez, Yilun Du, Phillip Isola, Igor Mordatch

We demonstrate how this platform can be used to study behavior and learning in large populations of neural agents.

An Empirical Study on Hyperparameters and their Interdependence for RL Generalization

no code implementations • 2 Jun 2019 • Xingyou Song, Yilun Du, Jacob Jackson

Recent results in Reinforcement Learning (RL) have shown that agents with limited training environments are susceptible to a large amount of overfitting across many domains.

reinforcement-learning Reinforcement Learning (RL)

Model Based Planning with Energy Based Models

no code implementations • 15 Sep 2019 • Yilun Du, Toru Lin, Igor Mordatch

We provide an online algorithm to train EBMs while interacting with the environment, and show that EBMs allow for significantly better online learning than corresponding feed-forward networks.

Reinforcement Learning (RL)

Observational Overfitting in Reinforcement Learning

no code implementations • ICLR 2020 • Xingyou Song, Yiding Jiang, Stephen Tu, Yilun Du, Behnam Neyshabur

A major component of overfitting in model-free reinforcement learning (RL) involves the case where the agent may mistakenly correlate reward with certain spurious features from the observations generated by the Markov Decision Process (MDP).

reinforcement-learning Reinforcement Learning (RL)

Neural MMO v1.3: A Massively Multiagent Game Environment for Training and Evaluating Neural Networks

no code implementations • 31 Jan 2020 • Joseph Suarez, Yilun Du, Igor Mordatch, Phillip Isola

We present Neural MMO, a massively multiagent game environment inspired by MMOs and discuss our progress on two more general challenges in multiagent systems engineering for AI research: distributed infrastructure and game IO.

Policy Gradient Methods

Unsupervised Discovery of 3D Physical Objects from Video

no code implementations • 24 Jul 2020 • Yilun Du, Kevin Smith, Tomer Ullman, Joshua Tenenbaum, Jiajun Wu

We study the problem of unsupervised physical object discovery.

Object Object Discovery +1

Unsupervised Discovery of 3D Physical Objects

no code implementations • ICLR 2021 • Yilun Du, Kevin A. Smith, Tomer Ullman, Joshua B. Tenenbaum, Jiajun Wu

We study the problem of unsupervised physical object discovery.

Object Object Discovery +1

Learning Object-Based State Estimators for Household Robots

no code implementations • 6 Nov 2020 • Yilun Du, Tomas Lozano-Perez, Leslie Kaelbling

The robot may be called upon later to retrieve objects and will need a long-term object-based memory in order to know how to find them.

Clustering Object +2

Compositional Visual Generation with Energy Based Models

no code implementations • NeurIPS 2020 • Yilun Du, Shuang Li, Igor Mordatch

A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge.

A Long Horizon Planning Framework for Manipulating Rigid Pointcloud Objects

no code implementations • 16 Nov 2020 • Anthony Simeonov, Yilun Du, Beomjoon Kim, Francois R. Hogan, Joshua Tenenbaum, Pulkit Agrawal, Alberto Rodriguez

We present a framework for solving long-horizon planning problems involving manipulation of rigid objects that operates directly from a point-cloud observation, i.e., without prior object models.

Graph Attention Motion Planning +2

Language Model Pre-training Improves Generalization in Policy Learning

no code implementations • 29 Sep 2021 • Shuang Li, Xavier Puig, Yilun Du, Ekin Akyürek, Antonio Torralba, Jacob Andreas, Igor Mordatch

Additional experiments explore the role of language-based encodings in these results; we find that it is possible to train a simple adapter layer that maps from observations and action histories to LM embeddings, and thus that language modeling provides an effective initializer even for tasks with no language as input or output.

Imitation Learning Language Modelling

The Neural MMO Platform for Massively Multiagent Research

no code implementations • 14 Oct 2021 • Joseph Suarez, Yilun Du, Clare Zhu, Igor Mordatch, Phillip Isola

Neural MMO is a computationally accessible research platform that combines large agent populations, long time horizons, open-ended tasks, and modular game systems.

Learning Signal-Agnostic Manifolds of Neural Fields

no code implementations • NeurIPS 2021 • Yilun Du, Katherine M. Collins, Joshua B. Tenenbaum, Vincent Sitzmann

We leverage neural fields to capture the underlying structure in image, shape, audio and cross-modal audiovisual domains in a modality-independent manner.

Learning to Compose Visual Relations

no code implementations • NeurIPS 2021 • Nan Liu, Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba

The visual world around us can be described as a structured set of objects and their associated relations.

Learning Physics Priors for Deep Reinforcement Learning

no code implementations • 27 Sep 2018 • Yilun Du, Karthik Narasimhan

While model-based deep reinforcement learning (RL) holds great promise for sample efficiency and generalization, learning an accurate dynamics model is challenging and often requires substantial interactions with the environment.

Reinforcement Learning (RL) Transfer Learning

Learning Good Policies By Learning Good Perceptual Models

no code implementations • 25 Sep 2019 • Yilun Du, Phillip Isola

Reinforcement learning (RL) has led to increasingly complex-looking behavior in recent years.

reinforcement-learning Reinforcement Learning (RL) +1

Learning Online Data Association

no code implementations • 28 Sep 2020 • Yilun Du, Joshua B. Tenenbaum, Tomas Lozano-Perez, Leslie Pack Kaelbling

When an agent interacts with a complex environment, it receives a stream of percepts in which it may detect entities, such as objects or people.

Representation Learning

Streaming Inference for Infinite Non-Stationary Clustering

no code implementations • 2 May 2022 • Rylan Schaeffer, Gabrielle Kaili-May Liu, Yilun Du, Scott Linderman, Ila Rani Fiete

Learning from a continuous stream of non-stationary data in an unsupervised manner is arguably one of the most common and most challenging settings facing intelligent agents.

Clustering Variational Inference

3D Concept Grounding on Neural Fields

no code implementations • 13 Jul 2022 • Yining Hong, Yilun Du, Chunru Lin, Joshua B. Tenenbaum, Chuang Gan

Experimental results show that our proposed framework outperforms unsupervised/language-mediated segmentation models on semantic and instance segmentation tasks, as well as outperforms existing models on the challenging 3D aware visual reasoning tasks.

Instance Segmentation Question Answering +3

Neural Groundplans: Persistent Neural Scene Representations from a Single Image

no code implementations • 22 Jul 2022 • Prafull Sharma, Ayush Tewari, Yilun Du, Sergey Zakharov, Rares Ambrus, Adrien Gaidon, William T. Freeman, Fredo Durand, Joshua B. Tenenbaum, Vincent Sitzmann

We present a method to map 2D image observations of a scene to a persistent 3D scene representation, enabling novel view synthesis and disentangled representation of the movable and immovable components of the scene.

Disentanglement Instance Segmentation +4

Robust Change Detection Based on Neural Descriptor Fields

no code implementations • 1 Aug 2022 • Jiahui Fu, Yilun Du, Kurran Singh, Joshua B. Tenenbaum, John J. Leonard

The ability to reason about changes in the environment is crucial for robots operating over extended periods of time.

Change Detection Object

Composing Ensembles of Pre-trained Models via Iterative Consensus

no code implementations • 20 Oct 2022 • Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Igor Mordatch

Such closed-loop communication enables models to correct errors caused by other models, significantly boosting performance on downstream tasks, e.g., improving accuracy on grade school math problems by 7.5%, without requiring any model finetuning.

Ranked #1 on Video Question Answering on ActivityNet-QA

Arithmetic Reasoning Image Generation +4

Self-conditioned Embedding Diffusion for Text Generation

no code implementations • 8 Nov 2022 • Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond

Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation?

Image Generation Language Modelling +1

StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects

no code implementations • 8 Nov 2022 • Weiyu Liu, Yilun Du, Tucker Hermans, Sonia Chernova, Chris Paxton

StructDiffusion even improves the success rate of assembling physically-valid structures out of unseen objects by on average 16% over an existing multi-modal transformer model trained on specific structures.

valid

Is Conditional Generative Modeling all you need for Decision-Making?

no code implementations • 28 Nov 2022 • Anurag Ajay, Yilun Du, Abhi Gupta, Joshua Tenenbaum, Tommi Jaakkola, Pulkit Agrawal

We further demonstrate the advantages of modeling policies as conditional diffusion models by considering two other conditioning variables: constraints and skills.

Decision Making Offline RL +1

Local Neural Descriptor Fields: Locally Conditioned Object Representations for Manipulation

no code implementations • 7 Feb 2023 • Ethan Chun, Yilun Du, Anthony Simeonov, Tomas Lozano-Perez, Leslie Kaelbling

A robot operating in a household environment will see a wide range of unique and unfamiliar objects.

Object

Foundation Models for Decision Making: Problems, Methods, and Opportunities

no code implementations • 7 Mar 2023 • Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans

In response to these developments, new paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.

Autonomous Driving Decision Making +1

NeuSE: Neural SE(3)-Equivariant Embedding for Consistent Spatial Understanding with Objects

no code implementations • 13 Mar 2023 • Jiahui Fu, Yilun Du, Kurran Singh, Joshua B. Tenenbaum, John J. Leonard

We present NeuSE, a novel Neural SE(3)-Equivariant Embedding for objects, and illustrate how it supports object SLAM for consistent spatial understanding with long-term scene changes.

Object Object SLAM

3D Concept Learning and Reasoning from Multi-View Images

no code implementations • CVPR 2023 • Yining Hong, Chunru Lin, Yilun Du, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan

We suggest that a principled approach for 3D reasoning from multi-view images should be to infer a compact 3D representation of the world from the multi-view images, which is further grounded on open-vocabulary semantic concepts, and then to execute reasoning on these 3D representations.

Question Answering Visual Question Answering +1

Planning with Sequence Models through Iterative Energy Minimization

no code implementations • 28 Mar 2023 • Hongyi Chen, Yilun Du, Yiye Chen, Joshua Tenenbaum, Patricio A. Vela

In this paper, we suggest an approach towards integrating planning with sequence models based on the idea of iterative energy minimization, and illustrate how such a procedure leads to improved RL performance across different tasks.

Language Modelling Reinforcement Learning (RL)

Probabilistic Adaptation of Text-to-Video Models

no code implementations • 2 Jun 2023 • Mengjiao Yang, Yilun Du, Bo Dai, Dale Schuurmans, Joshua B. Tenenbaum, Pieter Abbeel

Large text-to-video models trained on internet-scale data have demonstrated exceptional capabilities in generating high-fidelity videos from arbitrary textual descriptions.

Language Modelling Large Language Model

Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models

no code implementations • ICCV 2023 • Nan Liu, Yilun Du, Shuang Li, Joshua B. Tenenbaum, Antonio Torralba

Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate.

Image Generation

Compositional Diffusion-Based Continuous Constraint Solvers

no code implementations • 2 Sep 2023 • Zhutian Yang, Jiayuan Mao, Yilun Du, Jiajun Wu, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling

This paper introduces an approach for learning to solve continuous constraint satisfaction problems (CCSP) in robotic reasoning and planning.

Learning Interactive Real-World Simulators

no code implementations • 9 Oct 2023 • Mengjiao Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Leslie Kaelbling, Dale Schuurmans, Pieter Abbeel

Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied agents purely in simulation that can be directly deployed in the real world.

Video Captioning

Learning to Act from Actionless Videos through Dense Correspondences

no code implementations • 12 Oct 2023 • Po-Chen Ko, Jiayuan Mao, Yilun Du, Shao-Hua Sun, Joshua B. Tenenbaum

In this work, we present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments from few video demonstrations without using any action annotations.

Video Language Planning

no code implementations • 16 Oct 2023 • Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson

We are interested in enabling visual planning for complex long-horizon tasks in the space of generated videos and language, leveraging recent advances in large generative models pretrained on Internet-scale data.

Inferring Relational Potentials in Interacting Systems

no code implementations • 23 Oct 2023 • Armand Comas-Massagué, Yilun Du, Christian Fernandez, Sandesh Ghimire, Mario Sznaier, Joshua B. Tenenbaum, Octavia Camps

In this work, we propose Neural Interaction Inference with Potentials (NIIP) as an alternative approach to discover such interactions that enables greater flexibility in trajectory modeling: it discovers a set of relational potentials, represented as energy functions, which when minimized reconstruct the original trajectory.

Trajectory Forecasting Trajectory Modeling

Large-scale Reinforcement Learning for Diffusion Models

no code implementations • 20 Jan 2024 • Yinan Zhang, Eric Tzeng, Yilun Du, Dmitry Kislyuk

Text-to-image diffusion models are a class of deep generative models that have demonstrated an impressive capacity for high-quality image generation.

Ethics Fairness +3

Compositional Generative Modeling: A Single Model is Not All You Need

no code implementations • 2 Feb 2024 • Yilun Du, Leslie Kaelbling

Large monolithic generative models trained on massive amounts of data have become an increasingly dominant approach in AI research.

PoCo: Policy Composition from and for Heterogeneous Robot Learning

no code implementations • 4 Feb 2024 • Lirui Wang, Jialiang Zhao, Yilun Du, Edward H. Adelson, Russ Tedrake

Training general robotic policies from heterogeneous data for different tasks is a significant challenge.

Bridging Associative Memory and Probabilistic Modeling

no code implementations • 15 Feb 2024 • Rylan Schaeffer, Nika Zahedi, Mikail Khona, Dhruv Pai, Sang Truong, Yilun Du, Mitchell Ostrow, Sarthak Chandra, Andres Carranza, Ila Rani Fiete, Andrey Gromov, Sanmi Koyejo

Based on the observation that associative memory's energy functions can be seen as probabilistic modeling's negative log likelihoods, we build a bridge between the two that enables useful flow of ideas in both directions.

In-Context Learning

Video as the New Language for Real-World Decision Making

no code implementations • 27 Feb 2024 • Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans

Moreover, we demonstrate how, like language models, video generation can serve as planners, agents, compute engines, and environment simulators through techniques such as in-context learning, planning and reinforcement learning.

Decision Making In-Context Learning +2

3D-VLA: A 3D Vision-Language-Action Generative World Model

no code implementations • 14 Mar 2024 • Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan

Recent vision-language-action (VLA) models rely on 2D inputs, lacking integration with the broader realm of the 3D physical world.

Language Modelling Large Language Model +1

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

no code implementations • 16 Apr 2024 • Hongxin Zhang, Zeyuan Wang, Qiushi Lyu, Zheyuan Zhang, Sunli Chen, Tianmin Shu, Yilun Du, Chuang Gan

In this paper, we investigate the problem of embodied multi-agent cooperation, where decentralized agents must cooperate given only partial egocentric views of the world.

Paper
Add Code
