Search Results for author: Brandon Cui

Found 7 papers, 3 papers with code

K-level Reasoning for Zero-Shot Coordination in Hanabi

no code implementations NeurIPS 2021 Brandon Cui, Hengyuan Hu, Luis Pineda, Jakob N. Foerster

The standard problem setting in cooperative multi-agent learning is self-play (SP), where the goal is to train a team of agents that works well together.
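Self-play as described above simply means that one shared policy controls every agent on the team during training. The following is a minimal, hypothetical sketch of such a loop; the SharedPolicy class and the environment interface are illustrative placeholders, not code from the paper.

```python
# Hypothetical sketch of self-play (SP) in a two-player cooperative game.
# A single shared policy controls every seat; the policy and environment
# interfaces below are placeholders, not an implementation from the paper.
import random


class SharedPolicy:
    """One policy used by all agents, the defining feature of self-play."""

    def act(self, observation):
        # Placeholder: pick a random legal action.
        return random.choice(observation["legal_actions"])

    def update(self, trajectory):
        # Placeholder: a real implementation would apply a policy-gradient
        # or Q-learning update on the joint trajectory here.
        pass


def self_play_episode(env, policy):
    """Both players are driven by the same policy object."""
    observations = env.reset()
    trajectory, done = [], False
    while not done:
        actions = [policy.act(obs) for obs in observations]
        observations, reward, done = env.step(actions)
        trajectory.append((actions, reward))
    return trajectory


def train(env, policy, num_episodes=1000):
    for _ in range(num_episodes):
        policy.update(self_play_episode(env, policy))
```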

Self-Explaining Deviations for Coordination

no code implementations 13 Jul 2022 Hengyuan Hu, Samuel Sokota, David Wu, Anton Bakhtin, Andrei Lupu, Brandon Cui, Jakob N. Foerster

Fully cooperative, partially observable multi-agent problems are ubiquitous in the real world.

CompilerGym: Robust, Performant Compiler Optimization Environments for AI Research

1 code implementation 17 Sep 2021 Chris Cummins, Bram Wasti, Jiadong Guo, Brandon Cui, Jason Ansel, Sahir Gomez, Somya Jain, Jia Liu, Olivier Teytaud, Benoit Steiner, Yuandong Tian, Hugh Leather

What is needed is an easy, reusable experimental infrastructure for real world compiler optimization tasks that can serve as a common benchmark for comparing techniques, and as a platform to accelerate progress in the field.

Compiler Optimization · OpenAI Gym
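CompilerGym exposes these compiler optimization tasks through the OpenAI Gym interface referenced in the tags, so a basic interaction loop looks roughly like the sketch below. The environment, benchmark, observation, and reward names (llvm-v0, cbench-v1/qsort, Autophase, IrInstructionCountOz) follow my reading of the CompilerGym documentation and should be treated as assumptions, not as details taken from this listing.

```python
# Rough sketch of driving a CompilerGym environment with the Gym-style API.
# Environment, benchmark, and space names are assumptions based on the
# CompilerGym documentation, not taken from this paper listing.
import compiler_gym  # pip install compiler_gym

env = compiler_gym.make(
    "llvm-v0",                            # LLVM phase-ordering environment
    benchmark="cbench-v1/qsort",          # program to optimize
    observation_space="Autophase",        # numeric feature vector of the IR
    reward_space="IrInstructionCountOz",  # reward relative to -Oz
)

observation = env.reset()
for _ in range(100):
    # Random agent: apply an arbitrary optimization pass each step.
    observation, reward, done, info = env.step(env.action_space.sample())
    if done:
        break
env.close()
```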

Learning Space Partitions for Path Planning

2 code implementations NeurIPS 2021 Kevin Yang, Tianjun Zhang, Chris Cummins, Brandon Cui, Benoit Steiner, Linnan Wang, Joseph E. Gonzalez, Dan Klein, Yuandong Tian

Path planning, the problem of efficiently discovering high-reward trajectories, often requires optimizing a high-dimensional and multimodal reward function.

Off-Belief Learning

5 code implementations 6 Mar 2021 Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster

Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions; as a result, they fail when paired with humans or independently trained agents at test time.

Control-Aware Representations for Model-based Reinforcement Learning

no code implementations ICLR 2021 Brandon Cui, Yin-Lam Chow, Mohammad Ghavamzadeh

We first formulate an LCE model to learn representations that are suitable for use by a policy-iteration-style algorithm in the latent space.

Model-based Reinforcement Learning · reinforcement-learning +2
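LCE-style models in this line of work typically pair an encoder with latent dynamics so that a policy-iteration-style algorithm can operate directly on the latent state. The sketch below is a hypothetical illustration of that general structure; the class name, layer sizes, and dimensions are invented for illustration and are not the authors' architecture.

```python
# Hypothetical illustration of the LCE idea: encode observations into a latent
# space and learn dynamics there, so control can operate on latents. This is
# not the paper's architecture; all names and sizes are made up.
import torch
import torch.nn as nn


class LatentControlModel(nn.Module):
    def __init__(self, obs_dim, action_dim, latent_dim=16):
        super().__init__()
        # Encoder: map raw observations into a compact latent state.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        # Latent dynamics: predict the next latent from latent and action.
        self.dynamics = nn.Sequential(nn.Linear(latent_dim + action_dim, 64),
                                      nn.ReLU(), nn.Linear(64, latent_dim))

    def forward(self, obs, action):
        z = self.encoder(obs)
        z_next_pred = self.dynamics(torch.cat([z, action], dim=-1))
        return z, z_next_pred


# Training would minimize a prediction loss so the latent space is predictable
# enough for policy iteration to run on z directly.
model = LatentControlModel(obs_dim=8, action_dim=2)
z, z_next = model(torch.randn(4, 8), torch.randn(4, 2))
```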

Variational Model-based Policy Optimization

no code implementations 9 Jun 2020 Yin-Lam Chow, Brandon Cui, MoonKyung Ryu, Mohammad Ghavamzadeh

Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with data collected from interaction with the real system in order to alleviate the data-efficiency problem in RL.

Continuous Control · Model-based Reinforcement Learning +1
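To make the data-mixing idea above concrete, here is a generic Dyna-style sketch in which real transitions are augmented with rollouts from a learned model. This is a standard pattern rather than the variational algorithm proposed in the paper, and the env, policy, and model interfaces are illustrative placeholders.

```python
# Generic Dyna-style sketch of mixing real and model-generated transitions.
# Placeholder components only; this is not the paper's variational algorithm.
import random


def train_model_based_rl(env, policy, model, num_iterations=100,
                         model_rollouts_per_real_step=5):
    real_data, synthetic_data = [], []
    for _ in range(num_iterations):
        # 1. Collect a transition from the real system.
        state = env.observe()
        action = policy.act(state)
        next_state, reward = env.step(action)
        real_data.append((state, action, reward, next_state))

        # 2. Fit the dynamics model on real transitions.
        model.fit(real_data)

        # 3. Generate cheap synthetic transitions from the learned model.
        for _ in range(model_rollouts_per_real_step):
            s = random.choice(real_data)[0]  # start from a visited state
            a = policy.act(s)
            s_next, r = model.predict(s, a)  # imagined transition
            synthetic_data.append((s, a, r, s_next))

        # 4. Improve the policy on the combined dataset, which is the
        #    data-efficiency benefit described in the snippet above.
        policy.update(real_data + synthetic_data)
```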
