Search Results for author: Jack Parker-Holder

Found 50 papers, 20 papers with code

Video as the New Language for Real-World Decision Making

no code implementations • 27 Feb 2024 • Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans

Moreover, we demonstrate how, like language models, video generation can serve as a planner, agent, compute engine, and environment simulator through techniques such as in-context learning, planning, and reinforcement learning.

Decision Making · In-Context Learning +2

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

no code implementations • 26 Feb 2024 • Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu

As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to user inputs is of paramount importance.

Question Answering

Stabilizing Unsupervised Environment Design with a Learned Adversary

1 code implementation • 21 Aug 2023 • Ishita Mediratta, Minqi Jiang, Jack Parker-Holder, Michael Dennis, Eugene Vinitsky, Tim Rocktäschel

As a result, we make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several established challenging procedurally-generated environments, including a partially-observed maze navigation task and a continuous-control car racing environment.

Car Racing · Reinforcement Learning (RL)

Synthetic Experience Replay

1 code implementation • NeurIPS 2023 • Cong Lu, Philip J. Ball, Yee Whye Teh, Jack Parker-Holder

We believe that synthetic training data could open the door to realizing the full potential of deep learning for replay-based RL algorithms from limited data.

Reinforcement Learning (RL) · Self-Supervised Learning
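The core mechanism here can be illustrated with a toy sketch: a generative model (a diffusion model in the paper, stubbed below with noisy resampling) upsamples a small pool of real transitions, and training batches mix real and synthetic experience. All names and the mixing ratio are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: real transitions collected by the agent, and a generative
# model (a diffusion model in the paper) stubbed with Gaussian noise
# around real data. `generate_synthetic` is a hypothetical placeholder.
real = rng.normal(size=(1000, 8))           # 1000 transitions, 8 features each

def generate_synthetic(n):
    base = real[rng.integers(0, len(real), size=n)]
    return base + 0.05 * rng.normal(size=base.shape)

synthetic = generate_synthetic(10_000)      # upsample far beyond the real data

def sample_batch(batch_size=256, synthetic_ratio=0.5):
    """Draw a training batch that mixes real and synthetic experience."""
    n_syn = int(batch_size * synthetic_ratio)
    idx_real = rng.integers(0, len(real), size=batch_size - n_syn)
    idx_syn = rng.integers(0, len(synthetic), size=n_syn)
    return np.concatenate([real[idx_real], synthetic[idx_syn]])

batch = sample_batch()
print(batch.shape)  # (256, 8)
```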

MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning

no code implementations • 6 Mar 2023 • Mikayel Samvelyan, Akbir Khan, Michael Dennis, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Roberta Raileanu, Tim Rocktäschel

Open-ended learning methods that automatically generate a curriculum of increasingly challenging tasks serve as a promising avenue toward generally capable reinforcement learning agents.

Continuous Control · Multi-agent Reinforcement Learning +2

Hierarchical Kickstarting for Skill Transfer in Reinforcement Learning

2 code implementations • 23 Jul 2022 • Michael Matthews, Mikayel Samvelyan, Jack Parker-Holder, Edward Grefenstette, Tim Rocktäschel

In this paper, we investigate how skills can be incorporated into the training of reinforcement learning (RL) agents in complex environments with large state-action spaces and sparse rewards.

Inductive Bias · NetHack +2
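One standard way to incorporate pretrained skills, in the spirit of the kickstarting approach named in the title, is an auxiliary distillation term that pulls the student policy toward a teacher skill policy. The sketch below is a generic illustration with made-up distributions, not the paper's exact loss.

```python
import numpy as np

def kl(p, q, eps=1e-8):
    """KL(p || q) for categorical action distributions."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

# Hypothetical action distributions over 4 discrete actions for a batch of
# 3 states: a pretrained skill (teacher) and the learning agent (student).
teacher = np.array([[0.70, 0.10, 0.10, 0.10],
                    [0.20, 0.60, 0.10, 0.10],
                    [0.25, 0.25, 0.25, 0.25]])
student = np.array([[0.40, 0.20, 0.20, 0.20],
                    [0.30, 0.30, 0.20, 0.20],
                    [0.10, 0.10, 0.40, 0.40]])

rl_loss = 1.23          # placeholder for the usual policy-gradient loss
lam = 0.5               # kickstarting weight, typically annealed toward 0
kickstart_loss = kl(teacher, student).mean()
total_loss = rl_loss + lam * kickstart_loss
print(total_loss)
```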

Bayesian Generational Population-Based Training

2 code implementations • 19 Jul 2022 • Xingchen Wan, Cong Lu, Jack Parker-Holder, Philip J. Ball, Vu Nguyen, Binxin Ru, Michael A. Osborne

Leveraging the new highly parallelizable Brax physics engine, we show that these innovations lead to large performance gains, significantly outperforming the tuned baseline while learning entire configurations on the fly.

Bayesian Optimization · Reinforcement Learning (RL)

Grounding Aleatoric Uncertainty for Unsupervised Environment Design

1 code implementation • 11 Jul 2022 • Minqi Jiang, Michael Dennis, Jack Parker-Holder, Andrei Lupu, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster

Problematically, in partially-observable or stochastic settings, optimal policies may depend on the ground-truth distribution over aleatoric parameters of the environment in the intended deployment setting, while curriculum learning necessarily shifts the training distribution.

Reinforcement Learning (RL)

Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations

2 code implementations • 9 Jun 2022 • Cong Lu, Philip J. Ball, Tim G. J. Rudner, Jack Parker-Holder, Michael A. Osborne, Yee Whye Teh

Using this suite of benchmarking tasks, we show that simple modifications to two popular vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform existing offline RL methods and establish competitive baselines for continuous control in the visual domain.

Benchmarking · Continuous Control +3

On-the-fly Strategy Adaptation for ad-hoc Agent Coordination

no code implementations • 8 Mar 2022 • Jaleh Zand, Jack Parker-Holder, Stephen J. Roberts

Training agents in cooperative settings offers the promise of AI agents able to interact effectively with humans (and other agents) in the real world.

Game of Hanabi · Multi-agent Reinforcement Learning

Evolving Curricula with Regret-Based Environment Design

3 code implementations • 2 Mar 2022 • Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex.

Reinforcement Learning (RL)
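A toy sketch of the ACCEL-style loop described above: keep a buffer of levels scored by a regret proxy, replay high-regret levels, and push small edits of them back into the buffer. The regret estimate and level encoding below are stand-ins so the loop runs; they are not the paper's implementation.

```python
import random

random.seed(0)

def estimate_regret(level):
    # Stand-in for the paper's regret proxy (e.g. positive value loss);
    # here just a stub so the loop executes.
    return random.random() * len(level)

def edit(level):
    """Make a small random edit, e.g. toggle one tile of a grid level."""
    i = random.randrange(len(level))
    return level[:i] + [1 - level[i]] + level[i + 1:]

buffer = [[0] * 8]                      # start from a trivially simple level
for step in range(200):
    # Replay one of the highest-regret levels in the buffer...
    level = max(random.sample(buffer, min(4, len(buffer))), key=estimate_regret)
    # ...train the agent on it (omitted), then push edited variants that
    # remain challenging back into the buffer.
    child = edit(level)
    if estimate_regret(child) > 0.5:
        buffer.append(child)

print(len(buffer), "levels; complexity grows via compounding edits")
```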

Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

no code implementations • 11 Jan 2022 • Jack Parker-Holder, Raghu Rajan, Xingyou Song, André Biedenkapp, Yingjie Miao, Theresa Eimer, Baohe Zhang, Vu Nguyen, Roberto Calandra, Aleksandra Faust, Frank Hutter, Marius Lindauer

The combination of Reinforcement Learning (RL) with deep learning has led to a series of impressive feats, with many believing (deep) RL provides a path towards generally capable agents.

AutoML · Meta-Learning +2

Lyapunov Exponents for Diversity in Differentiable Games

no code implementations • 24 Dec 2021 • Jonathan Lorraine, Paul Vicol, Jack Parker-Holder, Tal Kachman, Luke Metz, Jakob Foerster

We generalize this idea to non-conservative, multi-agent gradient systems by proposing a method, denoted Generalized Ridge Rider (GRR), for finding arbitrary bifurcation points.
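The underlying Ridge Rider mechanic, following eigenvectors of the Hessian away from a critical point, can be sketched on a toy two-dimensional loss. This illustrates the idea only, not the GRR algorithm itself.

```python
import numpy as np

def f(x):
    """Toy loss with a saddle at the origin."""
    return (x[0] ** 2 - 1) ** 2 + x[0] * x[1] + x[1] ** 2

def hessian(f, x, h=1e-4):
    """Finite-difference Hessian of f at x."""
    d = len(x)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i, e_j = np.eye(d)[i] * h, np.eye(d)[j] * h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)
    return H

x = np.zeros(2)                        # start at the saddle point
eigvals, eigvecs = np.linalg.eigh(hessian(f, x))
ridge = eigvecs[:, 0]                  # most negative curvature direction
for sign in (+1, -1):                  # follow the ridge both ways to
    print("step along ridge:", x + sign * 0.1 * ridge)  # reach distinct solutions
```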

Towards an Understanding of Default Policies in Multitask Policy Optimization

no code implementations • 4 Nov 2021 • Ted Moskovitz, Michael Arbel, Jack Parker-Holder, Aldo Pacchiano

Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms with strong performance across multiple domains.
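A common form of the regularized objective behind RPO methods, with a fixed default policy $\pi_0$ (a standard formulation shown for illustration, not necessarily the paper's exact objective):

$$J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t} \gamma^{t} r_t\right] - \alpha\, \mathbb{E}_{s}\left[\mathrm{KL}\left(\pi(\cdot \mid s)\,\|\,\pi_0(\cdot \mid s)\right)\right]$$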

Revisiting Design Choices in Offline Model-Based Reinforcement Learning

no code implementations • 8 Oct 2021 • Cong Lu, Philip J. Ball, Jack Parker-Holder, Michael A. Osborne, Stephen J. Roberts

Significant progress has been made recently in offline model-based reinforcement learning, a class of approaches that leverage a learned dynamics model.

Bayesian Optimization · Model-based Reinforcement Learning +2

Replay-Guided Adversarial Environment Design

4 code implementations • NeurIPS 2021 • Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria.

Reinforcement Learning (RL)
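The counterintuitive rule above has a simple shape: gradient updates happen only on curated replay levels, while freshly generated levels are merely evaluated and scored. A toy sketch with a stubbed score function (not the paper's value-loss prioritization):

```python
import random

random.seed(0)

def score(level):
    # Stand-in for PLR's learning-potential score (e.g. value-loss based).
    return random.random()

buffer = {}                                   # level -> priority score
for step in range(100):
    if buffer and random.random() < 0.5:
        # Replay branch: sample a curated level and TRAIN on it.
        level = max(buffer, key=buffer.get)
        # agent.update(level)  <- gradient updates happen only here
        buffer[level] = score(level)
    else:
        # Exploration branch: evaluate a fresh random level WITHOUT
        # updating the policy, and only record its score.
        level = random.randrange(10_000)
        buffer[level] = score(level)

print(f"{len(buffer)} levels curated; updates restricted to replayed ones")
```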

MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

1 code implementation • 27 Sep 2021 • Mikayel Samvelyan, Robert Kirk, Vitaly Kurin, Jack Parker-Holder, Minqi Jiang, Eric Hambro, Fabio Petroni, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel

By leveraging the full set of entities and environment dynamics from NetHack, one of the richest grid-based video games, MiniHack allows designing custom RL testbeds that are fast and convenient to use.

NetHack · reinforcement-learning +2
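Basic usage follows the project's README: MiniHack registers its tasks as gym environments. The environment id below is one of the registered tasks; exact ids and the gym step API may differ across versions.

```python
import gym
import minihack  # registers the MiniHack-* environments with gym

env = gym.make("MiniHack-River-v0")
obs = env.reset()                     # each reset generates a new level instance
obs, reward, done, info = env.step(env.action_space.sample())
```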

Return Dispersion as an Estimator of Learning Potential for Prioritized Level Replay

no code implementations • NeurIPS Workshop ICBINB 2021 • Iryna Korshunova, Minqi Jiang, Jack Parker-Holder, Tim Rocktäschel, Edward Grefenstette

Prioritized Level Replay (PLR) has been shown to induce adaptive curricula that improve the sample-efficiency and generalization of reinforcement learning policies in environments featuring multiple tasks or levels.

reinforcement-learning · Reinforcement Learning (RL)
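The estimator proposed here is easy to state concretely: score each level by the spread of its recent episodic returns and prioritize high-dispersion levels. A minimal sketch with invented level names:

```python
import numpy as np

def dispersion_score(recent_returns):
    """Score a level by the spread of its recent episodic returns;
    high dispersion suggests the level still has learning potential."""
    return float(np.std(recent_returns))

returns_per_level = {
    "maze-easy": [9.8, 10.0, 9.9, 10.1],    # solved: low dispersion
    "maze-hard": [0.0, 7.5, 1.2, 9.0],      # unstable: high dispersion
}
priorities = {k: dispersion_score(v) for k, v in returns_per_level.items()}
print(max(priorities, key=priorities.get))   # -> "maze-hard"
```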

From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers

1 code implementation • 16 Jul 2021 • Krzysztof Choromanski, Han Lin, Haoxian Chen, Tianyi Zhang, Arijit Sehanobish, Valerii Likhosherstov, Jack Parker-Holder, Tamas Sarlos, Adrian Weller, Thomas Weingarten

In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformer architectures in a scalable way.

Graph Attention

Revisiting Design Choices in Offline Model Based Reinforcement Learning

no code implementations • NeurIPS 2021 • Cong Lu, Philip J. Ball, Jack Parker-Holder, Michael A. Osborne, Stephen J. Roberts

Offline reinforcement learning enables agents to make use of large pre-collected datasets of environment transitions and learn control policies without the need for potentially expensive or unsafe online data collection.

Bayesian Optimization · Model-based Reinforcement Learning +3

Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment

no code implementations • ICLR Workshop SSL-RL 2021 • Philip J. Ball, Cong Lu, Jack Parker-Holder, Stephen Roberts

Reinforcement learning from large-scale offline datasets provides us with the ability to learn policies without potentially unsafe or impractical exploration.

Zero-shot Generalization

ES-ENAS: Efficient Evolutionary Optimization for Large Hybrid Search Spaces

2 code implementations • 19 Jan 2021 • Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Qiuyi Zhang, Daiyi Peng, Deepali Jain, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Yuxiang Yang

In this paper, we approach the problem of optimizing blackbox functions over large hybrid search spaces consisting of both combinatorial and continuous parameters.

Combinatorial Optimization · Continuous Control +4

Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian

no code implementations • NeurIPS 2020 • Jack Parker-Holder, Luke Metz, Cinjon Resnick, Hengyuan Hu, Adam Lerer, Alistair Letcher, Alex Peysakhovich, Aldo Pacchiano, Jakob Foerster

In the era of ever-decreasing loss functions, SGD and its various offspring have become the go-to optimization tool in machine learning and are a key component of the success of deep neural networks (DNNs).

BIG-bench Machine Learning

Towards Tractable Optimism in Model-Based Reinforcement Learning

no code implementations • 21 Jun 2020 • Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts

The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL).

Continuous Control · Decision Making +4
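The principle of optimism is most easily seen in the bandit setting the abstract mentions. Classic UCB1 below illustrates it; this is the textbook algorithm, not the paper's model-based method.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])        # unknown to the learner
counts = np.ones(3)                            # one initial pull per arm
sums = rng.binomial(1, true_means).astype(float)

for t in range(2, 1000):
    # Optimism: act greedily w.r.t. an upper confidence bound on each mean.
    ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
    arm = int(np.argmax(ucb))
    sums[arm] += rng.binomial(1, true_means[arm])
    counts[arm] += 1

print("pull counts:", counts)                  # the best arm dominates
```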

Stochastic Flows and Geometric Optimization on the Orthogonal Group

no code implementations • ICML 2020 • Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani

We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$.

Metric Learning · Stochastic Optimization
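A standard geometric ingredient for optimization on $O(d)$, shown here only as an illustrative sketch rather than the paper's stochastic-flow algorithms, is a Riemannian gradient step with a Cayley retraction, which keeps iterates exactly orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
X = np.linalg.qr(rng.normal(size=(d, d)))[0]    # start on O(d)
target = np.linalg.qr(rng.normal(size=(d, d)))[0]

lr = 0.1
for _ in range(200):
    G = X - target                               # Euclidean gradient of 0.5*||X - T||^2
    A = G @ X.T - X @ G.T                        # project onto skew-symmetric matrices
    # Cayley retraction: an orthogonal update that stays on the group.
    Q = np.linalg.solve(np.eye(d) + lr / 2 * A, np.eye(d) - lr / 2 * A)
    X = Q @ X

print(np.allclose(X.T @ X, np.eye(d), atol=1e-8))  # still exactly orthogonal
```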

Ready Policy One: World Building Through Active Learning

no code implementations • ICML 2020 • Philip Ball, Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts

Model-Based Reinforcement Learning (MBRL) offers a promising direction for sample-efficient learning, often achieving state-of-the-art results for continuous control tasks.

Active Learning · Continuous Control +1

Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits

2 code implementations • NeurIPS 2020 • Jack Parker-Holder, Vu Nguyen, Stephen Roberts

A recent solution to this problem is Population Based Training (PBT) which updates both weights and hyperparameters in a single training run of a population of agents.

Hyperparameter Optimization · Reinforcement Learning (RL)
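The PBT scheme that this paper builds on is a simple exploit/explore loop over a population. The sketch below uses a toy objective; where PBT perturbs hyperparameters randomly (as shown), PB2 would instead propose them with a GP-bandit model, which is stubbed here.

```python
import random

random.seed(0)

def train_step(weights, lr):
    # Toy objective: performance peaks at lr = 0.1; stands in for an RL update.
    return weights + 1.0 - abs(lr - 0.1) * 10

population = [{"weights": 0.0, "lr": random.uniform(0.001, 1.0)} for _ in range(8)]
for generation in range(20):
    for member in population:
        member["weights"] = train_step(member["weights"], member["lr"])
    population.sort(key=lambda m: m["weights"], reverse=True)
    # Exploit: the bottom members copy the weights of the top members.
    for loser, winner in zip(population[-2:], population[:2]):
        loser["weights"] = winner["weights"]
        # Explore: perturb hyperparameters; PB2 would replace this random
        # perturbation with a GP-bandit proposal.
        loser["lr"] = winner["lr"] * random.choice([0.8, 1.2])

print("best lr found:", round(population[0]["lr"], 3))  # drifts toward 0.1
```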

Behavior-Guided Reinforcement Learning

no code implementations • 25 Sep 2019 • Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan

We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space.

reinforcement-learning · Reinforcement Learning (RL)
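Comparing two policies via a Wasserstein distance over behavioral statistics can be sketched in one dimension with SciPy. The embeddings below are synthetic placeholders for the paper's learned latent behavioral space.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Hypothetical 1-D behavioral embeddings (e.g. a summary statistic of
# trajectories) for two policies; the paper's latent space is learned.
behavior_a = rng.normal(loc=0.0, scale=1.0, size=500)
behavior_b = rng.normal(loc=0.5, scale=1.0, size=500)

d = wasserstein_distance(behavior_a, behavior_b)
print(f"behavioral WD: {d:.3f}")  # roughly the 0.5 mean shift
```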

Reinforcement Learning with Chromatic Networks

no code implementations • 25 Sep 2019 • Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Deepali Jain, Yuxiang Yang

We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies by combining ENAS and ES in a highly scalable and intuitive way.

Neural Architecture Search · reinforcement-learning +1
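The compactness comes from weight sharing: edges of the policy network are partitioned into a few "colors", and all edges of a color share one scalar. A minimal sketch, with a random partition standing in for the learned ENAS controller:

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, out_dim, n_colors = 6, 4, 3

# A color assignment maps each edge to one of a few shared weights; in the
# paper an ENAS controller learns this partition, here it is random.
colors = rng.integers(0, n_colors, size=(out_dim, in_dim))
shared = rng.normal(size=n_colors)          # only n_colors trainable scalars

W = shared[colors]                          # expand to a full weight matrix
x = rng.normal(size=in_dim)
print(W @ x)                                # forward pass of a compact layer
print(f"{W.size} edges share {n_colors} weights")  # e.g. optimized with ES
```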

Reinforcement Learning with Chromatic Networks for Compact Architecture Search

no code implementations • 10 Jul 2019 • Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Deepali Jain, Yuxiang Yang

We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies by combining ENAS and ES in a highly scalable and intuitive way.

Combinatorial Optimization · Neural Architecture Search +2

Learning to Score Behaviors for Guided Policy Optimization

1 code implementation • ICML 2020 • Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan

We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space.

Efficient Exploration · Imitation Learning +2

Structured Monte Carlo Sampling for Nonisotropic Distributions via Determinantal Point Processes

no code implementations • 29 May 2019 • Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang

We propose a new class of structured methods for Monte Carlo (MC) sampling, called DPPMC, designed for high-dimensional nonisotropic distributions where samples are correlated to reduce the variance of the estimator via determinantal point processes.

Point Processes

Provably Robust Blackbox Optimization for Reinforcement Learning

no code implementations • 7 Mar 2019 • Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Deepali Jain, Yuxiang Yang, Atil Iscen, Jasmine Hsu, Vikas Sindhwani

Interest in derivative-free optimization (DFO) and "evolutionary strategies" (ES) has recently surged in the Reinforcement Learning (RL) community, with growing evidence that they can match state-of-the-art methods for policy optimization problems in robotics.

reinforcement-learning · Reinforcement Learning (RL) +1
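The ES estimator at the heart of this line of work approximates policy gradients from reward evaluations alone. Below is the canonical antithetic version on a toy objective; this is a generic sketch, not the paper's robust variant.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(theta):
    """Toy blackbox 'policy return': peaks at theta = [1, 2, 3]."""
    return -np.sum((theta - np.array([1.0, 2.0, 3.0])) ** 2)

theta, sigma, lr, n = np.zeros(3), 0.1, 0.05, 16
for step in range(300):
    eps = rng.normal(size=(n, 3))
    # Antithetic sampling: evaluate +eps and -eps to reduce variance.
    scores = np.array([reward(theta + sigma * e) - reward(theta - sigma * e)
                       for e in eps])
    grad = (eps * scores[:, None]).mean(axis=0) / (2 * sigma)
    theta += lr * grad

print(np.round(theta, 2))  # approaches [1, 2, 3] without any true gradients
```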

From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization

1 code implementation • NeurIPS 2019 • Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang

ASEBO adapts to the geometry of the function and learns optimal sets of sensing directions, which are used to probe it on the fly.

Multi-Armed Bandits
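A rough sketch of the adaptive-sensing idea: estimate the function's active subspace from recent gradient estimates (via SVD) and bias new sensing directions toward it. Dimensions, mixing weights, and the objective below are illustrative only, not ASEBO's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20

def f(x):                                    # blackbox with a low-dim active subspace
    return -np.sum(x[:3] ** 2)

theta, sigma, lr = rng.normal(size=d), 0.1, 0.1
grad_history = []
for step in range(100):
    if len(grad_history) >= 10:
        # Estimate the active subspace from recent gradient estimates and
        # draw most sensing directions from it (plus a little exploration).
        _, _, Vt = np.linalg.svd(np.array(grad_history[-10:]), full_matrices=False)
        basis = Vt[:3]
        eps = rng.normal(size=(8, 3)) @ basis + 0.1 * rng.normal(size=(8, d))
    else:
        eps = rng.normal(size=(8, d))        # warm-up: isotropic sensing
    scores = np.array([f(theta + sigma * e) - f(theta - sigma * e) for e in eps])
    grad = (eps * scores[:, None]).mean(axis=0) / (2 * sigma)
    grad_history.append(grad)
    theta += lr * grad

print(np.round(theta[:3], 2))                # active coordinates driven toward 0
```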
