Search Results for author: Roy Fox

Found 24 papers, 8 papers with code

Feasible Adversarial Robust Reinforcement Learning for Underspecified Environments

no code implementations 19 Jul 2022 JB Lanier, Stephen McAleer, Pierre Baldi, Roy Fox

In this paper, we propose Feasible Adversarial Robust RL (FARR), a method for automatically determining the set of environment parameter values over which to be robust.

reinforcement-learning

Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

no code implementations 13 Jul 2022 Stephen McAleer, JB Lanier, Kevin Wang, Pierre Baldi, Roy Fox, Tuomas Sandholm

Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO also learns an approximately optimal stochastic policy and adds it to the population as well.

Learning to Query Internet Text for Informing Reinforcement Learning Agents

1 code implementation 25 May 2022 Kolby Nottingham, Alekhya Pyla, Sameer Singh, Roy Fox

We show that our method correctly learns to execute queries to maximize reward in a reinforcement learning setting.

reinforcement-learning

Anytime PSRO for Two-Player Zero-Sum Games

no code implementations 19 Jan 2022 Stephen McAleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox

PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next.

Multi-agent Reinforcement Learning, reinforcement-learning
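The tabular double oracle (DO) loop that PSRO builds on can be sketched on a small zero-sum matrix game. In the sketch below, rock-paper-scissors stands in for the game and plain fictitious play approximates each restricted-game equilibrium (an implementation choice for illustration, not the paper's solver):

```python
import numpy as np

# Row player's payoffs in rock-paper-scissors (zero-sum: the column
# player's payoffs are the negation).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def fictitious_play(M, steps=2000):
    """Approximate equilibrium of a zero-sum matrix game by fictitious play."""
    r_counts = np.ones(M.shape[0])
    c_counts = np.ones(M.shape[1])
    for _ in range(steps):
        r_counts[np.argmax(M @ (c_counts / c_counts.sum()))] += 1
        c_counts[np.argmin((r_counts / r_counts.sum()) @ M)] += 1
    return r_counts / r_counts.sum(), c_counts / c_counts.sum()

def double_oracle(A, iters=20):
    rows, cols = [0], [0]                     # restricted populations
    for _ in range(iters):
        # Solve the restricted game, then best-respond in the full game.
        row_mix, col_mix = fictitious_play(A[np.ix_(rows, cols)])
        br_row = int(np.argmax(A[:, cols] @ col_mix))
        br_col = int(np.argmin(row_mix @ A[rows, :]))
        if br_row in rows and br_col in cols:
            break                             # no new best response: converged
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)
    return rows, cols, row_mix, col_mix
```

Each iteration adds a pure best response against the opponent's current restricted-game mixture; on this game the populations grow to all three actions before convergence. The non-monotonicity noted in the abstract shows up here too: the interim restricted-game mixtures can be more exploitable in the full game than earlier ones.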

Target Entropy Annealing for Discrete Soft Actor-Critic

no code implementations 6 Dec 2021 Yaosheng Xu, Dailin Hu, Litian Liang, Stephen McAleer, Pieter Abbeel, Roy Fox

Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings.

Atari Games
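The mechanism this paper targets can be sketched for the discrete case: SAC tunes its temperature alpha so the policy's entropy tracks a target, and the idea here is to anneal that target over training. The toy Q-values, learning rate, and linear decay schedule below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def boltzmann(q, alpha):
    """Discrete softmax policy over Q-values at temperature alpha, plus its entropy."""
    z = q / alpha
    z = z - z.max()                      # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return p, float(-(p * np.log(p)).sum())

q = np.array([2.0, 1.0, 0.5, 0.0])       # toy Q-values for 4 discrete actions
log_alpha = 0.0
for t in range(2000):
    target_h = np.interp(t, [0, 2000], [1.3, 0.5])   # annealed entropy target
    _, h = boltzmann(q, np.exp(log_alpha))
    # SAC-style temperature update: raise alpha when entropy is below target.
    log_alpha += 0.01 * (target_h - h)
```

As the target decays, alpha falls and the policy sharpens toward the highest-valued action; a fixed target would instead pin the policy's stochasticity for the whole run.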

Count-Based Temperature Scheduling for Maximum Entropy Reinforcement Learning

no code implementations 28 Nov 2021 Dailin Hu, Pieter Abbeel, Roy Fox

Maximum Entropy Reinforcement Learning (MaxEnt RL) algorithms such as Soft Q-Learning (SQL) and Soft Actor-Critic trade off reward and policy entropy, which has the potential to improve training stability and robustness.

Q-Learning, reinforcement-learning
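One way to read the trade-off: the temperature that weights entropy against reward can be scheduled per state from visitation counts, so well-explored states act nearly greedily while novel states stay stochastic. A minimal sketch, assuming an inverse-square-root schedule (the precise schedule is the paper's design choice, not reproduced here):

```python
import numpy as np

def soft_value(q, tau):
    """MaxEnt (log-sum-exp) state value: tau * log sum_a exp(q_a / tau)."""
    m = q.max()
    return float(m + tau * np.log(np.exp((q - m) / tau).sum()))

def count_based_tau(n_visits, tau0=1.0):
    """Anneal the temperature as a state's visitation count grows."""
    return tau0 / np.sqrt(1.0 + n_visits)

q = np.array([1.0, 0.0])
# A rarely visited state keeps a large entropy bonus in its soft value ...
v_new = soft_value(q, count_based_tau(0))
# ... while a heavily visited one approaches the hard max over actions.
v_old = soft_value(q, count_based_tau(10_000))
```

Plugging `soft_value` with the scheduled temperature into the SQL backup recovers hard Q-learning in the count limit, which is the stability-vs-greediness trade-off the abstract describes.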

Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates

no code implementations 28 Oct 2021 Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox

Under the belief that $\beta$ is closely related to the (state dependent) model uncertainty, Entropy Regularized Q-Learning (EQL) further introduces a principled scheduling of $\beta$ by maintaining a collection of the model parameters that characterizes model uncertainty.

Q-Learning

Independent Natural Policy Gradient Always Converges in Markov Potential Games

no code implementations 20 Oct 2021 Roy Fox, Stephen McAleer, Will Overman, Ioannis Panageas

Recent results have shown that independent policy gradient converges in MPGs, but it was not known whether Independent Natural Policy Gradient converges in MPGs as well.

Multi-agent Reinforcement Learning

Modular Framework for Visuomotor Language Grounding

no code implementations 5 Sep 2021 Kolby Nottingham, Litian Liang, Daeyun Shin, Charless C. Fowlkes, Roy Fox, Sameer Singh

Natural language instruction following tasks serve as a valuable test-bed for grounded language and robotics research.

Improving Social Welfare While Preserving Autonomy via a Pareto Mediator

no code implementations 7 Jun 2021 Stephen McAleer, John Lanier, Michael Dennis, Pierre Baldi, Roy Fox

Machine learning algorithms often make decisions on behalf of agents with varied and sometimes conflicting interests.

XDO: A Double Oracle Algorithm for Extensive-Form Games

1 code implementation NeurIPS 2021 Stephen McAleer, John Lanier, Kevin Wang, Pierre Baldi, Roy Fox

NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games.

A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks

no code implementations 8 Feb 2021 Forest Agostinelli, Alexander Shmakov, Stephen McAleer, Roy Fox, Pierre Baldi

Since the computation required to expand a node and compute the heuristic values for all of its generated children grows linearly with the size of the action space, A* search can become impractical for problems with large action spaces.
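The workaround is to train a DQN whose single forward pass on a state scores every action's cost-to-go at once, so A* can assign f-values to all children without expanding nodes just to evaluate the heuristic. A toy sketch on a 5x5 grid, with exact Manhattan distances standing in for the learned network (an assumption for illustration; the paper learns this function):

```python
import heapq

GOAL = (4, 4)
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def q_values(s):
    """Stand-in for a DQN: per-action cost-to-go = step cost + child's distance."""
    return [1 + abs(GOAL[0] - (s[0] + dx)) + abs(GOAL[1] - (s[1] + dy))
            for dx, dy in MOVES]

def q_star_search(start):
    """A*-like search that scores children from the parent's Q-values alone."""
    open_heap = [(min(q_values(start)), 0, start)]
    g_cost = {start: 0}
    while open_heap:
        f, g, s = heapq.heappop(open_heap)
        if s == GOAL:
            return g
        qs = q_values(s)                     # one "network call" per node
        for a, (dx, dy) in enumerate(MOVES):
            child = (s[0] + dx, s[1] + dy)
            if not (0 <= child[0] <= 4 and 0 <= child[1] <= 4):
                continue
            if child not in g_cost or g + 1 < g_cost[child]:
                g_cost[child] = g + 1
                # f(child) = g(parent) + Q(parent, a); the child's own
                # heuristic is never computed.
                heapq.heappush(open_heap, (g + qs[a], g + 1, child))
    return None
```

With a large action space, the saving is one heuristic evaluation per node instead of one per generated child, which is the linear cost the abstract refers to.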

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

2 code implementations NeurIPS 2020 Stephen McAleer, John Lanier, Roy Fox, Pierre Baldi

We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$.

reinforcement-learning

Parametrized Hierarchical Procedures for Neural Programming

no code implementations ICLR 2018 Roy Fox, Richard Shin, Sanjay Krishnan, Ken Goldberg, Dawn Song, Ion Stoica

Neural programs are highly accurate and structured policies that perform algorithmic tasks by controlling the behavior of a computation mechanism.

Imitation Learning

RLlib: Abstractions for Distributed Reinforcement Learning

3 code implementations ICML 2018 Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica

Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation.

reinforcement-learning

Fast and Reliable Autonomous Surgical Debridement with Cable-Driven Robots Using a Two-Phase Calibration Procedure

1 code implementation 19 Sep 2017 Daniel Seita, Sanjay Krishnan, Roy Fox, Stephen McKinley, John Canny, Ken Goldberg

In Phase II (fine), the bias from Phase I is applied to move the end-effector toward a small set of specific target points on a printed sheet.

Robotics

DART: Noise Injection for Robust Imitation Learning

1 code implementation 27 Mar 2017 Michael Laskey, Jonathan Lee, Roy Fox, Anca Dragan, Ken Goldberg

One approach to Imitation Learning is Behavior Cloning, in which a robot observes a supervisor and infers a control policy.

Imitation Learning
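Behavior Cloning's core failure is covariate shift: the learner only sees states along the supervisor's noise-free trajectories. DART's remedy is to inject noise into the supervisor's executed actions while labeling each state with the intended action, so demonstrations cover states the learner will actually visit. A 1-D sketch with a linear supervisor and a fixed noise level (DART optimizes the noise level; that part is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

K_SUP = 2.0                         # supervisor's gain, unknown to the learner
NOISE_STD = 0.3                     # injected noise level (fixed, for illustration)

def supervisor(x):
    return -K_SUP * x

# Collect demonstrations: the robot executes the supervisor's NOISY action,
# but the regression label is the supervisor's intended (noise-free) action.
states, labels = [], []
x = 1.0
for _ in range(500):
    u_intended = supervisor(x)
    u_executed = u_intended + rng.normal(0, NOISE_STD)
    states.append(x)
    labels.append(u_intended)
    x = x + 0.1 * u_executed        # simple integrator dynamics
    if abs(x) < 1e-3:
        x = rng.uniform(-1, 1)      # restart an episode once converged

# Behavior Cloning: least-squares fit of a linear policy u = w * x.
X = np.array(states)
y = np.array(labels)
w = float(X @ y) / float(X @ X)
```

The injected noise widens the visited-state distribution around the supervisor's trajectory, so the fitted policy is supervised exactly where small learner errors would otherwise take it off-distribution.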

Multi-Level Discovery of Deep Options

no code implementations 24 Mar 2017 Roy Fox, Sanjay Krishnan, Ion Stoica, Ken Goldberg

Augmenting an agent's control with useful higher-level behaviors called options can greatly reduce the sample complexity of reinforcement learning, but manually designing options is infeasible in high-dimensional and abstract state spaces.

Information-Theoretic Methods for Planning and Learning in Partially Observable Markov Decision Processes

no code implementations 24 Sep 2016 Roy Fox

Bounded agents are limited by intrinsic constraints on their ability to process the information available in their sensors and memory and to choose actions and memory updates.

Principled Option Learning in Markov Decision Processes

no code implementations 18 Sep 2016 Roy Fox, Michal Moshkovitz, Naftali Tishby

It is well known that options can make planning more efficient, among their many benefits.

Optimal Selective Attention in Reactive Agents

no code implementations 29 Dec 2015 Roy Fox, Naftali Tishby

One attempt to deal with this is to focus on reactive policies, which base their actions only on the most recent observation.

Taming the Noise in Reinforcement Learning via Soft Updates

3 code implementations 28 Dec 2015 Roy Fox, Ari Pakman, Naftali Tishby

We propose G-learning, a new off-policy learning algorithm that regularizes the value estimates by penalizing deterministic policies early in the learning process.

Q-Learning, reinforcement-learning
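G-learning's soft backup replaces the hard max with a KL-regularized log-average under a prior policy, with the penalty weight beta scheduled to grow as estimates become trustworthy, which damps the overestimation caused by noisy rewards. A tabular sketch on a toy chain MDP; the uniform prior, linear beta schedule, and learning rate are illustrative choices, not prescriptions from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, gamma = 5, 2, 0.9

def step(s, a):
    """Toy chain: action 1 moves right, action 0 stays; reward at the end."""
    s2 = min(s + a, n_states - 1)
    r = 1.0 if s2 == n_states - 1 else 0.0
    r += rng.normal(0, 0.5)         # noisy reward: the setting G-learning targets
    return s2, r

G = np.zeros((n_states, n_actions))
rho = np.full(n_actions, 1.0 / n_actions)   # uniform prior policy

for t in range(1, 20001):
    beta = 1e-3 * t                 # growing information penalty (schedule is a choice)
    s = int(rng.integers(n_states))
    a = int(rng.integers(n_actions))
    s2, r = step(s, a)
    # Soft backup: V(s') = (1/beta) * log sum_a' rho(a') * exp(beta * G(s', a')),
    # computed with a max shift for numerical stability.
    g = G[s2] - G[s2].max()
    v = G[s2].max() + np.log((rho * np.exp(beta * g)).sum()) / beta
    G[s, a] += 0.05 * (r + gamma * v - G[s, a])
```

Early on (small beta) the backup averages over actions under the prior, avoiding the max operator's bias toward noise; as beta grows it smoothly approaches the usual Q-learning backup.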

A multi-agent control framework for co-adaptation in brain-computer interfaces

no code implementations NeurIPS 2013 Josh S. Merel, Roy Fox, Tony Jebara, Liam Paninski

In a closed-loop brain-computer interface (BCI), adaptive decoders are used to learn parameters suited to decoding the user's neural response.
