Search Results for author: Roy Fox

Found 31 papers, 10 papers with code

A multi-agent control framework for co-adaptation in brain-computer interfaces

no code implementations NeurIPS 2013 Josh S. Merel, Roy Fox, Tony Jebara, Liam Paninski

In a closed-loop brain-computer interface (BCI), adaptive decoders are used to learn parameters suited to decoding the user's neural response.

Brain Computer Interface

Taming the Noise in Reinforcement Learning via Soft Updates

3 code implementations 28 Dec 2015 Roy Fox, Ari Pakman, Naftali Tishby

We propose G-learning, a new off-policy learning algorithm that regularizes the value estimates by penalizing deterministic policies early in the learning process.

Q-Learning reinforcement-learning +1
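The soft-update idea behind G-learning can be illustrated with a log-sum-exp value backup, which interpolates between averaging over actions (strong regularization, discouraging deterministic policies early on) and the hard max of standard Q-learning. This is a hedged sketch of the general mechanism, not the paper's G-learning implementation:

```python
import numpy as np

def soft_backup(q_values, beta):
    """Soft (log-sum-exp) state value over action values.

    As beta -> 0 this approaches the mean of q_values (a fully stochastic
    policy); as beta -> inf it approaches the max (a deterministic policy).
    """
    # Subtract the max before exponentiating for numerical stability.
    m = np.max(beta * q_values)
    return (m + np.log(np.mean(np.exp(beta * q_values - m)))) / beta
```

Annealing `beta` upward over training recovers the qualitative behavior described above: noisy early value estimates are averaged rather than maximized, taming the overestimation bias of the hard max.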

Optimal Selective Attention in Reactive Agents

no code implementations 29 Dec 2015 Roy Fox, Naftali Tishby

One attempt to deal with this is to focus on reactive policies, which base their actions only on the most recent observation.

Principled Option Learning in Markov Decision Processes

no code implementations 18 Sep 2016 Roy Fox, Michal Moshkovitz, Naftali Tishby

Among their many benefits, options are well known to make planning more efficient.

Information-Theoretic Methods for Planning and Learning in Partially Observable Markov Decision Processes

no code implementations 24 Sep 2016 Roy Fox

Bounded agents are limited by intrinsic constraints on their ability to process the information available in their sensors and memory, and to choose actions and memory updates.

Multi-Level Discovery of Deep Options

no code implementations 24 Mar 2017 Roy Fox, Sanjay Krishnan, Ion Stoica, Ken Goldberg

Augmenting an agent's control with useful higher-level behaviors called options can greatly reduce the sample complexity of reinforcement learning, but manually designing options is infeasible in high-dimensional and abstract state spaces.

DART: Noise Injection for Robust Imitation Learning

2 code implementations 27 Mar 2017 Michael Laskey, Jonathan Lee, Roy Fox, Anca Dragan, Ken Goldberg

One approach to Imitation Learning is Behavior Cloning, in which a robot observes a supervisor and infers a control policy.

Imitation Learning
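The Behavior Cloning setup described above amounts to plain supervised regression from observations to supervisor actions. The sketch below illustrates that baseline with a linear policy fit by gradient descent; it is a minimal illustration of Behavior Cloning itself, not DART, whose contribution is injecting noise into the supervisor's demonstrations:

```python
import numpy as np

def behavior_cloning(observations, expert_actions, lr=0.1, epochs=500):
    """Fit a linear policy a = W @ obs to supervisor demonstrations
    by minimizing mean squared error over the dataset."""
    n_obs = observations.shape[1]
    n_act = expert_actions.shape[1]
    W = np.zeros((n_act, n_obs))
    for _ in range(epochs):
        pred = observations @ W.T                        # (N, n_act)
        # Gradient of the mean squared error with respect to W.
        grad = 2.0 * (pred - expert_actions).T @ observations / len(observations)
        W -= lr * grad
    return W
```

In practice the linear map would be replaced by a neural network, but the failure mode DART targets is the same: small prediction errors compound at test time when the robot drifts away from the supervisor's state distribution.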

Fast and Reliable Autonomous Surgical Debridement with Cable-Driven Robots Using a Two-Phase Calibration Procedure

1 code implementation 19 Sep 2017 Daniel Seita, Sanjay Krishnan, Roy Fox, Stephen McKinley, John Canny, Ken Goldberg

In Phase II (fine), the bias from Phase I is applied to move the end-effector toward a small set of specific target points on a printed sheet.

Robotics

RLlib: Abstractions for Distributed Reinforcement Learning

3 code implementations ICML 2018 Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica

Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation.

reinforcement-learning Reinforcement Learning (RL)

Parametrized Hierarchical Procedures for Neural Programming

no code implementations ICLR 2018 Roy Fox, Richard Shin, Sanjay Krishnan, Ken Goldberg, Dawn Song, Ion Stoica

Neural programs are highly accurate and structured policies that perform algorithmic tasks by controlling the behavior of a computation mechanism.

Imitation Learning

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

2 code implementations NeurIPS 2020 Stephen McAleer, John Lanier, Roy Fox, Pierre Baldi

We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$.

reinforcement-learning Reinforcement Learning (RL)

A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks

no code implementations 8 Feb 2021 Forest Agostinelli, Alexander Shmakov, Stephen Mcaleer, Roy Fox, Pierre Baldi

We use Q* search to solve the Rubik's cube when formulated with a large action space that includes 1872 meta-actions. We find that this 157-fold increase in the size of the action space incurs less than a 4-fold increase in computation time and less than a 3-fold increase in the number of nodes generated.

Rubik's Cube

XDO: A Double Oracle Algorithm for Extensive-Form Games

1 code implementation NeurIPS 2021 Stephen Mcaleer, John Lanier, Kevin Wang, Pierre Baldi, Roy Fox

NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games.

Reinforcement Learning (RL)

Improving Social Welfare While Preserving Autonomy via a Pareto Mediator

no code implementations 7 Jun 2021 Stephen Mcaleer, John Lanier, Michael Dennis, Pierre Baldi, Roy Fox

Machine learning algorithms often make decisions on behalf of agents with varied and sometimes conflicting interests.

Open-Ended Question Answering

Modular Framework for Visuomotor Language Grounding

no code implementations 5 Sep 2021 Kolby Nottingham, Litian Liang, Daeyun Shin, Charless C. Fowlkes, Roy Fox, Sameer Singh

Natural language instruction following tasks serve as a valuable test-bed for grounded language and robotics research.

Instruction Following

Independent Natural Policy Gradient Always Converges in Markov Potential Games

no code implementations 20 Oct 2021 Roy Fox, Stephen Mcaleer, Will Overman, Ioannis Panageas

Recent results have shown that independent policy gradient converges in Markov potential games (MPGs), but it was not known whether Independent Natural Policy Gradient converges in MPGs as well.

Multi-agent Reinforcement Learning

Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates

no code implementations 28 Oct 2021 Litian Liang, Yaosheng Xu, Stephen Mcaleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox

Under the belief that $\beta$ is closely related to the (state dependent) model uncertainty, Entropy Regularized Q-Learning (EQL) further introduces a principled scheduling of $\beta$ by maintaining a collection of the model parameters that characterizes model uncertainty.

Q-Learning Scheduling

Count-Based Temperature Scheduling for Maximum Entropy Reinforcement Learning

no code implementations 28 Nov 2021 Dailin Hu, Pieter Abbeel, Roy Fox

Maximum Entropy Reinforcement Learning (MaxEnt RL) algorithms such as Soft Q-Learning (SQL) and Soft Actor-Critic trade off reward and policy entropy, which has the potential to improve training stability and robustness.

Q-Learning reinforcement-learning +2
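The count-based scheduling idea can be sketched as a per-state temperature that decays with the visit count, so rarely visited states keep a large entropy bonus while familiar states become near-greedy. The `1/sqrt(count)` decay below is an illustrative choice, not necessarily the exact schedule from the paper:

```python
import math
from collections import defaultdict

class CountBasedTemperature:
    """Per-state MaxEnt RL temperature that anneals with visit counts.

    Each call to step(state) records one visit and returns the temperature
    to use for that state on this step.
    """
    def __init__(self, tau0=1.0):
        self.tau0 = tau0
        self.counts = defaultdict(int)

    def step(self, state):
        self.counts[state] += 1
        return self.tau0 / math.sqrt(self.counts[state])
```

A schedule like this plugs directly into soft Q-learning or SAC-style updates wherever a fixed temperature would otherwise appear.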

Target Entropy Annealing for Discrete Soft Actor-Critic

no code implementations 6 Dec 2021 Yaosheng Xu, Dailin Hu, Litian Liang, Stephen Mcaleer, Pieter Abbeel, Roy Fox

Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings.

Atari Games Scheduling

Anytime PSRO for Two-Player Zero-Sum Games

no code implementations 19 Jan 2022 Stephen Mcaleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox

PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next.

Multi-agent Reinforcement Learning reinforcement-learning +2

Learning to Query Internet Text for Informing Reinforcement Learning Agents

1 code implementation 25 May 2022 Kolby Nottingham, Alekhya Pyla, Sameer Singh, Roy Fox

We show that our method correctly learns to execute queries to maximize reward in a reinforcement learning setting.

reinforcement-learning Reinforcement Learning (RL)

Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

no code implementations 13 Jul 2022 Stephen Mcaleer, JB Lanier, Kevin Wang, Pierre Baldi, Roy Fox, Tuomas Sandholm

Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO also learns an approximately optimal stochastic policy and adds it to the population as well.

Reinforcement Learning (RL)

Feasible Adversarial Robust Reinforcement Learning for Underspecified Environments

no code implementations 19 Jul 2022 JB Lanier, Stephen Mcaleer, Pierre Baldi, Roy Fox

In this paper, we propose Feasible Adversarial Robust RL (FARR), a novel problem formulation and objective for automatically determining the set of environment parameter values over which to be robust.

reinforcement-learning Reinforcement Learning (RL)

Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks

1 code implementation 16 Sep 2022 Litian Liang, Yaosheng Xu, Stephen Mcaleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox

On a set of 26 benchmark Atari environments, MeanQ outperforms all tested baselines, including the best available baseline, SUNRISE, at 100K interaction steps in 16/26 environments, and by 68% on average.
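The variance-reduction mechanism behind MeanQ can be sketched as a TD target built from the mean of an ensemble of next-state value estimates: averaging K approximately independent estimates shrinks the target's variance. This is a hedged illustration of the idea, not the paper's implementation:

```python
import numpy as np

def mean_q_target(ensemble_q, rewards, gamma=0.99):
    """TD target using the ensemble mean of next-state action values.

    ensemble_q: array of shape (K, batch, n_actions) with each ensemble
    member's Q estimates for the next state.
    """
    mean_q = ensemble_q.mean(axis=0)             # (batch, n_actions)
    return rewards + gamma * mean_q.max(axis=1)  # greedy w.r.t. the mean
```

Each ensemble member is then regressed toward this shared, lower-variance target instead of its own bootstrapped estimate.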

Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors

no code implementations 21 Jul 2023 Kolby Nottingham, Yasaman Razeghi, KyungMin Kim, JB Lanier, Pierre Baldi, Roy Fox, Sameer Singh

Large language models (LLMs) are being applied as actors for sequential decision making tasks in domains such as robotics and games, utilizing their general world knowledge and planning abilities.

Decision Making Language Modelling +2

Learning to Design Analog Circuits to Meet Threshold Specifications

1 code implementation 25 Jul 2023 Dmitrii Krylov, Pooya Khajeh, Junhan Ouyang, Thomas Reeves, Tongkai Liu, Hiba Ajmal, Hamidreza Aghasi, Roy Fox

In this work, we propose a method for generating, from simulation data, a dataset on which a system can be trained via supervised learning to design circuits that meet threshold specifications.

Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills

no code implementations 5 Feb 2024 Kolby Nottingham, Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Sameer Singh, Peter Clark, Roy Fox

We evaluate our method in the classic videogame NetHack and the text environment ScienceWorld to demonstrate SSO's ability to optimize a set of skills and perform in-context policy improvement.

Decision Making Language Modelling +1

Moonwalk: Inverse-Forward Differentiation

no code implementations 22 Feb 2024 Dmitrii Krylov, Armin Karamzade, Roy Fox

Our method, Moonwalk, has a time complexity linear in the depth of the network, unlike the quadratic time complexity of naïve forward, and empirically reduces computation time by several orders of magnitude without allocating more memory.

Reinforcement Learning from Delayed Observations via World Models

no code implementations 18 Mar 2024 Armin Karamzade, KyungMin Kim, Montek Kalsi, Roy Fox

In standard Reinforcement Learning settings, agents typically assume immediate feedback about the effects of their actions after taking them.

reinforcement-learning
