no code implementations • 13 Oct 2024 • KyungMin Kim, JB Lanier, Pierre Baldi, Charless Fowlkes, Roy Fox
Training in the presence of visual distractions is particularly difficult due to the high variation they introduce to representation learning.
no code implementations • 2 Oct 2024 • KyungMin Kim, Davide Corsi, Andoni Rodriguez, JB Lanier, Benjami Parellada, Pierre Baldi, Cesar Sanchez, Roy Fox
For real-world robotic domains, it is essential to define safety specifications over continuous state and action spaces to accurately account for system dynamics and compute new actions that minimally deviate from the agent's original decision.
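As a minimal illustration of the kind of minimally-deviating correction such a continuous shield performs (not the paper's actual method), the sketch below projects a proposed action onto a single linear safety constraint; the constraint form `w·a ≤ b` and the function name are illustrative assumptions.

```python
import numpy as np

def project_to_halfspace(action, w, b):
    """Return the action closest (in L2 distance) to `action` that satisfies w . a <= b.

    Illustrative stand-in for a continuous shield: if the proposed action already
    satisfies the safety constraint it is returned unchanged, otherwise it is
    minimally corrected onto the constraint boundary.
    """
    action = np.asarray(action, dtype=float)
    w = np.asarray(w, dtype=float)
    violation = w @ action - b
    if violation <= 0.0:
        return action  # already safe, no deviation
    # Euclidean projection onto the halfspace {a : w . a <= b}
    return action - (violation / (w @ w)) * w

# Example: keep the commanded action below a weighted limit
proposed = np.array([1.5, 0.8])
safe = project_to_halfspace(proposed, w=np.array([1.0, 1.0]), b=2.0)
print(safe)  # minimally shifted so its components sum to at most 2
```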
no code implementations • 10 Jun 2024 • Davide Corsi, Guy Amir, Andoni Rodriguez, Cesar Sanchez, Guy Katz, Roy Fox
Our approach combines both formal and probabilistic verification tools to partition the input domain into safe and unsafe regions.
1 code implementation • 18 Mar 2024 • Armin Karamzade, KyungMin Kim, Montek Kalsi, Roy Fox
In standard reinforcement learning settings, agents typically assume immediate feedback about the effects of their actions after taking them.
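As a hedged illustration of the delayed-feedback setting this entry studies (not its algorithm), the wrapper below buffers observations so the agent only sees the state from `delay` steps ago; the environment interface (`reset`/`step` returning an observation, reward, and done flag) is an assumption, and rewards are left undelayed as a simplification.

```python
from collections import deque

class DelayedObservationWrapper:
    """Wrap an environment so observations arrive `delay` steps late.

    Illustrative sketch: assumes env.reset() -> obs and
    env.step(action) -> (obs, reward, done, info).
    """
    def __init__(self, env, delay):
        self.env = env
        self.delay = delay
        self.buffer = deque()

    def reset(self):
        obs = self.env.reset()
        # Pre-fill the buffer so the first `delay` steps replay the initial observation.
        self.buffer = deque([obs] * (self.delay + 1), maxlen=self.delay + 1)
        return self.buffer[0]

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.buffer.append(obs)                     # newest observation goes in...
        return self.buffer[0], reward, done, info   # ...oldest one is what the agent sees
```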
no code implementations • 22 Feb 2024 • Dmitrii Krylov, Armin Karamzade, Roy Fox
Our method, Moonwalk, has a time complexity linear in the depth of the network, unlike the quadratic time complexity of naïve forward-mode differentiation, and empirically reduces computation time by several orders of magnitude without allocating more memory.

1 code implementation • 5 Feb 2024 • Kolby Nottingham, Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Sameer Singh, Peter Clark, Roy Fox
We evaluate our method in the classic videogame NetHack and the text environment ScienceWorld to demonstrate SSO's ability to optimize a set of skills and perform in-context policy improvement.
1 code implementation • 25 Jul 2023 • Dmitrii Krylov, Pooya Khajeh, Junhan Ouyang, Thomas Reeves, Tongkai Liu, Hiba Ajmal, Hamidreza Aghasi, Roy Fox
In this work, we propose a method for generating from simulation data a dataset on which a system can be trained via supervised learning to design circuits to meet threshold specifications.
no code implementations • 21 Jul 2023 • Kolby Nottingham, Yasaman Razeghi, KyungMin Kim, JB Lanier, Pierre Baldi, Roy Fox, Sameer Singh
Large language models (LLMs) are being applied as actors for sequential decision making tasks in domains such as robotics and games, utilizing their general world knowledge and planning abilities.
no code implementations • 28 Jan 2023 • Kolby Nottingham, Prithviraj Ammanabrolu, Alane Suhr, Yejin Choi, Hannaneh Hajishirzi, Sameer Singh, Roy Fox
Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world.
1 code implementation • 16 Sep 2022 • Litian Liang, Yaosheng Xu, Stephen Mcaleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox
On a set of 26 benchmark Atari environments, MeanQ outperforms all tested baselines, including the best available baseline, SUNRISE, at 100K interaction steps in 16/26 environments, and by 68% on average.
no code implementations • 19 Jul 2022 • JB Lanier, Stephen Mcaleer, Pierre Baldi, Roy Fox
In this paper, we propose Feasible Adversarial Robust RL (FARR), a novel problem formulation and objective for automatically determining the set of environment parameter values over which to be robust.
no code implementations • 13 Jul 2022 • Stephen Mcaleer, JB Lanier, Kevin Wang, Pierre Baldi, Roy Fox, Tuomas Sandholm
Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO also learns an approximately optimal stochastic policy and adds it to the population as well.
1 code implementation • 25 May 2022 • Kolby Nottingham, Alekhya Pyla, Sameer Singh, Roy Fox
We show that our method correctly learns to execute queries to maximize reward in a reinforcement learning setting.
no code implementations • 19 Jan 2022 • Stephen Mcaleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox
PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next.
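For readers unfamiliar with DO, the following is a minimal sketch of the tabular double oracle loop on a two-player zero-sum matrix game (payoffs given for the row player); it solves each restricted game with a small linear program and is purely illustrative, not the PSRO implementation.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Row player's maximin mixed strategy and game value for payoff matrix A."""
    m, n = A.shape
    # Variables: x (row mixture, length m) and v (game value). Minimize -v.
    c = np.concatenate([np.zeros(m), [-1.0]])
    # For every column j: v - sum_i A[i, j] * x_i <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], res.x[m]

def double_oracle(A, iters=50):
    """Tabular double oracle on the full payoff matrix A (row player's payoffs)."""
    rows, cols = [0], [0]                      # start with arbitrary pure strategies
    for _ in range(iters):
        sub = A[np.ix_(rows, cols)]
        x_sub, _ = solve_zero_sum(sub)         # row mixture on the restricted game
        y_sub, _ = solve_zero_sum(-sub.T)      # column mixture (its own maximin)
        x = np.zeros(A.shape[0]); x[rows] = x_sub
        y = np.zeros(A.shape[1]); y[cols] = y_sub
        br_row = int(np.argmax(A @ y))         # best response to the column mixture
        br_col = int(np.argmin(x @ A))         # best response to the row mixture
        if br_row in rows and br_col in cols:
            return x, y                        # no new best responses: equilibrium found
        rows = sorted(set(rows + [br_row]))
        cols = sorted(set(cols + [br_col]))
    return x, y
```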
no code implementations • 6 Dec 2021 • Yaosheng Xu, Dailin Hu, Litian Liang, Stephen Mcaleer, Pieter Abbeel, Roy Fox
Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings.
no code implementations • 28 Nov 2021 • Dailin Hu, Pieter Abbeel, Roy Fox
Maximum Entropy Reinforcement Learning (MaxEnt RL) algorithms such as Soft Q-Learning (SQL) and Soft Actor-Critic trade off reward and policy entropy, which has the potential to improve training stability and robustness.
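Concretely, MaxEnt RL methods of this kind optimize an entropy-regularized return of roughly the following standard form, with a temperature $\alpha$ controlling the reward-entropy trade-off:

$$J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t} \gamma^{t}\left(r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\right)\right]$$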
no code implementations • 28 Oct 2021 • Litian Liang, Yaosheng Xu, Stephen Mcaleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox
Under the belief that $\beta$ is closely related to the (state-dependent) model uncertainty, Entropy Regularized Q-Learning (EQL) further introduces a principled scheduling of $\beta$ by maintaining a collection of model parameters that characterizes model uncertainty.
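As a rough illustration only (the exact schedule in the paper may differ), the sketch below computes a soft (log-sum-exp) backup target in which $\beta$ is set from the disagreement of an ensemble of Q-estimates, so that higher model uncertainty yields a softer, more conservative target; the mapping from disagreement to $\beta$ is an assumption.

```python
import numpy as np

def soft_backup_target(q_ensemble, reward, gamma=0.99, beta_scale=1.0, eps=1e-6):
    """Illustrative uncertainty-scaled soft Bellman target.

    q_ensemble: array of shape (n_models, n_actions) with next-state Q-estimates.
    Returns a scalar target. The disagreement-to-beta mapping is illustrative,
    not the paper's exact schedule.
    """
    q_mean = q_ensemble.mean(axis=0)                 # (n_actions,)
    uncertainty = q_ensemble.std(axis=0).mean()      # scalar disagreement measure
    beta = beta_scale / (uncertainty + eps)          # more uncertainty -> smaller beta
    # Log-sum-exp ("soft max") over actions; as beta grows this approaches max_a Q.
    soft_value = (np.log(np.mean(np.exp(beta * (q_mean - q_mean.max())))) / beta
                  + q_mean.max())
    return reward + gamma * soft_value
```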
no code implementations • 20 Oct 2021 • Roy Fox, Stephen Mcaleer, Will Overman, Ioannis Panageas
Recent results have shown that independent policy gradient converges in MPGs, but it was not known whether independent natural policy gradient converges in MPGs as well.
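For reference, the independent natural policy gradient update each agent $i$ performs takes the standard textbook form below (not specific to this paper), where $F(\theta_i)$ is the Fisher information matrix of agent $i$'s own policy and $\eta$ is a step size:

$$\theta_i \leftarrow \theta_i + \eta\, F(\theta_i)^{-1} \nabla_{\theta_i} J_i(\theta)$$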
no code implementations • 5 Sep 2021 • Kolby Nottingham, Litian Liang, Daeyun Shin, Charless C. Fowlkes, Roy Fox, Sameer Singh
Natural language instruction following tasks serve as a valuable test-bed for grounded language and robotics research.
no code implementations • 7 Jun 2021 • Stephen Mcaleer, John Lanier, Michael Dennis, Pierre Baldi, Roy Fox
Machine learning algorithms often make decisions on behalf of agents with varied and sometimes conflicting interests.
1 code implementation • NeurIPS 2021 • Stephen Mcaleer, John Lanier, Kevin Wang, Pierre Baldi, Roy Fox
NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games.
no code implementations • 8 Feb 2021 • Forest Agostinelli, Alexander Shmakov, Stephen Mcaleer, Roy Fox, Pierre Baldi
We use Q* search to solve the Rubik's Cube when formulated with a large action space that includes 1872 meta-actions, and find that this 157-fold increase in the size of the action space incurs less than a 4-fold increase in computation time and less than a 3-fold increase in the number of nodes generated.
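A hedged sketch of the idea behind Q*-style search: best-first search in which a learned state-action value function scores every action of a popped node at once, so a huge action space mostly adds width to the network's output rather than extra per-child evaluations. Function and variable names are illustrative, and the exact cost and tie-breaking details of the paper's Q* search may differ.

```python
import heapq
import itertools

def q_star_search(start, q_fn, step_fn, is_goal, max_expansions=100_000):
    """Best-first search guided by a Q-function (illustrative sketch).

    q_fn(state)   -> sequence of estimated costs-to-go, one per action (lower is better).
    step_fn(s, a) -> successor state (must be hashable).
    is_goal(s)    -> True when s is a solved state.
    Returns a list of actions reaching a goal, or None.
    """
    tie = itertools.count()        # tie-breaker so the heap never compares states
    frontier = []
    for a, q in enumerate(q_fn(start)):
        # Priority = path cost so far + estimated cost-to-go of taking action a.
        heapq.heappush(frontier, (1 + q, next(tie), 1, start, a, [a]))
    visited = {start}
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, g, state, action, path = heapq.heappop(frontier)
        child = step_fn(state, action)
        if is_goal(child):
            return path
        if child in visited:
            continue
        visited.add(child)
        # One Q evaluation scores every action of the child at once.
        for a, q in enumerate(q_fn(child)):
            heapq.heappush(frontier, (g + 1 + q, next(tie), g + 1, child, a, path + [a]))
    return None
```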
2 code implementations • NeurIPS 2020 • Stephen McAleer, John Lanier, Roy Fox, Pierre Baldi
We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$.
1 code implementation • 29 Dec 2019 • Roy Fox, Richard Shin, William Paul, Yitian Zou, Dawn Song, Ken Goldberg, Pieter Abbeel, Ion Stoica
Autonomous agents can learn by imitating teacher demonstrations of the intended behavior.
no code implementations • ICLR 2018 • Roy Fox, Richard Shin, Sanjay Krishnan, Ken Goldberg, Dawn Song, Ion Stoica
Neural programs are highly accurate and structured policies that perform algorithmic tasks by controlling the behavior of a computation mechanism.
3 code implementations • ICML 2018 • Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica
Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation.
1 code implementation • 19 Sep 2017 • Daniel Seita, Sanjay Krishnan, Roy Fox, Stephen McKinley, John Canny, Ken Goldberg
In Phase II (fine), the bias from Phase I is applied to move the end-effector toward a small set of specific target points on a printed sheet.
2 code implementations • 27 Mar 2017 • Michael Laskey, Jonathan Lee, Roy Fox, Anca Dragan, Ken Goldberg
One approach to Imitation Learning is Behavior Cloning, in which a robot observes a supervisor and infers a control policy.
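In its simplest form, Behavior Cloning reduces to supervised learning on the supervisor's state-action pairs; the sketch below fits a linear policy by least squares purely as an illustration (real implementations typically use a neural network and a richer loss), with all names and the toy supervisor being assumptions.

```python
import numpy as np

def behavior_cloning_fit(states, actions):
    """Fit a linear policy a ~= W s + b to supervisor demonstrations.

    states:  array of shape (N, state_dim)
    actions: array of shape (N, action_dim)
    Returns a policy function mapping a state to a predicted action.
    """
    X = np.hstack([states, np.ones((states.shape[0], 1))])   # append bias feature
    W, *_ = np.linalg.lstsq(X, actions, rcond=None)          # least-squares fit
    return lambda s: np.append(s, 1.0) @ W

# Toy usage: clone a supervisor that steers toward the origin
rng = np.random.default_rng(0)
demo_states = rng.normal(size=(200, 2))
demo_actions = -0.5 * demo_states                # the supervisor's (unknown) rule
policy = behavior_cloning_fit(demo_states, demo_actions)
print(policy(np.array([1.0, -2.0])))             # approximately [-0.5, 1.0]
```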
no code implementations • 24 Mar 2017 • Roy Fox, Sanjay Krishnan, Ion Stoica, Ken Goldberg
Augmenting an agent's control with useful higher-level behaviors called options can greatly reduce the sample complexity of reinforcement learning, but manually designing options is infeasible in high-dimensional and abstract state spaces.
no code implementations • 24 Sep 2016 • Roy Fox
Bounded agents are limited by intrinsic constraints on their ability to process information available in their sensors and memory and to choose actions and memory updates.
no code implementations • 18 Sep 2016 • Roy Fox, Michal Moshkovitz, Naftali Tishby
It is well known that options can make planning more efficient, among their many benefits.
no code implementations • 29 Dec 2015 • Roy Fox, Naftali Tishby
One attempt to deal with this is to focus on reactive policies, which base their actions only on the most recent observation.
3 code implementations • 28 Dec 2015 • Roy Fox, Ari Pakman, Naftali Tishby
We propose G-learning, a new off-policy learning algorithm that regularizes the value estimates by penalizing deterministic policies in the beginning of the learning process.
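The regularization enters through a soft backup of roughly the following form, where $\rho$ is a prior (e.g. uniform) policy and the inverse temperature $\beta$ is increased over the course of learning, so that early value estimates are pulled toward the prior and near-deterministic policies are penalized (shown here as a sketch of the general soft-update idea; see the paper for the exact operator and schedule):

$$G(s,a) \leftarrow r(s,a) + \gamma\, \mathbb{E}_{s'}\!\left[\tfrac{1}{\beta}\log \sum_{a'} \rho(a' \mid s')\, e^{\beta\, G(s', a')}\right]$$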
no code implementations • NeurIPS 2013 • Josh S. Merel, Roy Fox, Tony Jebara, Liam Paninski
In a closed-loop brain-computer interface (BCI), adaptive decoders are used to learn parameters suited to decoding the user's neural response.