no code implementations • 20 Dec 2022 • Michael Bowling, John D. Martin, David Abel, Will Dabney
The reward hypothesis posits that "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)."
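For reference, the "cumulative sum of a received scalar signal" is conventionally written as the (possibly discounted) return, so the hypothesis asserts that any goal can be cast as an objective of roughly this form, where $r_t$ is the scalar reward at step $t$ and $\gamma \in [0, 1]$ is a discount factor (notation assumed here for illustration):

$$\max_{\pi} \; \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \right]$$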
no code implementations • 13 Sep 2022 • Jelena Luketina, Sebastian Flennerhag, Yannick Schroecker, David Abel, Tom Zahavy, Satinder Singh
We support these results with a qualitative analysis of resulting meta-parameter schedules and learned functions of context features.
no code implementations • 1 Mar 2022 • David Abel
Reinforcement learning defines the problem facing agents that learn to make good decisions through action and observation alone.
no code implementations • NeurIPS 2021 • David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh
We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists.
no code implementations • 7 Oct 2021 • David Abel, Cameron Allen, Dilip Arumugam, D. Ellis Hershkowitz, Michael L. Littman, Lawson L. S. Wong
We address this question by proposing a simple measure of reinforcement-learning hardness called the bad-policy density.
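One hedged reading of "bad-policy density" is the fraction of a (finite) policy class whose value from the start state falls short of optimal by more than some tolerance; a sketch of such a quantity, with $\Pi$ the policy class, $V^{\pi}(s_0)$ the value of $\pi$ from the start state, and $\epsilon$ a tolerance (these symbols are assumptions for illustration, not necessarily the paper's exact definition):

$$\rho_{\epsilon} = \frac{\big|\{\pi \in \Pi : V^{\pi}(s_0) < V^{*}(s_0) - \epsilon\}\big|}{|\Pi|}$$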
no code implementations • 14 May 2021 • Mark K. Ho, David Abel, Carlos G. Correa, Michael L. Littman, Jonathan D. Cohen, Thomas L. Griffiths
We propose a computational account of this simplification process and, in a series of pre-registered behavioral experiments, show that it is subject to online cognitive control and that people optimally balance the complexity of a task representation and its utility for planning and acting.
no code implementations • 27 Feb 2021 • Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel
These results indicate that Peng's Q($\lambda$), which was thought to be unsafe, is a theoretically-sound and practically effective algorithm.
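For context, Peng's Q($\lambda$) averages $n$-step returns that bootstrap with a max over actions; one common way to write its target, with $\gamma$ the discount, $\lambda \in [0, 1]$ the trace parameter, and $Q$ the current action-value estimate (notation assumed here, not copied from the paper):

$$G_t^{\lambda} = (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{n-1} \left( \sum_{k=0}^{n-1} \gamma^{k} r_{t+k} + \gamma^{n} \max_{a} Q(s_{t+n}, a) \right)$$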
1 code implementation • ICML 2020 • Khimya Khetarpal, Zafarali Ahmed, Gheorghe Comanici, David Abel, Doina Precup
In the context of embodied agents, Gibson (1977) coined the term "affordances" to describe the fact that certain states enable an agent to perform certain actions.
no code implementations • 13 Feb 2020 • Mark K. Ho, David Abel, Jonathan D. Cohen, Michael L. Littman, Thomas L. Griffiths
Thus, people should plan their actions, but they should also be smart about how they deploy the resources used for planning.
2 code implementations • 8 Feb 2020 • Kavosh Asadi, David Abel, Michael L. Littman
In this work, we answer this question in the affirmative, taking the "simple learning algorithm" to be tabular Q-learning, the "good representations" to be a learned state abstraction, and the "challenging problems" to be continuous control tasks.
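As a rough illustration of the recipe described here, the sketch below runs textbook tabular Q-learning on top of an abstraction function `phi` that maps raw (e.g., continuous) observations to a small set of discrete abstract states; `phi`, the environment interface, and the hyperparameters are placeholders, not the paper's learned abstraction or its actual setup.

```python
import random
from collections import defaultdict

def tabular_q_learning(env, phi, n_episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Vanilla tabular Q-learning over abstract states phi(s).

    env -- any object with reset() -> obs and step(a) -> (obs, reward, done)
    phi -- maps a raw observation to a hashable abstract state
    """
    q = defaultdict(lambda: defaultdict(float))  # q[abstract_state][action]
    actions = list(range(env.n_actions))

    for _ in range(n_episodes):
        s = phi(env.reset())
        done = False
        while not done:
            # Epsilon-greedy action selection in the abstract state space.
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: q[s][act])

            next_obs, r, done = env.step(a)
            s_next = phi(next_obs)

            # Standard Q-learning backup, applied to abstract states.
            target = r + (0.0 if done else gamma * max(q[s_next][act] for act in actions))
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q
```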
1 code implementation • 15 Jan 2020 • Erwan Lecarpentier, David Abel, Kavosh Asadi, Yuu Jinnai, Emmanuel Rachelson, Michael L. Littman
We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks.
no code implementations • 2 Mar 2019 • Yuu Jinnai, Jee Won Park, David Abel, George Konidaris
One of the main challenges in reinforcement learning is solving tasks with sparse reward.
no code implementations • 3 Dec 2018 • Dilip Arumugam, David Abel, Kavosh Asadi, Nakul Gopalan, Christopher Grimm, Jun Ki Lee, Lucas Lehnert, Michael L. Littman
An agent with an inaccurate model of its environment faces a difficult choice: it can ignore the errors in its model and act in the real world in whatever way it determines is optimal with respect to its model.
no code implementations • 16 Oct 2018 • Yuu Jinnai, David Abel, D. Ellis Hershkowitz, Michael Littman, George Konidaris
We formalize the problem of selecting the optimal set of options for planning as that of computing the smallest set of options such that planning converges within a given maximum number of value-iteration passes.
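Stated slightly more formally (with assumed notation: $\Omega$ a candidate set of options, $\mathcal{A}$ the primitive actions, and $\ell_{\max}$ the pass budget; this is a paraphrase of the sentence above, not the paper's exact formulation):

$$\min_{\mathcal{O} \subseteq \Omega} |\mathcal{O}| \quad \text{s.t.} \quad \text{value iteration over } \mathcal{A} \cup \mathcal{O} \text{ converges within } \ell_{\max} \text{ passes}$$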
no code implementations • ICML 2018 • David Abel, Yuu Jinnai, Sophie Yue Guo, George Konidaris, Michael Littman
We consider the problem of how best to use prior experience to bootstrap lifelong learning, where an agent faces a series of task instances drawn from some task distribution.
no code implementations • ICML 2018 • David Abel, Dilip Arumugam, Lucas Lehnert, Michael Littman
We introduce two new classes of abstractions: (1) transitive state abstractions, whose optimal form can be computed efficiently, and (2) PAC state abstractions, which are guaranteed to hold with respect to a distribution of tasks.
no code implementations • ICLR 2018 • Christopher Grimm, Dilip Arumugam, Siddharth Karamcheti, David Abel, Lawson L. S. Wong, Michael L. Littman
Deep neural networks are able to solve tasks across a variety of domains and modalities of data.
no code implementations • 15 Jan 2017 • David Abel, John Salvatier, Andreas Stuhlmüller, Owain Evans
Providing Reinforcement Learning agents with expert advice can dramatically improve various aspects of learning.
1 code implementation • 15 Jan 2017 • David Abel, D. Ellis Hershkowitz, Michael L. Littman
The combinatorial explosion that plagues planning and reinforcement learning (RL) algorithms can be moderated using state abstraction.
1 code implementation • 14 Mar 2016 • David Abel, Alekh Agarwal, Fernando Diaz, Akshay Krishnamurthy, Robert E. Schapire
We address both of these challenges with two complementary techniques: First, we develop a gradient-boosting style, non-parametric function approximator for learning on $Q$-function residuals.
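A loose sketch of the "boosting on $Q$-function residuals" idea, using scikit-learn regression trees as the weak learners; the class name, interface, and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class ResidualBoostedQ:
    """Non-parametric Q(s, a) estimate built by repeatedly fitting
    regression trees to the residuals of the current estimate."""

    def __init__(self, learning_rate=0.5, max_depth=3):
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.trees = []

    def predict(self, sa_features):
        # Sum of all residual fits so far (zero before any fitting).
        pred = np.zeros(len(sa_features))
        for tree in self.trees:
            pred += self.learning_rate * tree.predict(sa_features)
        return pred

    def fit_residuals(self, sa_features, td_targets):
        # Fit one more tree to the gap between targets and the current estimate.
        residuals = td_targets - self.predict(sa_features)
        tree = DecisionTreeRegressor(max_depth=self.max_depth)
        tree.fit(sa_features, residuals)
        self.trees.append(tree)

# Example usage with made-up data: features are concatenated (state, action) vectors.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sa = rng.normal(size=(256, 5))
    targets = np.sin(sa[:, 0]) + 0.1 * rng.normal(size=256)  # stand-in TD targets
    q = ResidualBoostedQ()
    for _ in range(10):
        q.fit_residuals(sa, targets)
    print("mean abs error:", np.abs(q.predict(sa) - targets).mean())
```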