1 code implementation • 10 Dec 2024 • Jacob Adkins, Michael Bowling, Adam White
The performance of modern reinforcement learning algorithms critically relies on tuning ever-increasing numbers of hyperparameters.
1 code implementation • 2 Sep 2024 • Esraa Elelimy, Adam White, Michael Bowling, Martha White
Recurrent Neural Networks (RNNs) are used to learn representations in partially observable environments.
no code implementations • 26 Jul 2024 • Andrew Patterson, Samuel Neumann, Raksha Kumaraswamy, Martha White, Adam White
This paper introduces a new empirical methodology, the Cross-environment Hyperparameter Setting Benchmark, that compares RL algorithms across environments using a single hyperparameter setting, encouraging algorithmic development which is insensitive to hyperparameters.
no code implementations • 12 Jul 2024 • Parham Mohammad Panahi, Andrew Patterson, Martha White, Adam White
One exception is Prioritized Experience Replay (PER), where sampling is done proportionally to TD errors, inspired by the success of prioritized sweeping in dynamic programming.
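The proportional sampling scheme described in this abstract can be sketched as follows. This is a toy illustration of sampling proportional to |TD error|^alpha with a small floor, not the paper's or the original PER implementation; the class name and constants are illustrative.

```python
import random

class ProportionalReplay:
    """Toy prioritized replay: sample transitions with probability
    proportional to |TD error|^alpha (plus a small epsilon floor)."""

    def __init__(self, alpha=0.6, eps=1e-6):
        self.alpha = alpha
        self.eps = eps
        self.transitions = []
        self.priorities = []

    def add(self, transition, td_error):
        self.transitions.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self):
        # random.choices draws an index proportionally to the given weights
        idx = random.choices(range(len(self.transitions)),
                             weights=self.priorities, k=1)[0]
        return idx, self.transitions[idx]

    def update_priority(self, idx, td_error):
        # After replaying a transition, refresh its priority with the new TD error
        self.priorities[idx] = (abs(td_error) + self.eps) ** self.alpha

buffer = ProportionalReplay()
buffer.add(("s0", "a0", 1.0, "s1"), td_error=0.1)
buffer.add(("s1", "a1", 0.0, "s2"), td_error=5.0)
# The second transition is sampled far more often due to its larger TD error.
```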
no code implementations • 23 Jun 2024 • Scott M. Jordan, Adam White, Bruno Castro da Silva, Martha White, Philip S. Thomas
Novel reinforcement learning algorithms, or improvements on existing ones, are commonly justified by evaluating their performance on benchmark environments and are compared to an ever-changing set of standard algorithms.
no code implementations • 3 Jun 2024 • Kevin Roice, Parham Mohammad Panahi, Scott M. Jordan, Adam White, Martha White
In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models.
no code implementations • 2 Apr 2024 • Golnaz Mesbahi, Parham Mohammad Panahi, Olya Mastikhina, Martha White, Adam White
In continual or lifelong reinforcement learning, access to the environment should be limited.
no code implementations • 26 Mar 2024 • David Rolnick, Alan Aspuru-Guzik, Sara Beery, Bistra Dilkina, Priya L. Donti, Marzyeh Ghassemi, Hannah Kerner, Claire Monteleoni, Esther Rolf, Milind Tambe, Adam White
As applications of machine learning proliferate, innovative algorithms inspired by specific real-world challenges have become increasingly important.
no code implementations • 4 Dec 2023 • Muhammad Kamran Janjua, Haseeb Shah, Martha White, Erfan Miahi, Marlos C. Machado, Adam White
In this paper we investigate the use of reinforcement-learning-based prediction approaches for a real drinking-water treatment plant.
1 code implementation • 2 Dec 2023 • Edan Meyer, Adam White, Marlos C. Machado
In this work, we provide a thorough empirical investigation of the advantages of representing observations as vectors of categorical values within the context of reinforcement learning.
no code implementations • 29 Oct 2023 • Adam White, Margarita Saranti, Artur d'Avila Garcez, Thomas M. H. Hope, Cathy J. Price, Howard Bowman
The highest classification accuracy, 0.854, was observed when 8 regions-of-interest were extracted from each MRI scan and combined with lesion size, initial severity, and recovery time in a 2D Residual Neural Network. Our findings demonstrate how imaging and tabular data can be combined to achieve high post-stroke classification accuracy, even when the dataset is small in machine learning terms.
2 code implementations • 24 Oct 2023 • Subhojeet Pramanik, Esraa Elelimy, Marlos C. Machado, Adam White
In this paper, we introduce recurrent alternatives to the transformer self-attention mechanism that offer context-independent inference cost, leverage long-range dependencies effectively, and perform well in online reinforcement learning tasks.
no code implementations • 10 Jul 2023 • Vincent Liu, Han Wang, Ruo Yu Tao, Khurram Javed, Adam White, Martha White
Lastly, we outline a class of algorithms which we call online-aware that are designed to mitigate interference, and show they do reduce interference according to our measure and that they improve stability and performance in several classic control environments.
no code implementations • 3 Apr 2023 • Andrew Patterson, Samuel Neumann, Martha White, Adam White
The objective of this document is to provide answers on how we can use our unprecedented compute to do good science in reinforcement learning, as well as stay alert to potential pitfalls in our empirical design.
no code implementations • 13 Mar 2023 • Zaheer Abbas, Rosie Zhao, Joseph Modayil, Adam White, Marlos C. Machado
The ability to learn continually is essential in a complex and changing world.
4 code implementations • 28 Feb 2023 • Chenjun Xiao, Han Wang, Yangchen Pan, Adam White, Martha White
We highlight a simple fact: it is more straightforward to approximate an in-sample softmax using only actions in the dataset.
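The in-sample idea can be illustrated with a short sketch: compute a soft (log-sum-exp) value estimate restricted to actions that actually appear in the dataset, so a spuriously large Q-value on an out-of-sample action cannot inflate the estimate. This is a hedged illustration of the general idea, not the paper's exact estimator; the function name and temperature are illustrative.

```python
import math

def in_sample_softmax_value(q_values, in_dataset, tau=1.0):
    """Soft value estimate tau * log sum exp(q / tau), computed only over
    actions flagged as present in the dataset."""
    qs = [q / tau for a, q in enumerate(q_values) if in_dataset[a]]
    if not qs:
        raise ValueError("no in-sample actions")
    m = max(qs)  # subtract the max for numerical stability
    return tau * (m + math.log(sum(math.exp(q - m) for q in qs)))

# With only the first two of three actions in the dataset, the large
# out-of-sample Q-value (100.0) cannot inflate the value estimate.
v = in_sample_softmax_value([1.0, 2.0, 100.0], [True, True, False], tau=0.1)
```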
1 code implementation • 15 Nov 2022 • Ruo Yu Tao, Adam White, Marlos C. Machado
Finally, we show that this approach is complementary to state-of-the-art methods such as recurrent neural networks and truncated back-propagation through time, and acts as a heuristic that facilitates longer temporal credit assignment, leading to better performance.
no code implementations • 25 Oct 2022 • Banafsheh Rafiee, Sina Ghiassian, Jun Jin, Richard Sutton, Jun Luo, Adam White
In this paper, we explore an approach to auxiliary task discovery in reinforcement learning based on ideas from representation learning.
no code implementations • 6 Jun 2022 • Chunlok Lo, Kevin Roice, Parham Mohammad Panahi, Scott Jordan, Adam White, Gabor Mihucz, Farzane Aminmansour, Martha White
In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models.
no code implementations • 18 May 2022 • Han Wang, Archit Sakhadeo, Adam White, James Bell, Vincent Liu, Xutong Zhao, Puer Liu, Tadashi Kozuno, Alona Fyshe, Martha White
The performance of reinforcement learning (RL) agents is sensitive to the choice of hyperparameters.
no code implementations • 1 Apr 2022 • Banafsheh Rafiee, Jun Jin, Jun Luo, Adam White
Our focus on the role of the auxiliary tasks' target policy is motivated by the fact that the target policy determines both the behavior the agent makes predictions about and the state-action distribution the agent is trained on, which in turn affects learning on the main task.
no code implementations • 30 Mar 2022 • Han Wang, Erfan Miahi, Martha White, Marlos C. Machado, Zaheer Abbas, Raksha Kumaraswamy, Vincent Liu, Adam White
In this paper we investigate the properties of representations learned by deep reinforcement learning systems.
no code implementations • 17 Mar 2022 • Patrick M. Pilarski, Andrew Butcher, Elnaz Davoodi, Michael Bradley Johanson, Dylan J. A. Brenneis, Adam S. R. Parker, Leslie Acker, Matthew M. Botvinick, Joseph Modayil, Adam White
Our results showcase the speed of learning for Pavlovian signalling, the impact that different temporal representations do (and do not) have on agent-agent coordination, and how temporal aliasing impacts agent-agent and human-agent interactions differently.
no code implementations • NeurIPS 2021 • Matthew McLeod, Chunlok Lo, Matthew Schlegel, Andrew Jacobsen, Raksha Kumaraswamy, Martha White, Adam White
Learning auxiliary tasks, such as multiple predictions about the world, can provide many benefits to reinforcement learning systems.
no code implementations • 7 Feb 2022 • Richard S. Sutton, Marlos C. Machado, G. Zacharias Holland, David Szepesvari, Finbarr Timbers, Brian Tanner, Adam White
Each subtask is solved to produce an option, and then a model of the option is learned and made available to the planning process.
no code implementations • 11 Jan 2022 • Andrew Butcher, Michael Bradley Johanson, Elnaz Davoodi, Dylan J. A. Brenneis, Leslie Acker, Adam S. R. Parker, Adam White, Joseph Modayil, Patrick M. Pilarski
We further show how to computationally build this adaptive signalling process out of a fixed signalling process, characterized by fast continual prediction learning and minimal constraints on the nature of the agent receiving signals.
no code implementations • 14 Dec 2021 • Dylan J. A. Brenneis, Adam S. Parker, Michael Bradley Johanson, Andrew Butcher, Elnaz Davoodi, Leslie Acker, Matthew M. Botvinick, Joseph Modayil, Adam White, Patrick M. Pilarski
Additionally, we compare two different agent architectures to assess how representational choices in agent design affect the human-agent interaction.
no code implementations • 20 Sep 2021 • Adam White, Artur d'Avila Garcez
We will further illustrate how explainable AI methods that provide both causal equations and counterfactual instances can successfully explain machine learning predictions.
no code implementations • 12 Jul 2021 • Ray Jiang, Shangtong Zhang, Veronica Chelu, Adam White, Hado van Hasselt
We develop a multi-step emphatic weighting that can be combined with replay, and a time-reversed $n$-step TD learning algorithm to learn the required emphatic weighting.
1 code implementation • 28 Jun 2021 • Adam White, Kwun Ho Ngan, James Phelan, Saman Sadeghi Afgeh, Kevin Ryan, Constantino Carlos Reyes-Aldasoro, Artur d'Avila Garcez
A novel explainable AI method called CLEAR Image is introduced in this paper.
no code implementations • 21 Jun 2021 • Ray Jiang, Tom Zahavy, Zhongwen Xu, Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt
In this paper, we extend the use of emphatic methods to deep reinforcement learning agents.
no code implementations • 28 Apr 2021 • Andrew Patterson, Adam White, Martha White
Many algorithms have been developed for off-policy value estimation based on the linear mean squared projected Bellman error (MSPBE) and are sound under linear function approximation.
1 code implementation • 9 Nov 2020 • Banafsheh Rafiee, Zaheer Abbas, Sina Ghiassian, Raksha Kumaraswamy, Richard Sutton, Elliot Ludvig, Adam White
We present three new diagnostic prediction problems inspired by classical-conditioning experiments to facilitate research in online prediction learning.
no code implementations • 7 Jul 2020 • Vincent Liu, Adam White, Hengshuai Yao, Martha White
In this work, we provide a definition of interference for control in reinforcement learning.
1 code implementation • ICML 2020 • Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White
It is still common to use Q-learning and temporal difference (TD) learning, even though they have divergence issues and sound Gradient TD alternatives exist, because divergence seems rare and they typically perform well.
no code implementations • ICLR 2020 • Somjit Nath, Vincent Liu, Alan Chan, Xin Li, Adam White, Martha White
Recurrent neural networks (RNNs) allow an agent to construct a state-representation from a stream of experience, which is essential in partially observable problems.
no code implementations • 16 Mar 2020 • Sina Ghiassian, Banafsheh Rafiee, Yat Long Lo, Adam White
Unfortunately, the performance of deep reinforcement learning systems is sensitive to hyper-parameter settings and architecture choices.
no code implementations • 8 Aug 2019 • Adam White, Artur d'Avila Garcez
We propose a novel method for explaining the predictions of any classifier.
no code implementations • 17 Jul 2019 • Andrew Jacobsen, Matthew Schlegel, Cameron Linke, Thomas Degris, Adam White, Martha White
This paper investigates different vector step-size adaptation approaches for non-stationary online, continual prediction problems.
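One classic member of the vector step-size family this abstract refers to is IDBD (Sutton, 1992), which keeps a separate log step-size per weight and adapts it via the correlation of current and past errors. The sketch below is that classic supervised-prediction version, offered as background for the family the paper studies, not the paper's own algorithms; the hyperparameters are illustrative.

```python
import math

def idbd_update(w, h, beta, x, y, meta_rate=0.01):
    """One IDBD step: each weight i keeps its own log step-size beta[i],
    nudged by the correlation between the current error and a decaying
    memory of past updates (tracked in h)."""
    delta = y - sum(wi * xi for wi, xi in zip(w, x))
    for i, xi in enumerate(x):
        beta[i] += meta_rate * delta * xi * h[i]
        alpha = math.exp(beta[i])          # per-weight step size
        w[i] += alpha * delta * xi
        # Decay h, clipping the decay factor at zero, then add the new update
        h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
    return delta

# Track the target y = 2 * x0: the step size for the relevant feature grows.
w, h, beta = [0.0, 0.0], [0.0, 0.0], [math.log(0.05)] * 2
for _ in range(2000):
    idbd_update(w, h, beta, x=[1.0, 0.0], y=2.0)
```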
no code implementations • 19 Jun 2019 • Cam Linke, Nadia M. Ady, Martha White, Thomas Degris, Adam White
The question we tackle in this paper is how to sculpt the stream of experience (that is, how to adapt the learning system's behavior) to optimize the learning of a collection of value functions.
no code implementations • 2 Apr 2019 • Yi Wan, Zaheer Abbas, Adam White, Martha White, Richard S. Sutton
In particular, we 1) show that planning with an expectation model is equivalent to planning with a distribution model if the state value function is linear in state features, 2) analyze two common parametrization choices for approximating the expectation: linear and non-linear expectation models, 3) propose a sound model-based policy evaluation algorithm and present its convergence results, and 4) empirically demonstrate the effectiveness of the proposed planning algorithm.
no code implementations • 16 Nov 2018 • Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup
We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.
no code implementations • NeurIPS 2018 • Raksha Kumaraswamy, Matthew Schlegel, Adam White, Martha White
Directed exploration strategies for reinforcement learning are critical for learning an optimal policy in a minimal number of interactions with the environment.
no code implementations • 6 Nov 2018 • Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White
The ability to learn behavior-contingent predictions online and off-policy has long been advocated as a key capability of predictive-knowledge learning systems, but has remained an open algorithmic challenge for decades.
1 code implementation • 22 Oct 2018 • Samuel Neumann, Sungsu Lim, Ajin Joseph, Yangchen Pan, Adam White, Martha White
We first provide a policy improvement result in an idealized setting, and then prove that our conditional CEM (CCEM) strategy tracks a CEM update per state, even with changing action-values.
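The cross-entropy method underlying CCEM can be sketched in its unconditional form: sample candidate actions from a Gaussian, keep the top fraction by Q-value, and refit the Gaussian to those elites. This is a generic CEM sketch for intuition; the paper's conditional CEM instead conditions this update on the state via a learned actor, and all names and constants here are illustrative.

```python
import random
import statistics

def cem_action(q, n_samples=50, elite_frac=0.2, iters=10, mu=0.0, sigma=1.0):
    """Cross-entropy method over a scalar action: iteratively refit a
    Gaussian proposal to the elite (highest-Q) sampled actions."""
    rng = random.Random(0)  # fixed seed for a reproducible sketch
    for _ in range(iters):
        actions = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        elite = sorted(actions, key=q, reverse=True)[: int(elite_frac * n_samples)]
        mu = statistics.mean(elite)
        sigma = statistics.stdev(elite) + 1e-3  # floor to avoid collapse
    return mu

# Q peaks at a = 0.7; CEM concentrates the proposal near that maximizer.
best = cem_action(lambda a: -(a - 0.7) ** 2)
```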
no code implementations • 18 Jul 2018 • Matthew Schlegel, Andrew Jacobsen, Zaheer Abbas, Andrew Patterson, Adam White, Martha White
A general purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation.
no code implementations • 12 Jun 2018 • Yangchen Pan, Muhammad Zaheer, Adam White, Andrew Patterson, Martha White
We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states that propagate information in reverse from a state more quickly.
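Sampling predecessor states so information propagates in reverse is the classic prioritized-sweeping idea; the sketch below shows it on a known, deterministic chain with a priority queue keyed by Bellman error. This is background for the idea the abstract describes, not the paper's method (which uses learned models); the chain model and thresholds are assumptions of the sketch.

```python
import heapq

def prioritized_sweeping(V, rewards, predecessors, gamma=0.9, theta=1e-4):
    """Backwards-focused planning: when a state's value changes a lot,
    push its predecessors onto a max-priority queue (negated priorities)
    so corrections propagate in reverse. Assumes a deterministic chain
    where state s transitions to s + 1 (terminal states self-loop)."""
    def bellman_backup(s):
        succ = s + 1 if s + 1 in V else s
        return rewards.get(s, 0.0) + gamma * V[succ]

    pq = []
    for s in V:  # seed the queue with every state's initial Bellman error
        err = abs(bellman_backup(s) - V[s])
        if err > theta:
            heapq.heappush(pq, (-err, s))
    while pq:
        _, s = heapq.heappop(pq)
        V[s] = bellman_backup(s)
        for p in predecessors.get(s, []):
            err = abs(bellman_backup(p) - V[p])
            if err > theta:
                heapq.heappush(pq, (-err, p))

# 4-state chain 0 -> 1 -> 2 -> 3, reward 1.0 for the transition out of state 2.
V = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0}
prioritized_sweeping(V, rewards={2: 1.0}, predecessors={1: [0], 2: [1], 3: [2]})
```

A single update at state 2 is enough to trigger backups at states 1 and then 0, without ever revisiting states whose values are already consistent.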
no code implementations • 25 Jan 2018 • Craig Sherstan, Brendan Bennett, Kenny Young, Dylan R. Ashley, Adam White, Martha White, Richard S. Sutton
This paper investigates estimating the variance of a temporal-difference learning agent's update target.
no code implementations • ICLR 2018 • Matthew Schlegel, Andrew Patterson, Adam White, Martha White
We investigate a framework for discovery: curating a large collection of predictions, which are used to construct the agent's representation of the world.
no code implementations • 10 May 2017 • Adam White, Richard S. Sutton
This document should serve as a quick reference for and guide to the implementation of linear GQ($\lambda$), a gradient-based off-policy temporal-difference learning algorithm.
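A minimal member of the same gradient-TD family can be sketched as linear TDC, shown on-policy with lambda = 0; GQ($\lambda$) generalizes this kind of update to off-policy control with eligibility traces. This is a hedged background sketch, not the GQ($\lambda$) pseudocode from the guide itself, and the step sizes are illustrative.

```python
def tdc_step(theta, w, phi, reward, phi_next, alpha=0.01, beta=0.1, gamma=0.9):
    """One linear TDC step: the main weights theta follow the TD error
    minus a gradient-correction term built from a secondary weight
    vector w, which itself tracks the expected TD error."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    correction = dot(w, phi)
    for i in range(len(theta)):
        theta[i] += alpha * (delta * phi[i] - gamma * correction * phi_next[i])
        w[i] += beta * (delta - correction) * phi[i]
    return delta

# Single-state chain with reward 1 and gamma = 0.5, so the true value is 2.
theta, w = [0.0], [0.0]
for _ in range(5000):
    tdc_step(theta, w, phi=[1.0], reward=1.0, phi_next=[1.0], gamma=0.5)
```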
no code implementations • 28 Nov 2016 • Yangchen Pan, Adam White, Martha White
The family of temporal difference (TD) methods spans a spectrum from computationally frugal linear methods like TD($\lambda$) to data-efficient least squares methods.
2 code implementations • 2 Jul 2016 • Martha White, Adam White
One of the main obstacles to broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms.
no code implementations • 17 Jun 2016 • Craig Sherstan, Adam White, Marlos C. Machado, Patrick M. Pilarski
Agents of general intelligence deployed in real-world scenarios must adapt to ever-changing environmental conditions.
1 code implementation • 28 Feb 2016 • Adam White, Martha White
First, we derive two new hybrid TD policy-evaluation algorithms, which fill a gap in this collection of algorithms.
no code implementations • 6 Dec 2011 • Joseph Modayil, Adam White, Richard S. Sutton
The term "nexting" has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense.
no code implementations • NeurIPS 2010 • Martha White, Adam White
The reinforcement learning community has explored many approaches to obtaining value estimates and models to guide decision making; these approaches, however, do not usually provide a measure of confidence in the estimate.