Search Results for author: Adam White

Found 50 papers, 9 papers with code

Tuning for the Unknown: Revisiting Evaluation Strategies for Lifelong RL

no code implementations • 2 Apr 2024 • Golnaz Mesbahi, Olya Mastikhina, Parham Mohammad Panahi, Martha White, Adam White

In this paper we propose a new approach for tuning and evaluating lifelong RL agents where only one percent of the experiment data can be used for hyperparameter tuning.

Application-Driven Innovation in Machine Learning

no code implementations • 26 Mar 2024 • David Rolnick, Alan Aspuru-Guzik, Sara Beery, Bistra Dilkina, Priya L. Donti, Marzyeh Ghassemi, Hannah Kerner, Claire Monteleoni, Esther Rolf, Milind Tambe, Adam White

As applications of machine learning proliferate, innovative algorithms inspired by specific real-world challenges have become increasingly important.

GVFs in the Real World: Making Predictions Online for Water Treatment

no code implementations • 4 Dec 2023 • Muhammad Kamran Janjua, Haseeb Shah, Martha White, Erfan Miahi, Marlos C. Machado, Adam White

In this paper we investigate the use of reinforcement-learning based prediction approaches for a real drinking-water treatment plant.

Time Series Prediction

Harnessing Discrete Representations For Continual Reinforcement Learning

no code implementations • 2 Dec 2023 • Edan Meyer, Adam White, Marlos C. Machado

In this work, we provide a thorough empirical investigation of the advantages of representing observations as vectors of categorical values within the context of reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Predicting recovery following stroke: deep learning, multimodal data and feature selection using explainable AI

no code implementations • 29 Oct 2023 • Adam White, Margarita Saranti, Artur d'Avila Garcez, Thomas M. H. Hope, Cathy J. Price, Howard Bowman

The highest classification accuracy, 0.854, was observed when 8 regions-of-interest were extracted from each MRI scan and combined with lesion size, initial severity and recovery time in a 2D Residual Neural Network. Our findings demonstrate how imaging and tabular data can be combined for high post-stroke classification accuracy, even when the dataset is small in machine learning terms.

feature selection Stroke Classification

Recurrent Linear Transformers

1 code implementation • 24 Oct 2023 • Subhojeet Pramanik, Esraa Elelimy, Marlos C. Machado, Adam White

In this paper we introduce recurrent alternatives to the transformer self-attention mechanism that offer a context-independent inference cost, leverage long-range dependencies effectively, and perform well in practice.

Measuring and Mitigating Interference in Reinforcement Learning

no code implementations • 10 Jul 2023 • Vincent Liu, Han Wang, Ruo Yu Tao, Khurram Javed, Adam White, Martha White

Lastly, we outline a class of algorithms which we call online-aware that are designed to mitigate interference, and show they do reduce interference according to our measure and that they improve stability and performance in several classic control environments.

reinforcement-learning

Empirical Design in Reinforcement Learning

no code implementations • 3 Apr 2023 • Andrew Patterson, Samuel Neumann, Martha White, Adam White

The objective of this document is to provide answers on how we can use our unprecedented compute to do good science in reinforcement learning, as well as stay alert to potential pitfalls in our empirical design.

reinforcement-learning

The In-Sample Softmax for Offline Reinforcement Learning

4 code implementations • 28 Feb 2023 • Chenjun Xiao, Han Wang, Yangchen Pan, Adam White, Martha White

We highlight a simple fact: it is more straightforward to approximate an in-sample softmax using only actions in the dataset.

Offline RL reinforcement-learning +1
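The in-sample idea can be illustrated with a small sketch (the function name, mask representation, and temperature are illustrative, not from the paper): the softmax value is computed only over actions that actually appear in the dataset for a state, so out-of-sample actions with spuriously high value estimates are ignored.

```python
import numpy as np

def in_sample_softmax_value(q_values, in_sample_mask, tau=1.0):
    """Softmax state value computed only over actions observed in the dataset.

    q_values: action-values for one state.
    in_sample_mask: boolean array, True where the action appears in the dataset.
    tau: softmax temperature.
    """
    q = np.asarray(q_values, dtype=float)
    mask = np.asarray(in_sample_mask, dtype=bool)
    z = q[mask] / tau
    m = z.max()  # log-sum-exp with max subtraction for numerical stability
    return tau * (m + np.log(np.exp(z - m).sum()))

# An out-of-sample action with a spuriously high value estimate is ignored:
q = [1.0, 5.0, 2.0]          # action 1 never appears in the dataset
mask = [True, False, True]
v_in = in_sample_softmax_value(q, mask, tau=0.1)  # close to max over in-sample actions
```

At a low temperature the in-sample softmax approaches the maximum over dataset actions only, which is the property that makes it attractive for offline RL.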

Agent-State Construction with Auxiliary Inputs

1 code implementation • 15 Nov 2022 • Ruo Yu Tao, Adam White, Marlos C. Machado

Finally, we show that this approach is complementary to state-of-the-art methods such as recurrent neural networks and truncated back-propagation through time, and acts as a heuristic that facilitates longer temporal credit assignment, leading to better performance.

Decision Making reinforcement-learning +1

Auxiliary task discovery through generate-and-test

no code implementations • 25 Oct 2022 • Banafsheh Rafiee, Sina Ghiassian, Jun Jin, Richard Sutton, Jun Luo, Adam White

In this paper, we explore an approach to auxiliary task discovery in reinforcement learning based on ideas from representation learning.

Meta-Learning Representation Learning

Goal-Space Planning with Subgoal Models

no code implementations • 6 Jun 2022 • Chunlok Lo, Kevin Roice, Parham Mohammad Panahi, Scott Jordan, Adam White, Gabor Mihucz, Farzane Aminmansour, Martha White

In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models.

Model-based Reinforcement Learning Reinforcement Learning (RL)

What makes useful auxiliary tasks in reinforcement learning: investigating the effect of the target policy

no code implementations • 1 Apr 2022 • Banafsheh Rafiee, Jun Jin, Jun Luo, Adam White

Our focus on the role of the target policy of the auxiliary tasks is motivated by the fact that the target policy determines the behavior about which the agent wants to make a prediction and the state-action distribution that the agent is trained on, which further affects the main task learning.

Representation Learning

The Frost Hollow Experiments: Pavlovian Signalling as a Path to Coordination and Communication Between Agents

no code implementations • 17 Mar 2022 • Patrick M. Pilarski, Andrew Butcher, Elnaz Davoodi, Michael Bradley Johanson, Dylan J. A. Brenneis, Adam S. R. Parker, Leslie Acker, Matthew M. Botvinick, Joseph Modayil, Adam White

Our results showcase the speed of learning for Pavlovian signalling, the impact that different temporal representations do (and do not) have on agent-agent coordination, and how temporal aliasing impacts agent-agent and human-agent interactions differently.

Decision Making reinforcement-learning +1

Pavlovian Signalling with General Value Functions in Agent-Agent Temporal Decision Making

no code implementations • 11 Jan 2022 • Andrew Butcher, Michael Bradley Johanson, Elnaz Davoodi, Dylan J. A. Brenneis, Leslie Acker, Adam S. R. Parker, Adam White, Joseph Modayil, Patrick M. Pilarski

We further show how to computationally build this adaptive signalling process out of a fixed signalling process, characterized by fast continual prediction learning and minimal constraints on the nature of the agent receiving signals.

Decision Making reinforcement-learning +1

Counterfactual Instances Explain Little

no code implementations • 20 Sep 2021 • Adam White, Artur d'Avila Garcez

We will further illustrate how explainable AI methods that provide both causal equations and counterfactual instances can successfully explain machine learning predictions.

BIG-bench Machine Learning counterfactual +1

Learning Expected Emphatic Traces for Deep RL

no code implementations • 12 Jul 2021 • Ray Jiang, Shangtong Zhang, Veronica Chelu, Adam White, Hado van Hasselt

We develop a multi-step emphatic weighting that can be combined with replay, and a time-reversed $n$-step TD learning algorithm to learn the required emphatic weighting.
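As background for the emphatic weighting this abstract refers to, a minimal sketch of the standard one-step followon trace from emphatic TD (not the paper's multi-step, replay-compatible variant; the interest value and names are illustrative) might look like:

```python
import numpy as np

def followon_trace(rhos, gamma, interest=1.0):
    """Standard emphatic-TD followon trace: F_t = i_t + gamma * rho_{t-1} * F_{t-1}.

    rhos: importance-sampling ratios rho_0 .. rho_{T-1}.
    Returns the trace value F_0 .. F_T.
    """
    F = np.zeros(len(rhos) + 1)
    F[0] = interest
    for t in range(len(rhos)):
        # Accumulate emphasis backward-discounted through time.
        F[t + 1] = interest + gamma * rhos[t] * F[t]
    return F

# Under on-policy ratios (all 1), the trace accumulates geometrically.
F = followon_trace([1.0, 1.0, 1.0], gamma=0.9)
```

The emphasis each state receives grows with how much it was "followed from" earlier interest, which is what makes the weighting awkward to combine naively with replay, the problem the paper addresses.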

A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning

no code implementations • 28 Apr 2021 • Andrew Patterson, Adam White, Martha White

Many algorithms have been developed for off-policy value estimation based on the linear mean squared projected Bellman error (MSPBE) and are sound under linear function approximation.

reinforcement-learning Reinforcement Learning (RL)

From Eye-blinks to State Construction: Diagnostic Benchmarks for Online Representation Learning

1 code implementation • 9 Nov 2020 • Banafsheh Rafiee, Zaheer Abbas, Sina Ghiassian, Raksha Kumaraswamy, Richard Sutton, Elliot Ludvig, Adam White

We present three new diagnostic prediction problems inspired by classical-conditioning experiments to facilitate research in online prediction learning.

Continual Learning Representation Learning

Gradient Temporal-Difference Learning with Regularized Corrections

1 code implementation • ICML 2020 • Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White

It is still common to use Q-learning and temporal difference (TD) learning, even though they have divergence issues and sound Gradient TD alternatives exist, because divergence seems rare and they typically perform well.

Q-Learning
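A hedged sketch of the regularized-corrections idea: assuming the familiar TDC-style two-timescale update with an extra term pulling the secondary weights toward zero (step sizes, the regularization constant, and names are illustrative, not the paper's exact constants):

```python
import numpy as np

def tdrc_update(theta, w, phi, phi_next, r, gamma=0.99, alpha=0.1, beta=1.0):
    """One TDC-style update with a regularized correction on the secondary weights.

    The -beta * w term regularizes w toward zero, which is the
    stabilizing modification the abstract's gradient-TD method is built on.
    """
    delta = r + gamma * theta @ phi_next - theta @ phi           # TD error
    theta = theta + alpha * (delta * phi - gamma * (w @ phi) * phi_next)
    w = w + alpha * ((delta - w @ phi) * phi - beta * w)
    return theta, w

theta, w = np.zeros(2), np.zeros(2)
phi, phi_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
theta, w = tdrc_update(theta, w, phi, phi_next, r=1.0)
```

With zero initial weights the first update reduces to a plain TD step, while the correction terms only activate once w becomes nonzero.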

Training Recurrent Neural Networks Online by Learning Explicit State Variables

no code implementations • ICLR 2020 • Somjit Nath, Vincent Liu, Alan Chan, Xin Li, Adam White, Martha White

Recurrent neural networks (RNNs) allow an agent to construct a state-representation from a stream of experience, which is essential in partially observable problems.

Improving Performance in Reinforcement Learning by Breaking Generalization in Neural Networks

no code implementations • 16 Mar 2020 • Sina Ghiassian, Banafsheh Rafiee, Yat Long Lo, Adam White

Unfortunately, the performance of deep reinforcement learning systems is sensitive to hyper-parameter settings and architecture choices.

reinforcement-learning Reinforcement Learning (RL)

Meta-descent for Online, Continual Prediction

no code implementations • 17 Jul 2019 • Andrew Jacobsen, Matthew Schlegel, Cameron Linke, Thomas Degris, Adam White, Martha White

This paper investigates different vector step-size adaptation approaches for non-stationary online, continual prediction problems.

Second-order methods Time Series +1

Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study

no code implementations • 19 Jun 2019 • Cam Linke, Nadia M. Ady, Martha White, Thomas Degris, Adam White

The question we tackle in this paper is how to sculpt the stream of experience (that is, how to adapt the learning system's behavior) to optimize the learning of a collection of value functions.

Active Learning reinforcement-learning +2

Planning with Expectation Models

no code implementations • 2 Apr 2019 • Yi Wan, Zaheer Abbas, Adam White, Martha White, Richard S. Sutton

In particular, we 1) show that planning with an expectation model is equivalent to planning with a distribution model if the state value function is linear in state features, 2) analyze two common parametrization choices for approximating the expectation: linear and non-linear expectation models, 3) propose a sound model-based policy evaluation algorithm and present its convergence results, and 4) empirically demonstrate the effectiveness of the proposed planning algorithm.

Model-based Reinforcement Learning
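The equivalence in point 1 is easy to check numerically: with a value function linear in the features, evaluating at the expected next feature vector gives the same backup as averaging values over the next-state distribution. A small sketch (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear value function: v(s) = theta @ phi(s).
theta = rng.normal(size=3)

# Two possible next feature vectors with probabilities 0.3 / 0.7.
phi_a, phi_b = rng.normal(size=3), rng.normal(size=3)
p = np.array([0.3, 0.7])

# Distribution-model backup: expectation of v over next states.
v_dist = p[0] * (theta @ phi_a) + p[1] * (theta @ phi_b)

# Expectation-model backup: v evaluated at the expected next features.
phi_bar = p[0] * phi_a + p[1] * phi_b
v_expect = theta @ phi_bar

# With a linear v these two backups coincide.
```

The equality fails once v is nonlinear, which is why the paper analyzes the nonlinear case separately.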

The Barbados 2018 List of Open Issues in Continual Learning

no code implementations • 16 Nov 2018 • Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.

Continual Learning

Context-Dependent Upper-Confidence Bounds for Directed Exploration

no code implementations • NeurIPS 2018 • Raksha Kumaraswamy, Matthew Schlegel, Adam White, Martha White

Directed exploration strategies for reinforcement learning are critical for learning an optimal policy in a minimal number of interactions with the environment.

Efficient Exploration
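As background for directed exploration, the classic UCB1 action rule (the standard bandit form, not the context-dependent bounds this paper develops; names and constants are illustrative) can be sketched as:

```python
import numpy as np

def ucb_action(counts, means, t, c=2.0):
    """Classic UCB1: pick the action maximizing empirical mean + exploration bonus.

    counts: number of times each action has been tried.
    means: empirical mean reward per action.
    t: total number of pulls so far.
    """
    counts = np.asarray(counts, dtype=float)
    bonus = np.sqrt(c * np.log(t) / counts)   # optimism shrinks as counts grow
    return int(np.argmax(np.asarray(means) + bonus))

# A rarely tried action wins despite a lower empirical mean.
a = ucb_action(counts=[100, 2], means=[0.5, 0.4], t=102)
```

Optimism in the face of uncertainty directs the agent toward under-explored actions, which is the principle the paper extends to continuous-state RL.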

Online Off-policy Prediction

no code implementations • 6 Nov 2018 • Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White

The ability to learn behavior-contingent predictions online and off-policy has long been advocated as a key capability of predictive-knowledge learning systems but remained an open algorithmic challenge for decades.

Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement

1 code implementation • 22 Oct 2018 • Samuel Neumann, Sungsu Lim, Ajin Joseph, Yangchen Pan, Adam White, Martha White

We first provide a policy improvement result in an idealized setting, and then prove that our conditional CEM (CCEM) strategy tracks a CEM update per state, even with changing action-values.

Policy Gradient Methods Q-Learning

General Value Function Networks

no code implementations • 18 Jul 2018 • Matthew Schlegel, Andrew Jacobsen, Zaheer Abbas, Andrew Patterson, Adam White, Martha White

A general purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation.

Continuous Control Decision Making

Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous State Domains

no code implementations • 12 Jun 2018 • Yangchen Pan, Muhammad Zaheer, Adam White, Andrew Patterson, Martha White

We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states that propagate information in reverse from a state more quickly.

Discovery of Predictive Representations With a Network of General Value Functions

no code implementations • ICLR 2018 • Matthew Schlegel, Andrew Patterson, Adam White, Martha White

We investigate a framework for discovery: curating a large collection of predictions, which are used to construct the agent's representation of the world.

Decision Making

GQ(λ) Quick Reference and Implementation Guide

no code implementations • 10 May 2017 • Adam White, Richard S. Sutton

This document should serve as a quick reference for and guide to the implementation of linear GQ(λ), a gradient-based off-policy temporal-difference learning algorithm.

Accelerated Gradient Temporal Difference Learning

no code implementations • 28 Nov 2016 • Yangchen Pan, Adam White, Martha White

The family of temporal difference (TD) methods spans a spectrum from computationally frugal linear methods like TD(λ) to data-efficient least squares methods.
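As background for the frugal end of that spectrum, a minimal linear TD(λ) step with an accumulating eligibility trace (step sizes and names illustrative) might look like:

```python
import numpy as np

def td_lambda_step(theta, e, phi, phi_next, r, gamma=0.9, lam=0.8, alpha=0.1):
    """One step of linear TD(lambda) with an accumulating eligibility trace."""
    e = gamma * lam * e + phi                            # decay trace, add features
    delta = r + gamma * theta @ phi_next - theta @ phi   # TD error
    theta = theta + alpha * delta * e                    # credit all traced features
    return theta, e

theta, e = np.zeros(2), np.zeros(2)
phi1, phi2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
theta, e = td_lambda_step(theta, e, phi1, phi2, r=1.0)
theta, e = td_lambda_step(theta, e, phi2, phi1, r=0.0)
```

Each step costs O(d) in the feature dimension, compared with the O(d^2) per-step cost of least squares methods, which is the trade-off the paper's accelerated variant targets.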

A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning

2 code implementations • 2 Jul 2016 • Martha White, Adam White

One of the main obstacles to broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms.

Meta-Learning reinforcement-learning +1

Introspective Agents: Confidence Measures for General Value Functions

no code implementations • 17 Jun 2016 • Craig Sherstan, Adam White, Marlos C. Machado, Patrick M. Pilarski

Agents of general intelligence deployed in real-world scenarios must adapt to ever-changing environmental conditions.

Position

Investigating practical linear temporal difference learning

1 code implementation • 28 Feb 2016 • Adam White, Martha White

First, we derive two new hybrid TD policy-evaluation algorithms, which fill a gap in this collection of algorithms.

reinforcement-learning Reinforcement Learning (RL)

Multi-timescale Nexting in a Reinforcement Learning Robot

no code implementations • 6 Dec 2011 • Joseph Modayil, Adam White, Richard S. Sutton

The term "nexting" has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense.

reinforcement-learning Reinforcement Learning (RL)

Interval Estimation for Reinforcement-Learning Algorithms in Continuous-State Domains

no code implementations • NeurIPS 2010 • Martha White, Adam White

The reinforcement learning community has explored many approaches to obtaining value estimates and models to guide decision making; these approaches, however, do not usually provide a measure of confidence in the estimate.

Decision Making reinforcement-learning +1
