Search Results for author: Martha White

Found 92 papers, 27 papers with code

Tuning for the Unknown: Revisiting Evaluation Strategies for Lifelong RL

no code implementations • 2 Apr 2024 • Golnaz Mesbahi, Olya Mastikhina, Parham Mohammad Panahi, Martha White, Adam White

In this paper we propose a new approach for tuning and evaluating lifelong RL agents where only one percent of the experiment data can be used for hyperparameter tuning.

Investigating the Histogram Loss in Regression

1 code implementation • 20 Feb 2024 • Ehsan Imani, Kai Luedemann, Sam Scholnick-Hughes, Esraa Elelimy, Martha White

It is becoming increasingly common in regression to train neural networks that model the entire distribution even if only the mean is required for prediction.

regression
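
To make the distributional view concrete, here is a hedged sketch of a histogram-style loss (the bin count, smoothing width, and all names are illustrative, not the paper's exact formulation):

```python
# Train against a categorical distribution over bins spanning the target range,
# using cross-entropy against a Gaussian-smoothed target histogram; a point
# prediction, if needed, is the mean of the predicted histogram.
import numpy as np

def target_histogram(y, centers, sigma=0.1):
    # Soft target: a discretized Gaussian centered at the scalar target y.
    p = np.exp(-0.5 * ((centers - y) / sigma) ** 2)
    return p / p.sum()

def histogram_loss(logits, y, centers, sigma=0.1):
    # Cross-entropy between the predicted histogram and the smoothed target.
    q = np.exp(logits - logits.max())
    q = q / q.sum()
    return -np.sum(target_histogram(y, centers, sigma) * np.log(q + 1e-12))

centers = np.linspace(-1.0, 1.0, 51)   # bin centers covering the target range
logits = np.zeros_like(centers)        # stand-in for a network's output
print(histogram_loss(logits, y=0.3, centers=centers))
q = np.exp(logits) / np.exp(logits).sum()
print(np.sum(centers * q))             # point prediction: histogram mean
```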

What to Do When Your Discrete Optimization Is the Size of a Neural Network?

1 code implementation • 15 Feb 2024 • Hugo Silva, Martha White

Oftentimes, machine learning applications using neural networks involve solving discrete optimization problems, such as in pruning, parameter-isolation-based continual learning and training of binary networks.

Continual Learning Image Classification +1

Compound Returns Reduce Variance in Reinforcement Learning

no code implementations • 6 Feb 2024 • Brett Daley, Martha White, Marlos C. Machado

Multistep returns, such as $n$-step returns and $\lambda$-returns, are commonly used to improve the sample efficiency of reinforcement learning (RL) methods.

reinforcement-learning Reinforcement Learning (RL)
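
As a concrete reference for the two return types named in the abstract, here is a minimal sketch (illustrative code, not the paper's; the compound returns the paper proposes average returns like these):

```python
# n-step return: bootstrap from the value estimate after n rewards.
# lambda-return: exponentially weighted average of all n-step returns.
import numpy as np

def n_step_return(rewards, values, t, n, gamma=0.99):
    # G_t^(n) = r_{t+1} + ... + gamma^{n-1} r_{t+n} + gamma^n V(s_{t+n})
    G = sum(gamma**k * rewards[t + k] for k in range(n))
    return G + gamma**n * values[t + n]

def lambda_return(rewards, values, t, lam=0.9, gamma=0.99):
    # G_t^lambda = (1 - lambda) sum_{n>=1} lambda^(n-1) G_t^(n), truncated at
    # the end of the episode, with the remaining weight on the final return.
    T = len(rewards)
    G = sum((1 - lam) * lam**(n - 1) * n_step_return(rewards, values, t, n, gamma)
            for n in range(1, T - t))
    return G + lam**(T - t - 1) * n_step_return(rewards, values, t, T - t, gamma)

rewards = [0.0, 0.0, 1.0]        # r_1, r_2, r_3
values = [0.5, 0.4, 0.8, 0.0]    # V(s_0)..V(s_3), terminal value 0
print(n_step_return(rewards, values, t=0, n=2), lambda_return(rewards, values, t=0))
```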

GVFs in the Real World: Making Predictions Online for Water Treatment

no code implementations • 4 Dec 2023 • Muhammad Kamran Janjua, Haseeb Shah, Martha White, Erfan Miahi, Marlos C. Machado, Adam White

In this paper we investigate the use of reinforcement-learning based prediction approaches for a real drinking-water treatment plant.

Time Series Prediction

Measuring and Mitigating Interference in Reinforcement Learning

no code implementations • 10 Jul 2023 • Vincent Liu, Han Wang, Ruo Yu Tao, Khurram Javed, Adam White, Martha White

Lastly, we outline a class of algorithms which we call online-aware that are designed to mitigate interference, and show they do reduce interference according to our measure and that they improve stability and performance in several classic control environments.

reinforcement-learning

Coagent Networks: Generalized and Scaled

no code implementations • 16 May 2023 • James E. Kostas, Scott M. Jordan, Yash Chandak, Georgios Theocharous, Dhawal Gupta, Martha White, Bruno Castro da Silva, Philip S. Thomas

However, the coagent framework is not just an alternative to BDL; the two approaches can be blended: BDL can be combined with coagent learning rules to create architectures with the advantages of both approaches.

Reinforcement Learning (RL)

Empirical Design in Reinforcement Learning

no code implementations • 3 Apr 2023 • Andrew Patterson, Samuel Neumann, Martha White, Adam White

The objective of this document is to provide answers on how we can use our unprecedented compute to do good science in reinforcement learning, as well as stay alert to potential pitfalls in our empirical design.

reinforcement-learning

The In-Sample Softmax for Offline Reinforcement Learning

4 code implementations • 28 Feb 2023 • Chenjun Xiao, Han Wang, Yangchen Pan, Adam White, Martha White

We highlight a simple fact: it is more straightforward to approximate an in-sample softmax using only actions in the dataset.

Offline RL reinforcement-learning +1
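
The "simple fact" above can be made concrete with a small sketch (assumptions: tabular Q-values and a log-mean-exp form; the paper's actual estimator is not reproduced):

```python
# Estimate a soft value using only the actions that appear in the dataset for a
# state, instead of maximizing over all actions, which would query Q-values for
# out-of-distribution actions.
import numpy as np

def in_sample_softmax_value(q_values, observed_actions, tau=1.0):
    q_in = q_values[np.asarray(observed_actions)]
    return tau * np.log(np.mean(np.exp(q_in / tau)))  # log-mean-exp, in-sample only

q = np.array([1.0, 5.0, 2.0, 0.5])   # suppose action 1 never occurs in the data
print(in_sample_softmax_value(q, observed_actions=[0, 2, 3]))
```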

Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments

no code implementations • 23 Feb 2023 • Vincent Liu, Yash Chandak, Philip Thomas, Martha White

In this work, we consider the off-policy policy evaluation problem for contextual bandits and finite horizon reinforcement learning in the nonstationary setting.

Multi-Armed Bandits regression +2

Generalized Munchausen Reinforcement Learning using Tsallis KL Divergence

no code implementations • 27 Jan 2023 • Lingwei Zhu, Zheng Chen, Matthew Schlegel, Martha White

Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leibler (KL) divergence to the previous policy, to prevent the policy from changing too quickly.

Atari Games reinforcement-learning +1
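
For intuition, here is the standard closed-form KL-regularized greedification step the abstract alludes to (a sketch under the usual mirror-descent form; the paper's generalization to a Tsallis KL is not shown):

```python
# With a KL penalty to the previous policy, the regularized greedy update is
# pi_new(a|s) proportional to pi_old(a|s) * exp(Q(s,a) / tau): larger tau keeps
# the new policy closer to its predecessor.
import numpy as np

def kl_regularized_update(pi_old, q_values, tau=1.0):
    logits = np.log(pi_old + 1e-12) + q_values / tau
    p = np.exp(logits - logits.max())
    return p / p.sum()

pi = np.full(4, 0.25)
q = np.array([1.0, 2.0, 0.0, 0.5])
print(kl_regularized_update(pi, q, tau=0.5))
```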

Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning

1 code implementation • 26 Jan 2023 • Brett Daley, Martha White, Christopher Amato, Marlos C. Machado

Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, but counteracting off-policy bias without exacerbating variance is challenging.

reinforcement-learning Reinforcement Learning (RL)

Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks

1 code implementation • 20 Jan 2023 • Khurram Javed, Haseeb Shah, Rich Sutton, Martha White

We show that by either decomposing the network into independent modules or learning the network in stages, we can make RTRL scale linearly with the number of parameters.

Atari Games

Goal-Space Planning with Subgoal Models

no code implementations • 6 Jun 2022 • Chunlok Lo, Kevin Roice, Parham Mohammad Panahi, Scott Jordan, Adam White, Gabor Mihucz, Farzane Aminmansour, Martha White

In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models.

Model-based Reinforcement Learning Reinforcement Learning (RL)

Robust Losses for Learning Value Functions

no code implementations • 17 May 2022 • Andrew Patterson, Victor Liao, Martha White

We start from a formalization of robust losses, then derive sound gradient-based approaches to minimize these losses in both the online off-policy prediction and control settings.

A Temporal-Difference Approach to Policy Gradient Estimation

1 code implementation • 4 Feb 2022 • Samuele Tosatto, Andrew Patterson, Martha White, A. Rupam Mahmood

The policy gradient theorem (Sutton et al., 2000) prescribes the usage of a cumulative discounted state distribution under the target policy to approximate the gradient.

An Alternate Policy Gradient Estimator for Softmax Policies

1 code implementation • 22 Dec 2021 • Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. Rupam Mahmood

Policy gradient (PG) estimators are ineffective in dealing with softmax policies that are sub-optimally saturated, which refers to the situation when the policy concentrates its probability mass on sub-optimal actions.

Representation Alignment in Neural Networks

1 code implementation • 15 Dec 2021 • Ehsan Imani, Wei Hu, Martha White

We then highlight why alignment between the top singular vectors and the targets can speed up learning and show in a classic synthetic transfer problem that representation alignment correlates with positive and negative transfer to similar and dissimilar tasks.

Off-Policy Actor-Critic with Emphatic Weightings

1 code implementation • 16 Nov 2021 • Eric Graves, Ehsan Imani, Raksha Kumaraswamy, Martha White

A variety of theoretically-sound policy gradient algorithms exist for the on-policy setting due to the policy gradient theorem, which provides a simplified form for the gradient.

Resmax: An Alternative Soft-Greedy Operator for Reinforcement Learning

no code implementations • 29 Sep 2021 • Erfan Miahi, Revan MacQueen, Alex Ayoub, Abbas Masoumzadeh, Martha White

Soft-greedy operators, namely $\varepsilon$-greedy and softmax, remain a common choice to induce a basic level of exploration for action-value methods in reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)
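
For reference, the two baseline soft-greedy operators named in the abstract look as follows (a minimal sketch; the proposed resmax operator itself is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy_action(q, eps=0.1):
    # Explore uniformly with probability eps, otherwise act greedily.
    return rng.integers(len(q)) if rng.random() < eps else int(np.argmax(q))

def softmax_policy(q, tau=1.0):
    # Boltzmann distribution over action values; tau controls exploration.
    p = np.exp((q - q.max()) / tau)
    return p / p.sum()

q = np.array([0.1, 0.5, 0.2])
print(epsilon_greedy_action(q), softmax_policy(q))
```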

Offline-Online Reinforcement Learning: Extending Batch and Online RL

no code implementations • 29 Sep 2021 • Maryam Hashemzadeh, Wesley Chung, Martha White

To enable better performance, we investigate the offline-online setting: The agent has access to a batch of data to train on but is also allowed to learn during the evaluation phase in an online manner.

reinforcement-learning Reinforcement Learning (RL)

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

no code implementations • 17 Jul 2021 • Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White

Approximate Policy Iteration (API) algorithms alternate between (approximate) policy evaluation and (approximate) greedification.

Policy Gradient Methods
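
The two greedification objectives compared in the paper can be sketched directly (illustrative numbers; the Boltzmann target and temperature are the usual setup, not values from the paper):

```python
# Greedify a policy toward a Boltzmann distribution over action values by
# reducing either the forward KL(target || pi) or the reverse KL(pi || target).
import numpy as np

def kl(p, q):
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

q_values, tau = np.array([1.0, 0.0, 0.5]), 0.5
target = np.exp(q_values / tau) / np.exp(q_values / tau).sum()  # greedification target
pi = np.array([0.4, 0.3, 0.3])                                  # current policy

print("forward KL:", kl(target, pi))   # mass-covering when minimized over pi
print("reverse KL:", kl(pi, target))   # mode-seeking when minimized over pi
```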

Predictive Representation Learning for Language Modeling

no code implementations • 29 May 2021 • Qingfeng Lan, Luke Kumar, Martha White, Alona Fyshe

Correlates of secondary information appear in LSTM representations even though they are not part of an explicitly supervised prediction task.

Language Modelling Reinforcement Learning (RL) +1

A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning

no code implementations • 28 Apr 2021 • Andrew Patterson, Adam White, Martha White

Many algorithms have been developed for off-policy value estimation based on the linear mean squared projected Bellman error (MSPBE) and are sound under linear function approximation.

reinforcement-learning Reinforcement Learning (RL)

Scalable Online Recurrent Learning Using Columnar Neural Networks

1 code implementation • 9 Mar 2021 • Khurram Javed, Martha White, Rich Sutton

We empirically show that as long as connections between columns are sparse, our method approximates the true gradient well.

Meta-Learning

Perspectives on Sim2Real Transfer for Robotics: A Summary of the R:SS 2020 Workshop

no code implementations • 7 Dec 2020 • Sebastian Höfer, Kostas Bekris, Ankur Handa, Juan Camilo Gamboa, Florian Golemo, Melissa Mozifian, Chris Atkeson, Dieter Fox, Ken Goldberg, John Leonard, C. Karen Liu, Jan Peters, Shuran Song, Peter Welinder, Martha White

This report presents the debates, posters, and discussions of the Sim2Real workshop held in conjunction with the 2020 edition of the "Robotics: Science and System" conference.

Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities

1 code implementation • 28 Sep 2020 • Jincheng Mei, Yangchen Pan, Martha White, Amir-Massoud Farahmand, Hengshuai Yao

The prioritized Experience Replay (ER) method has attracted great attention; however, there is little theoretical understanding of such prioritization strategies and why they help.

Understanding and Mitigating the Limitations of Prioritized Experience Replay

2 code implementations • 19 Jul 2020 • Yangchen Pan, Jincheng Mei, Amir-Massoud Farahmand, Martha White, Hengshuai Yao, Mohsen Rohani, Jun Luo

Prioritized Experience Replay (ER) has been empirically shown to improve sample efficiency across many domains and has attracted great attention; however, there is little theoretical understanding of why such prioritized sampling helps and its limitations.

Autonomous Driving Continuous Control +1
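
As background for what is being analyzed, here is the standard proportional prioritization scheme (in the style of Schaul et al.'s PER; the alpha and beta hyperparameters are the conventional ones, not values from this paper):

```python
# Sample transitions with probability proportional to |TD error|^alpha and
# correct the induced bias with importance weights annealed by beta.
import numpy as np

rng = np.random.default_rng(0)

def sample_prioritized(td_errors, batch_size, alpha=0.6, beta=0.4):
    p = (np.abs(td_errors) + 1e-6) ** alpha
    p = p / p.sum()
    idx = rng.choice(len(p), size=batch_size, p=p)
    weights = (len(p) * p[idx]) ** (-beta)   # importance correction
    return idx, weights / weights.max()

idx, w = sample_prioritized(np.array([0.1, 2.0, 0.5, 1.0]), batch_size=2)
print(idx, w)
```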

Selective Dyna-style Planning Under Limited Model Capacity

no code implementations • ICML 2020 • Zaheer Abbas, Samuel Sokota, Erin J. Talvitie, Martha White

We show that heteroscedastic regression can signal predictive uncertainty arising from model inadequacy that is complementary to that which is detected by methods designed for parameter uncertainty, indicating that considering both parameter uncertainty and model inadequacy may be a more promising direction for effective selective planning than either in isolation.

Model-based Reinforcement Learning
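
The heteroscedastic regression signal mentioned above amounts to predicting an input-dependent variance alongside the mean and training with the Gaussian negative log-likelihood (a generic sketch, not the paper's model):

```python
# A model predicting (mean, log_var) per input and trained on this loss will
# inflate its predicted variance where it cannot fit the data, which is the
# model-inadequacy signal used for selective planning.
import numpy as np

def gaussian_nll(y, mean, log_var):
    # Negative log-likelihood of y under N(mean, exp(log_var)), up to a constant.
    return 0.5 * (log_var + (y - mean) ** 2 / np.exp(log_var))

print(gaussian_nll(y=1.0, mean=0.8, log_var=np.log(0.25)))
```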

Gradient Temporal-Difference Learning with Regularized Corrections

1 code implementation • ICML 2020 • Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White

It is still common to use Q-learning and temporal difference (TD) learning, even though they have divergence issues and sound Gradient TD alternatives exist, because divergence seems rare and they typically perform well.

Q-Learning
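
A rough sketch of the idea, in the spirit of gradient TD with a regularized correction term for linear value functions (the constants and exact form are illustrative, not the paper's):

```python
# w: value weights; h: secondary "correction" weights as in TDC, with an extra
# l2 penalty (beta) on h so the correction stays small and TD-like behavior is
# recovered when the correction is unnecessary.
import numpy as np

def corrected_td_step(w, h, x, r, x_next, alpha=0.01, beta=1.0, gamma=0.99):
    delta = r + gamma * (w @ x_next) - w @ x                 # TD error
    w = w + alpha * (delta * x - gamma * (h @ x) * x_next)   # corrected TD update
    h = h + alpha * ((delta - h @ x) * x - beta * h)         # regularized correction
    return w, h

w, h = np.zeros(4), np.zeros(4)
x, x_next = np.eye(4)[0], np.eye(4)[1]
w, h = corrected_td_step(w, h, x, r=1.0, x_next=x_next)
print(w)
```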

Learning Causal Models Online

1 code implementation • 12 Jun 2020 • Khurram Javed, Martha White, Yoshua Bengio

One solution for achieving strong generalization is to incorporate causal structures in the models; such structures constrain learning by ignoring correlations that contradict them.

Continual Learning

Hallucinating Value: A Pitfall of Dyna-style Planning with Imperfect Environment Models

no code implementations • 8 Jun 2020 • Taher Jafferjee, Ehsan Imani, Erin Talvitie, Martha White, Michael Bowling

Dyna-style reinforcement learning (RL) agents improve sample efficiency over model-free RL agents by updating the value function with simulated experience generated by an environment model.

Reinforcement Learning (RL)
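
A minimal tabular Dyna-style planning loop makes the failure mode concrete: simulated transitions come from a learned (possibly wrong) model, and value updates trust them as if they were real (a sketch only; names and shapes are illustrative):

```python
import numpy as np

def dyna_q_planning(Q, model, n_planning, alpha=0.1, gamma=0.99, seed=0):
    # model maps (s, a) -> (r, s_next); if these predictions are inaccurate,
    # planning updates can hallucinate value in states that are never reached.
    rng = np.random.default_rng(seed)
    keys = list(model.keys())
    for _ in range(n_planning):
        s, a = keys[rng.integers(len(keys))]
        r, s_next = model[(s, a)]
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q

Q = np.zeros((3, 2))
model = {(0, 0): (1.0, 1), (1, 1): (0.0, 2)}
print(dyna_q_planning(Q, model, n_planning=5))
```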

Optimizing for the Future in Non-Stationary MDPs

1 code implementation • ICML 2020 • Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip S. Thomas

Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary.

Maximizing Information Gain in Partially Observable Environments via Prediction Reward

no code implementations • 11 May 2020 • Yash Satsangi, Sungsu Lim, Shimon Whiteson, Frans Oliehoek, Martha White

Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem where the reward depends on the agent's uncertainty.

Question Answering Reinforcement Learning (RL)

Training Recurrent Neural Networks Online by Learning Explicit State Variables

no code implementations • ICLR 2020 • Somjit Nath, Vincent Liu, Alan Chan, Xin Li, Adam White, Martha White

Recurrent neural networks (RNNs) allow an agent to construct a state-representation from a stream of experience, which is essential in partially observable problems.

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

1 code implementation • ICLR 2020 • Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White

Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value.

Q-Learning
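
The Maxmin target can be sketched in a few lines (ensemble size and values are illustrative):

```python
# Keep N action-value estimates, combine them by an elementwise minimum, and
# bootstrap greedily from that minimum; the min counteracts the overestimation
# that comes from maximizing over noisy estimates.
import numpy as np

def maxmin_target(q_ensemble_next, r, gamma=0.99):
    # q_ensemble_next: shape (N, num_actions), the ensemble's Q_i(s', .).
    q_min = q_ensemble_next.min(axis=0)
    return r + gamma * q_min.max()

q_next = np.array([[1.0, 2.0], [0.5, 1.5], [1.2, 1.0]])  # N = 3 estimates
print(maxmin_target(q_next, r=0.0))
```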

An implicit function learning approach for parametric modal regression

no code implementations • NeurIPS 2020 • Yangchen Pan, Ehsan Imani, Martha White, Amir-Massoud Farahmand

We empirically demonstrate on several synthetic problems that our method (i) can learn multi-valued functions and produce the conditional modes, (ii) scales well to high-dimensional inputs, and (iii) can even be more effective for certain uni-modal problems, particularly for high-frequency functions.

regression

Learning Macroscopic Brain Connectomes via Group-Sparse Factorization

1 code implementation • NeurIPS 2019 • Farzane Aminmansour, Andrew Patterson, Lei Le, Yisu Peng, Daniel Mitchell, Franco Pestilli, Cesar F. Caiafa, Russell Greiner, Martha White

We develop an efficient optimization strategy for this extremely high-dimensional sparse problem, by reducing the number of parameters using a greedy algorithm designed specifically for the problem.

Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online

1 code implementation • ICLR 2021 • Yangchen Pan, Kirby Banman, Martha White

Recent work has shown that sparse representations, where only a small percentage of units are active, can significantly reduce interference.

Continual Learning Continuous Control +2
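
A fuzzy-tiling-style activation can be sketched as below; note the exact functional form here is assumed for illustration and may differ from the paper's definition:

```python
# A scalar z is compared against tile left-edges c with width delta; output is 1
# inside a tile and decays linearly to 0 within a fuzziness margin eta outside,
# so only a few of the output units are active (a sparse representation).
import numpy as np

def fuzzy_tiling_activation(z, c, delta, eta):
    d = np.maximum(c - z, 0.0) + np.maximum(z - delta - c, 0.0)  # distance outside each tile
    return np.clip(1.0 - d / eta, 0.0, 1.0)

c = np.linspace(-1.0, 1.0, 20, endpoint=False)  # tile left edges over [-1, 1)
print(fuzzy_tiling_activation(0.13, c, delta=0.1, eta=0.05))
```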

Is Fast Adaptation All You Need?

no code implementations • 3 Oct 2019 • Khurram Javed, Hengshuai Yao, Martha White

Gradient-based meta-learning has proven to be highly effective at learning model initializations, representations, and update rules that allow fast adaptation from a few samples.

Incremental Learning Meta-Learning

Meta-descent for Online, Continual Prediction

no code implementations • 17 Jul 2019 • Andrew Jacobsen, Matthew Schlegel, Cameron Linke, Thomas Degris, Adam White, Martha White

This paper investigates different vector step-size adaptation approaches for non-stationary online, continual prediction problems.

Second-order methods Time Series +1

Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study

no code implementations • 19 Jun 2019 • Cam Linke, Nadia M. Ady, Martha White, Thomas Degris, Adam White

The question we tackle in this paper is how to sculpt the stream of experience, that is, how to adapt the learning system's behavior, to optimize the learning of a collection of value functions.

Active Learning reinforcement-learning +2

Hill Climbing on Value Estimates for Search-control in Dyna

no code implementations • 18 Jun 2019 • Yangchen Pan, Hengshuai Yao, Amir-Massoud Farahmand, Martha White

In this work, we propose to generate such states by using the trajectory obtained by Hill Climbing (HC) on the current estimate of the value function.

Model-based Reinforcement Learning Reinforcement Learning (RL)

Importance Resampling for Off-policy Prediction

2 code implementations • NeurIPS 2019 • Matthew Schlegel, Wesley Chung, Daniel Graves, Jian Qian, Martha White

Importance sampling (IS) is a common reweighting strategy for off-policy prediction in reinforcement learning.

Meta-Learning Representations for Continual Learning

6 code implementations • NeurIPS 2019 • Khurram Javed, Martha White

We show that it is possible to learn naturally sparse representations that are more effective for online updating.

Continual Learning Meta-Learning

Two-Timescale Networks for Nonlinear Value Function Approximation

no code implementations • ICLR 2019 • Wesley Chung, Somjit Nath, Ajin Joseph, Martha White

A key component for many reinforcement learning agents is to learn a value function, either for policy evaluation or control.

Q-Learning Vocal Bursts Valence Prediction

Planning with Expectation Models

no code implementations • 2 Apr 2019 • Yi Wan, Zaheer Abbas, Adam White, Martha White, Richard S. Sutton

In particular, we 1) show that planning with an expectation model is equivalent to planning with a distribution model if the state value function is linear in state features, 2) analyze two common parametrization choices for approximating the expectation: linear and non-linear expectation models, 3) propose a sound model-based policy evaluation algorithm and present its convergence results, and 4) empirically demonstrate the effectiveness of the proposed planning algorithm.

Model-based Reinforcement Learning
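
Point (1) rests on linearity of the value function, which can be stated in one line (a sketch of the identity, with value function v(s) = w'φ(s) and an expectation model that predicts the expected next features):

```latex
% With a linear value function, the expected bootstrap target depends on the
% next state only through the expected next features, so an expectation model
% suffices for planning:
\mathbb{E}\left[\hat v(S')\right]
  = \mathbb{E}\left[w^\top \phi(S')\right]
  = w^\top \mathbb{E}\left[\phi(S')\right]
```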

Accelerating Large Scale Knowledge Distillation via Dynamic Importance Sampling

no code implementations • 3 Dec 2018 • Minghan Li, Tanli Zuo, Ruicheng Li, Martha White, Wei-Shi Zheng

Knowledge distillation is an effective technique that transfers knowledge from a large teacher model to a shallow student.

Knowledge Distillation Machine Translation +2
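
For context, the base distillation objective being accelerated is the standard temperature-softened one (in the style of Hinton et al.; the paper's dynamic importance sampling over instances is not reproduced):

```python
# The student matches temperature-softened teacher probabilities in addition to
# the hard labels; T > 1 exposes the teacher's "dark knowledge" between classes.
import numpy as np

def softmax(z, T=1.0):
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, label, T=4.0, lam=0.5):
    soft = -np.sum(softmax(teacher_logits, T) *
                   np.log(softmax(student_logits, T) + 1e-12)) * T**2
    hard = -np.log(softmax(student_logits)[label] + 1e-12)
    return lam * soft + (1 - lam) * hard

print(distillation_loss(np.array([1.0, 0.2]), np.array([2.0, 0.1]), label=0))
```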

Supervised autoencoders: Improving generalization performance with unsupervised regularizers

1 code implementation • NeurIPS 2018 • Lei Le, Andrew Patterson, Martha White

A common strategy to improve generalization has been through the use of regularizers, typically as a norm constraining the parameters.
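
The supervised autoencoder objective pairs the supervised loss with a reconstruction term on a shared representation; a tiny sketch (random weights and shapes are illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(8, 4))   # shared encoder
W_out = rng.normal(size=(4, 1))   # supervised prediction head
W_dec = rng.normal(size=(4, 8))   # reconstruction head (the regularizer)

def sae_loss(x, y, weight=0.1):
    h = np.tanh(x @ W_enc)
    supervised = float(np.sum((h @ W_out - y) ** 2))
    reconstruction = float(np.mean((h @ W_dec - x) ** 2))
    return supervised + weight * reconstruction   # reconstruction regularizes the encoder

print(sae_loss(rng.normal(size=8), y=1.0))
```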

An Off-policy Policy Gradient Theorem Using Emphatic Weightings

no code implementations • NeurIPS 2018 • Ehsan Imani, Eric Graves, Martha White

There have been a host of theoretically sound algorithms proposed for the on-policy setting, due to the existence of the policy gradient theorem which provides a simplified form for the gradient.

Policy Gradient Methods

The Barbados 2018 List of Open Issues in Continual Learning

no code implementations • 16 Nov 2018 • Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.

Continual Learning

Context-Dependent Upper-Confidence Bounds for Directed Exploration

no code implementations • NeurIPS 2018 • Raksha Kumaraswamy, Matthew Schlegel, Adam White, Martha White

Directed exploration strategies for reinforcement learning are critical for learning an optimal policy in a minimal number of interactions with the environment.

Efficient Exploration
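
The canonical directed-exploration rule in this family is the upper-confidence bound, sketched here for the bandit case (the paper's context-dependent bounds are not reproduced):

```python
# Act greedily with respect to an optimistic estimate: a visit-count bonus
# inflates the value of rarely tried actions, directing exploration toward them.
import numpy as np

def ucb_action(means, counts, t, c=2.0):
    bonus = np.where(counts > 0,
                     c * np.sqrt(np.log(t) / np.maximum(counts, 1)),
                     np.inf)   # untried actions are tried first
    return int(np.argmax(means + bonus))

print(ucb_action(np.array([0.5, 0.4, 0.0]), counts=np.array([10, 2, 0]), t=12))
```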

Online Off-policy Prediction

no code implementations • 6 Nov 2018 • Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White

The ability to learn behavior-contingent predictions online and off-policy has long been advocated as a key capability of predictive-knowledge learning systems but remained an open algorithmic challenge for decades.

Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement

1 code implementation • 22 Oct 2018 • Samuel Neumann, Sungsu Lim, Ajin Joseph, Yangchen Pan, Adam White, Martha White

We first provide a policy improvement result in an idealized setting, and then prove that our conditional CEM (CCEM) strategy tracks a CEM update per state, even with changing action-values.

Policy Gradient Methods Q-Learning

Importance Resampling for Off-policy Policy Evaluation

no code implementations • 27 Sep 2018 • Matthew Schlegel, Wesley Chung, Daniel Graves, Martha White

We propose Importance Resampling (IR) for off-policy learning, that resamples experience from the replay buffer and applies a standard on-policy update.
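
The mechanism described in the abstract can be sketched directly (buffer contents and ratios are illustrative):

```python
# Importance Resampling: draw transitions from the replay buffer with
# probability proportional to the importance-sampling ratio pi(a|s)/mu(a|s),
# then apply an ordinary unweighted on-policy update to the resampled batch.
import numpy as np

rng = np.random.default_rng(0)

def resample_indices(ratios, batch_size):
    p = ratios / ratios.sum()
    return rng.choice(len(ratios), size=batch_size, p=p)

ratios = np.array([0.1, 1.0, 2.0, 0.5])   # one ratio per stored transition
print(resample_indices(ratios, batch_size=3))
```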

High-confidence error estimates for learned value functions

no code implementations • 28 Aug 2018 • Touqir Sajed, Wesley Chung, Martha White

We provide experiments investigating the number of samples required by this offline algorithm in simple benchmark reinforcement learning domains, and highlight that there are still many open questions to be solved for this important problem.

reinforcement-learning Reinforcement Learning (RL) +1

General Value Function Networks

no code implementations • 18 Jul 2018 • Matthew Schlegel, Andrew Jacobsen, Zaheer Abbas, Andrew Patterson, Adam White, Martha White

A general purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation.

Continuous Control Decision Making

Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous State Domains

no code implementations • 12 Jun 2018 • Yangchen Pan, Muhammad Zaheer, Adam White, Andrew Patterson, Martha White

We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states that propagate information in reverse from a state more quickly.

Improving Regression Performance with Distributional Losses

no code implementations • ICML 2018 • Ehsan Imani, Martha White

We provide theoretical support for this alternative hypothesis, by characterizing the norm of the gradients of this loss.

regression

Discovery of Predictive Representations With a Network of General Value Functions

no code implementations • ICLR 2018 • Matthew Schlegel, Andrew Patterson, Adam White, Martha White

We investigate a framework for discovery: curating a large collection of predictions, which are used to construct the agent's representation of the world.

Decision Making

Multi-view Matrix Factorization for Linear Dynamical System Estimation

no code implementations • NeurIPS 2017 • Mahdi Karami, Martha White, Dale Schuurmans, Csaba Szepesvari

In this paper, we instead reconsider likelihood maximization and develop an optimization based strategy for recovering the latent states and transition parameters.

Effective sketching methods for value function approximation

no code implementations • 3 Aug 2017 • Yangchen Pan, Erfan Sadeqi Azer, Martha White

As a remedy, we demonstrate how to use sketching more sparingly, with only a left-sided sketch, which can still enable significant computational gains and the use of these matrix-based learning algorithms that are less sensitive to parameters.

Reinforcement Learning (RL)

Adapting Kernel Representations Online Using Submodular Maximization

no code implementations • ICML 2017 • Matthew Schlegel, Yangchen Pan, Jiecao Chen, Martha White

In this work, we develop an approximately submodular criterion for this setting, and an efficient online greedy submodular maximization algorithm for optimizing the criterion.

Continual Learning

Learning Sparse Representations in Reinforcement Learning with Sparse Coding

no code implementations • 26 Jul 2017 • Lei Le, Raksha Kumaraswamy, Martha White

Outside of reinforcement learning, sparse coding representations have been widely used, with non-convex objectives that result in discriminative representations.

reinforcement-learning Reinforcement Learning (RL) +1

Recovering True Classifier Performance in Positive-Unlabeled Learning

no code implementations • 2 Feb 2017 • Shantanu Jain, Martha White, Predrag Radivojac

A common approach in positive-unlabeled learning is to train a classification model between labeled and unlabeled data.

Accelerated Gradient Temporal Difference Learning

no code implementations • 28 Nov 2016 • Yangchen Pan, Adam White, Martha White

The family of temporal difference (TD) methods spans a spectrum from computationally frugal linear methods like TD($\lambda$) to data-efficient least squares methods.

A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning

2 code implementations • 2 Jul 2016 • Martha White, Adam White

One of the main obstacles to broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms.

Meta-Learning reinforcement-learning +1

Estimating the class prior and posterior from noisy positives and unlabeled data

no code implementations • NeurIPS 2016 • Shantanu Jain, Martha White, Predrag Radivojac

We develop a classification algorithm for estimating posterior distributions from positive-unlabeled data that is robust to noise in the positive labels and effective for high-dimensional data.

Classification Density Estimation +2

Identifying global optimality for dictionary learning

no code implementations • 17 Apr 2016 • Lei Le, Martha White

We then provide an empirical investigation into practical optimization choices for using alternating minimization for induced DLMs, for both batch and stochastic gradient descent.

Dictionary Learning Matrix Completion

Investigating practical linear temporal difference learning

1 code implementation • 28 Feb 2016 • Adam White, Martha White

First, we derive two new hybrid TD policy-evaluation algorithms, which fill a gap in this collection of algorithms.

reinforcement-learning Reinforcement Learning (RL)

Nonparametric semi-supervised learning of class proportions

1 code implementation • 8 Jan 2016 • Shantanu Jain, Martha White, Michael W. Trosset, Predrag Radivojac

This problem can be decomposed into two steps: (i) the development of accurate predictors that discriminate between positive and unlabeled data, and (ii) the accurate estimation of the prior probabilities of positive and negative examples.

Density Estimation

Incremental Truncated LSTD

no code implementations • 26 Nov 2015 • Clement Gehring, Yangchen Pan, Martha White

Balancing between computational efficiency and sample efficiency is an important goal in reinforcement learning.

Computational Efficiency

Emphatic Temporal-Difference Learning

no code implementations • 6 Jul 2015 • A. Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton

Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps.
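
A sketch of the emphatic TD(0) update shows where the selective emphasis enters (standard linear ETD form with constant interest; step sizes are illustrative):

```python
# The followon trace F accumulates discounted importance-sampling ratios, and
# each TD update is scaled by the resulting emphasis, re-weighting updates
# across time steps.
import numpy as np

def etd0_step(w, F, x, r, x_next, rho, rho_prev,
              alpha=0.01, gamma=0.99, interest=1.0):
    F = rho_prev * gamma * F + interest        # followon trace
    delta = r + gamma * (w @ x_next) - w @ x   # TD error
    w = w + alpha * F * rho * delta * x        # emphasis-weighted update
    return w, F

w, F = np.zeros(3), 0.0
x, x_next = np.eye(3)[0], np.eye(3)[1]
w, F = etd0_step(w, F, x, r=1.0, x_next=x_next, rho=1.0, rho_prev=1.0)
print(w, F)
```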

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

no code implementations • 14 Mar 2015 • Richard S. Sutton, A. Rupam Mahmood, Martha White

In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps.

Convex Multi-view Subspace Learning

no code implementations • NeurIPS 2012 • Martha White, Xinhua Zhang, Dale Schuurmans, Yao-Liang Yu

Subspace learning seeks a low dimensional representation of data that enables accurate reconstruction.

Off-Policy Actor-Critic

1 code implementation • 22 May 2012 • Thomas Degris, Martha White, Richard S. Sutton

Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning.

reinforcement-learning Reinforcement Learning (RL)

Interval Estimation for Reinforcement-Learning Algorithms in Continuous-State Domains

no code implementations • NeurIPS 2010 • Martha White, Adam White

The reinforcement learning community has explored many approaches to obtaining value estimates and models to guide decision making; these approaches, however, do not usually provide a measure of confidence in the estimate.

Decision Making reinforcement-learning +1

Relaxed Clipping: A Global Training Method for Robust Regression and Classification

no code implementations • NeurIPS 2010 • Min Yang, Linli Xu, Martha White, Dale Schuurmans, Yao-Liang Yu

We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems.

Classification General Classification +1
