Search Results for author: Martha White

Found 63 papers, 12 papers with code

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

no code implementations17 Jul 2021 Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White

Approximate Policy Iteration (API) algorithms alternate between (approximate) policy evaluation and (approximate) greedification.
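The evaluate/greedify alternation can be sketched on a toy problem. The 2-state MDP, iterative evaluation, and exact one-step greedification below are illustrative assumptions, not the paper's setup (which studies approximate greedification via KL divergences).

```python
# A minimal policy-iteration sketch on a hypothetical 2-state MDP.
# States 0,1; actions 0,1. P[s][a] = (next_state, reward).
P = {0: {0: (0, 0.0), 1: (1, 1.0)},
     1: {0: (0, 0.0), 1: (1, 1.0)}}
gamma = 0.9

def evaluate(policy, iters=200):
    """Approximate policy evaluation: iterate the Bellman equation."""
    V = {0: 0.0, 1: 0.0}
    for _ in range(iters):
        V = {s: P[s][policy[s]][1] + gamma * V[P[s][policy[s]][0]] for s in P}
    return V

def greedify(V):
    """Exact greedification: pick the action maximizing one-step lookahead."""
    return {s: max(P[s], key=lambda a: P[s][a][1] + gamma * V[P[s][a][0]])
            for s in P}

policy = {0: 0, 1: 0}
for _ in range(5):                  # alternate evaluation and greedification
    policy = greedify(evaluate(policy))

print(policy)                       # greedy policy takes action 1 in both states
```

The paper's contribution concerns what happens when the greedification step is only approximate; the exact argmax above is the idealized limit of that step.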

Policy Gradient Methods

Predictive Representation Learning for Language Modeling

no code implementations29 May 2021 Qingfeng Lan, Luke Kumar, Martha White, Alona Fyshe

Correlates of secondary information appear in LSTM representations even though they are not part of an explicitly supervised prediction task.

Language Modelling Representation Learning

A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning

no code implementations28 Apr 2021 Andrew Patterson, Adam White, Sina Ghiassian, Martha White

Many algorithms have been developed for off-policy value estimation which are sound under linear function approximation, based on the linear mean-squared projected Bellman error (PBE).

Scalable Online Recurrent Learning Using Columnar Neural Networks

1 code implementation9 Mar 2021 Khurram Javed, Martha White, Rich Sutton

We empirically show that as long as connections between columns are sparse, our method approximates the true gradient well.


Measuring and mitigating interference in reinforcement learning

no code implementations1 Jan 2021 Vincent Liu, Adam M White, Hengshuai Yao, Martha White

Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it.

Representation Learning

Perspectives on Sim2Real Transfer for Robotics: A Summary of the R:SS 2020 Workshop

no code implementations7 Dec 2020 Sebastian Höfer, Kostas Bekris, Ankur Handa, Juan Camilo Gamboa, Florian Golemo, Melissa Mozifian, Chris Atkeson, Dieter Fox, Ken Goldberg, John Leonard, C. Karen Liu, Jan Peters, Shuran Song, Peter Welinder, Martha White

This report presents the debates, posters, and discussions of the Sim2Real workshop held in conjunction with the 2020 edition of the "Robotics: Science and Systems" conference.

Beyond Prioritized Replay: Sampling States in Model-Based Reinforcement Learning via Simulated Priorities

no code implementations19 Jul 2020 Jincheng Mei, Yangchen Pan, Amir-Massoud Farahmand, Hengshuai Yao, Martha White

The prioritized experience replay (ER) method has attracted great attention; however, there is little theoretical understanding of why it helps or of its limitations.
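The basic mechanism under study, sampling stored transitions with probability proportional to their priority (typically the absolute TD error), can be sketched as follows. The buffer contents and priorities are made-up illustrative numbers, not the paper's experiments.

```python
import random
random.seed(4)

# A minimal sketch of priority-proportional sampling from a replay buffer.
buffer = ["s0", "s1", "s2", "s3"]
td_errors = [0.1, 0.1, 0.1, 2.7]           # |TD error| per stored transition

total = sum(td_errors)
counts = {s: 0 for s in buffer}
for _ in range(10000):
    r, acc = random.random() * total, 0.0
    for s, p in zip(buffer, td_errors):    # inverse-CDF sampling
        acc += p
        if r <= acc:
            counts[s] += 1
            break

print(counts["s3"] / 10000)                # close to 2.7 / 3.0 = 0.9
```

The high-error transition dominates the sample: this concentration is exactly what the paper analyzes, both as a benefit and as a limitation.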

Autonomous Driving Continuous Control +1

Towards a practical measure of interference for reinforcement learning

no code implementations7 Jul 2020 Vincent Liu, Adam White, Hengshuai Yao, Martha White

In this work, we provide a definition of interference for control in reinforcement learning.

Selective Dyna-style Planning Under Limited Model Capacity

no code implementations ICML 2020 Zaheer Abbas, Samuel Sokota, Erin J. Talvitie, Martha White

We show that heteroscedastic regression can signal predictive uncertainty arising from model inadequacy that is complementary to that which is detected by methods designed for parameter uncertainty, indicating that considering both parameter uncertainty and model inadequacy may be a more promising direction for effective selective planning than either in isolation.

Model-based Reinforcement Learning

Gradient Temporal-Difference Learning with Regularized Corrections

1 code implementation ICML 2020 Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White

It is still common to use Q-learning and temporal difference (TD) learning, even though they have divergence issues and sound Gradient TD alternatives exist, because divergence seems rare and they typically perform well.


Learning Causal Models Online

1 code implementation12 Jun 2020 Khurram Javed, Martha White, Yoshua Bengio

One solution for achieving strong generalization is to incorporate causal structures in the models; such structures constrain learning by ignoring correlations that contradict them.

Continual Learning

Hallucinating Value: A Pitfall of Dyna-style Planning with Imperfect Environment Models

no code implementations8 Jun 2020 Taher Jafferjee, Ehsan Imani, Erin Talvitie, Martha White, Michael Bowling

Dyna-style reinforcement learning (RL) agents improve sample efficiency over model-free RL agents by updating the value function with simulated experience generated by an environment model.
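The Dyna pattern the abstract describes, interleaving real updates with updates on model-simulated experience, can be sketched on a small deterministic chain. The environment, step counts, and hyperparameters below are illustrative assumptions, not the paper's setup.

```python
import random
random.seed(2)

# A rough Dyna-Q sketch on a 4-state chain with a rewarding goal state.
n_states, goal = 4, 3
Q = {(s, a): 0.0 for s in range(n_states) for a in (0, 1)}  # a: 0=left, 1=right
model = {}                                                  # (s, a) -> (r, s')
alpha, gamma = 0.5, 0.9

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(goal, s + 1)
    return (1.0 if s2 == goal else 0.0), s2

s = 0
for _ in range(300):
    a = random.choice((0, 1))                               # exploratory behavior
    r, s2 = step(s, a)
    target = r + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
    Q[(s, a)] += alpha * (target - Q[(s, a)])               # real update
    model[(s, a)] = (r, s2)                                 # learn the model
    for _ in range(10):                                     # planning updates
        ps, pa = random.choice(list(model))
        pr, ps2 = model[(ps, pa)]
        ptarget = pr + gamma * max(Q[(ps2, 0)], Q[(ps2, 1)])
        Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
    s = 0 if s2 == goal else s2                             # reset at the goal

greedy = [max((0, 1), key=lambda a: Q[(state, a)]) for state in range(goal)]
print(greedy)                                               # [1, 1, 1]: move right
```

Here the model is perfect because the environment is deterministic; the paper's "hallucinated value" pitfall arises when the model is imperfect and planning updates bootstrap from fabricated states.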

Optimizing for the Future in Non-Stationary MDPs

1 code implementation ICML 2020 Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip S. Thomas

Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary.

Maximizing Information Gain in Partially Observable Environments via Prediction Reward

no code implementations11 May 2020 Yash Satsangi, Sungsu Lim, Shimon Whiteson, Frans Oliehoek, Martha White

Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem, where the reward depends on the agent's uncertainty.

Question Answering

Training Recurrent Neural Networks Online by Learning Explicit State Variables

no code implementations ICLR 2020 Somjit Nath, Vincent Liu, Alan Chan, Xin Li, Adam White, Martha White

Recurrent neural networks (RNNs) allow an agent to construct a state-representation from a stream of experience, which is essential in partially observable problems.

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

1 code implementation ICLR 2020 Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White

Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value.
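The overestimation effect, and the min-over-ensemble target that Maxmin Q-learning uses to counter it, can be demonstrated numerically. The Gaussian noise model and the ensemble size below are synthetic illustrative choices, not the paper's experiments.

```python
import random
random.seed(0)

# All true action values are 0, but max over noisy estimates is biased up;
# taking the per-action min over an ensemble of estimates counters the bias.
def noisy_estimates(n_actions):
    return [random.gauss(0.0, 1.0) for _ in range(n_actions)]

trials = 10000
single_max = sum(max(noisy_estimates(5)) for _ in range(trials)) / trials
ensemble_max = sum(
    max(min(q) for q in zip(*[noisy_estimates(5) for _ in range(4)]))
    for _ in range(trials)
) / trials

print(single_max)    # clearly positive: overestimation of the true value 0
print(ensemble_max)  # much closer to (here, below) the true value 0
```

Maxmin Q-learning tunes the ensemble size to control where the bias lands between over- and underestimation.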


An implicit function learning approach for parametric modal regression

no code implementations NeurIPS 2020 Yangchen Pan, Ehsan Imani, Martha White, Amir-Massoud Farahmand

We empirically demonstrate on several synthetic problems that our method (i) can learn multi-valued functions and produce the conditional modes, (ii) scales well to high-dimensional inputs, and (iii) can even be more effective for certain uni-modal problems, particularly for high-frequency functions.

Learning Macroscopic Brain Connectomes via Group-Sparse Factorization

1 code implementation NeurIPS 2019 Farzane Aminmansour, Andrew Patterson, Lei Le, Yisu Peng, Daniel Mitchell, Franco Pestilli, Cesar F. Caiafa, Russell Greiner, Martha White

We develop an efficient optimization strategy for this extremely high-dimensional sparse problem, by reducing the number of parameters using a greedy algorithm designed specifically for the problem.

Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online

no code implementations ICLR 2021 Yangchen Pan, Kirby Banman, Martha White

Recent work has shown that sparse representations -- where only a small percentage of units are active -- can significantly reduce interference.

Continual Learning Continuous Control +2

Is Fast Adaptation All You Need?

no code implementations3 Oct 2019 Khurram Javed, Hengshuai Yao, Martha White

Gradient-based meta-learning has proven to be highly effective at learning model initializations, representations, and update rules that allow fast adaptation from a few samples.

Incremental Learning Meta-Learning

Meta-descent for Online, Continual Prediction

no code implementations17 Jul 2019 Andrew Jacobsen, Matthew Schlegel, Cameron Linke, Thomas Degris, Adam White, Martha White

This paper investigates different vector step-size adaptation approaches for non-stationary online, continual prediction problems.

Time Series Time Series Prediction

Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study

no code implementations19 Jun 2019 Cam Linke, Nadia M. Ady, Martha White, Thomas Degris, Adam White

The question we tackle in this paper is how to sculpt the stream of experience -- how to adapt the learning system's behavior -- to optimize the learning of a collection of value functions.

Active Learning Representation Learning

Hill Climbing on Value Estimates for Search-control in Dyna

no code implementations18 Jun 2019 Yangchen Pan, Hengshuai Yao, Amir-Massoud Farahmand, Martha White

In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) the current estimate of the value function.

Model-based Reinforcement Learning

Importance Resampling for Off-policy Prediction

2 code implementations NeurIPS 2019 Matthew Schlegel, Wesley Chung, Daniel Graves, Jian Qian, Martha White

Importance sampling (IS) is a common reweighting strategy for off-policy prediction in reinforcement learning.
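The reweighting idea can be sketched on a two-action bandit: samples drawn under a behavior policy are reweighted by the target-to-behavior probability ratio to estimate the target policy's expected reward. The bandit below is a made-up example, not the paper's setup (which studies resampling rather than reweighting).

```python
import random
random.seed(1)

behavior = {0: 0.5, 1: 0.5}   # pi_b(a): data-collecting policy
target   = {0: 0.1, 1: 0.9}   # pi(a): policy we want to evaluate
reward   = {0: 0.0, 1: 1.0}   # deterministic reward per action

n = 20000
total = 0.0
for _ in range(n):
    a = 0 if random.random() < behavior[0] else 1
    rho = target[a] / behavior[a]          # importance ratio
    total += rho * reward[a]
is_estimate = total / n

true_value = sum(target[a] * reward[a] for a in target)  # 0.9
print(is_estimate, true_value)             # estimate is close to 0.9
```

The paper's importance resampling replaces these per-update ratio corrections with resampling from the buffer, trading bias characteristics for lower-variance updates.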

Meta-Learning Representations for Continual Learning

5 code implementations NeurIPS 2019 Khurram Javed, Martha White

We show that it is possible to learn naturally sparse representations that are more effective for online updating.

Continual Learning Meta-Learning

Two-Timescale Networks for Nonlinear Value Function Approximation

no code implementations ICLR 2019 Wesley Chung, Somjit Nath, Ajin Joseph, Martha White

A key component for many reinforcement learning agents is to learn a value function, either for policy evaluation or control.


Planning with Expectation Models

no code implementations2 Apr 2019 Yi Wan, Zaheer Abbas, Adam White, Martha White, Richard S. Sutton

In particular, we 1) show that planning with an expectation model is equivalent to planning with a distribution model if the state value function is linear in state features, 2) analyze two common parametrization choices for approximating the expectation: linear and non-linear expectation models, 3) propose a sound model-based policy evaluation algorithm and present its convergence results, and 4) empirically demonstrate the effectiveness of the proposed planning algorithm.
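Claim 1) has a simple numeric check: when the value function is linear in state features, backing up through the single expected next feature vector equals the expected backup over sampled next states. The weights, features, and transition distribution below are arbitrary illustrative choices.

```python
w = [0.3, -1.2, 0.7]                       # linear value weights
def v(phi):                                # v(s) = w . phi(s)
    return sum(wi * pi for wi, pi in zip(w, phi))

# A stochastic transition: three possible next-state feature vectors.
next_phis = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 2.0, 0.0]]
probs = [0.2, 0.5, 0.3]

# Distribution model: expected value over the sampled next states.
dist_backup = sum(p * v(phi) for p, phi in zip(probs, next_phis))

# Expectation model: value of the single expected next feature vector.
expected_phi = [sum(p * phi[i] for p, phi in zip(probs, next_phis))
                for i in range(3)]
exp_backup = v(expected_phi)

print(abs(dist_backup - exp_backup) < 1e-12)   # True: the two backups agree
```

With a non-linear value function this equality breaks, which is why the paper analyzes the non-linear expectation-model case separately.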

Model-based Reinforcement Learning

Accelerating Large Scale Knowledge Distillation via Dynamic Importance Sampling

no code implementations3 Dec 2018 Minghan Li, Tanli Zuo, Ruicheng Li, Martha White, Wei-Shi Zheng

Knowledge distillation is an effective technique that transfers knowledge from a large teacher model to a shallow student.

Knowledge Distillation Machine Translation +2

Supervised autoencoders: Improving generalization performance with unsupervised regularizers

1 code implementation NeurIPS 2018 Lei Le, Andrew Patterson, Martha White

A common strategy to improve generalization has been through the use of regularizers, typically as a norm constraining the parameters.

An Off-policy Policy Gradient Theorem Using Emphatic Weightings

no code implementations NeurIPS 2018 Ehsan Imani, Eric Graves, Martha White

There have been a host of theoretically sound algorithms proposed for the on-policy setting, due to the existence of the policy gradient theorem which provides a simplified form for the gradient.

Policy Gradient Methods

The Barbados 2018 List of Open Issues in Continual Learning

no code implementations16 Nov 2018 Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.

Continual Learning

Context-Dependent Upper-Confidence Bounds for Directed Exploration

no code implementations NeurIPS 2018 Raksha Kumaraswamy, Matthew Schlegel, Adam White, Martha White

Directed exploration strategies for reinforcement learning are critical for learning an optimal policy in a minimal number of interactions with the environment.

Efficient Exploration

Online Off-policy Prediction

no code implementations6 Nov 2018 Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White

The ability to learn behavior-contingent predictions online and off-policy has long been advocated as a key capability of predictive-knowledge learning systems but remained an open algorithmic challenge for decades.

Actor-Expert: A Framework for using Q-learning in Continuous Action Spaces

no code implementations22 Oct 2018 Sungsu Lim, Ajin Joseph, Lei Le, Yangchen Pan, Martha White

A common strategy has been to restrict the functional form of the action-values to be concave in the actions, to simplify the optimization.

Global Optimization Q-Learning

High-confidence error estimates for learned value functions

no code implementations28 Aug 2018 Touqir Sajed, Wesley Chung, Martha White

We provide experiments investigating the number of samples required by this offline algorithm in simple benchmark reinforcement learning domains, and highlight that there are still many open questions to be solved for this important problem.

General Value Function Networks

no code implementations18 Jul 2018 Matthew Schlegel, Andrew Jacobsen, Zaheer Abbas, Andrew Patterson, Adam White, Martha White

A general purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation.

Continuous Control Decision Making

Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control

no code implementations ICML 2018 Yangchen Pan, Amir-Massoud Farahmand, Martha White, Saleh Nabi, Piyush Grover, Daniel Nikovski

Recent work has shown that reinforcement learning (RL) is a promising approach to control dynamical systems described by partial differential equations (PDEs).

Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous State Domains

no code implementations12 Jun 2018 Yangchen Pan, Muhammad Zaheer, Adam White, Andrew Patterson, Martha White

We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states that propagate information in reverse from a state more quickly.

Improving Regression Performance with Distributional Losses

no code implementations ICML 2018 Ehsan Imani, Martha White

We provide theoretical support for this alternative hypothesis, by characterizing the norm of the gradients of this loss.

Discovery of Predictive Representations With a Network of General Value Functions

no code implementations ICLR 2018 Matthew Schlegel, Andrew Patterson, Adam White, Martha White

We investigate a framework for discovery: curating a large collection of predictions, which are used to construct the agent's representation of the world.

Decision Making

Multi-view Matrix Factorization for Linear Dynamical System Estimation

no code implementations NeurIPS 2017 Mahdi Karami, Martha White, Dale Schuurmans, Csaba Szepesvari

In this paper, we instead reconsider likelihood maximization and develop an optimization based strategy for recovering the latent states and transition parameters.

Global Optimization

Effective sketching methods for value function approximation

no code implementations3 Aug 2017 Yangchen Pan, Erfan Sadeqi Azer, Martha White

As a remedy, we demonstrate how to use sketching more sparingly, with only a left-sided sketch, which can still enable significant computational gains and the use of these matrix-based learning algorithms that are less sensitive to parameters.

Adapting Kernel Representations Online Using Submodular Maximization

no code implementations ICML 2017 Matthew Schlegel, Yangchen Pan, Jiecao Chen, Martha White

In this work, we develop an approximately submodular criterion for this setting, and an efficient online greedy submodular maximization algorithm for optimizing the criterion.

Continual Learning

Learning Sparse Representations in Reinforcement Learning with Sparse Coding

no code implementations26 Jul 2017 Lei Le, Raksha Kumaraswamy, Martha White

Outside of reinforcement learning, sparse coding representations have been widely used, with non-convex objectives that result in discriminative representations.

Representation Learning

Recovering True Classifier Performance in Positive-Unlabeled Learning

no code implementations2 Feb 2017 Shantanu Jain, Martha White, Predrag Radivojac

A common approach in positive-unlabeled learning is to train a classification model between labeled and unlabeled data.
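Under the standard "selected completely at random" assumption, a classifier trained to separate labeled from unlabeled examples predicts p(s=1|x) rather than p(y=1|x), and the true posterior is recovered by dividing by the label frequency c = p(s=1|y=1). The sketch below uses synthetic posterior values, not the paper's data or its specific correction.

```python
c = 0.4                                    # fraction of positives that get labeled
true_posterior = [0.9, 0.5, 0.1]           # p(y=1|x) for three example inputs

# What an ideal labeled-vs-unlabeled classifier outputs: p(s=1|x) = c * p(y=1|x)
nontraditional = [c * p for p in true_posterior]

# Recovering the true posterior from the nontraditional scores:
recovered = [round(min(1.0, g / c), 6) for g in nontraditional]

print(recovered)                           # [0.9, 0.5, 0.1]: matches true posterior
```

In practice c must itself be estimated from data, which is the subject of the class-prior estimation papers elsewhere in this list.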

Accelerated Gradient Temporal Difference Learning

no code implementations28 Nov 2016 Yangchen Pan, Adam White, Martha White

The family of temporal difference (TD) methods spans a spectrum from computationally frugal linear methods like TD(λ) to data-efficient least-squares methods.

Unifying task specification in reinforcement learning

no code implementations ICML 2017 Martha White

Reinforcement learning tasks are typically specified as Markov decision processes.

A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning

2 code implementations2 Jul 2016 Martha White, Adam White

One of the main obstacles to broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms.


Estimating the class prior and posterior from noisy positives and unlabeled data

no code implementations NeurIPS 2016 Shantanu Jain, Martha White, Predrag Radivojac

We develop a classification algorithm for estimating posterior distributions from positive-unlabeled data, that is robust to noise in the positive labels and effective for high-dimensional data.

Classification Density Estimation +2

Identifying global optimality for dictionary learning

no code implementations17 Apr 2016 Lei Le, Martha White

We then provide an empirical investigation into practical optimization choices for using alternating minimization for induced DLMs, for both batch and stochastic gradient descent.

Dictionary Learning Global Optimization +1

Investigating practical linear temporal difference learning

1 code implementation28 Feb 2016 Adam White, Martha White

First, we derive two new hybrid TD policy-evaluation algorithms, which fill a gap in this collection of algorithms.

Nonparametric semi-supervised learning of class proportions

no code implementations8 Jan 2016 Shantanu Jain, Martha White, Michael W. Trosset, Predrag Radivojac

This problem can be decomposed into two steps: (i) the development of accurate predictors that discriminate between positive and unlabeled data, and (ii) the accurate estimation of the prior probabilities of positive and negative examples.

Density Estimation

Incremental Truncated LSTD

no code implementations26 Nov 2015 Clement Gehring, Yangchen Pan, Martha White

Balancing between computational efficiency and sample efficiency is an important goal in reinforcement learning.

Emphatic Temporal-Difference Learning

no code implementations6 Jul 2015 A. Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton

Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps.

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

no code implementations14 Mar 2015 Richard S. Sutton, A. Rupam Mahmood, Martha White

In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps.

Convex Multi-view Subspace Learning

no code implementations NeurIPS 2012 Martha White, Xinhua Zhang, Dale Schuurmans, Yao-Liang Yu

Subspace learning seeks a low dimensional representation of data that enables accurate reconstruction.

Off-Policy Actor-Critic

no code implementations22 May 2012 Thomas Degris, Martha White, Richard S. Sutton

Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning.

Relaxed Clipping: A Global Training Method for Robust Regression and Classification

no code implementations NeurIPS 2010 Min Yang, Linli Xu, Martha White, Dale Schuurmans, Yao-Liang Yu

We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems.

Classification General Classification

Interval Estimation for Reinforcement-Learning Algorithms in Continuous-State Domains

no code implementations NeurIPS 2010 Martha White, Adam White

The reinforcement learning community has explored many approaches to obtaining value estimates and models to guide decision making; these approaches, however, do not usually provide a measure of confidence in the estimate.

Decision Making
