Search Results for author: Martha White

Found 92 papers, 27 papers with code

Tuning for the Unknown: Revisiting Evaluation Strategies for Lifelong RL

no code implementations2 Apr 2024 Golnaz Mesbahi, Olya Mastikhina, Parham Mohammad Panahi, Martha White, Adam White

In this paper we propose a new approach for tuning and evaluating lifelong RL agents where only one percent of the experiment data can be used for hyperparameter tuning.

Investigating the Histogram Loss in Regression

1 code implementation20 Feb 2024 Ehsan Imani, Kai Luedemann, Sam Scholnick-Hughes, Esraa Elelimy, Martha White

It is becoming increasingly common in regression to train neural networks that model the entire distribution even if only the mean is required for prediction.
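One common way to instantiate this idea (a sketch of a typical construction, with made-up bins and bandwidth; the paper's exact setup may differ) is to soften the scalar label into a truncated-Gaussian histogram target and train the network with cross-entropy against it:

```python
import math

def gaussian_histogram_target(y, bins, sigma):
    """Per-bin mass of a Gaussian centred at the label y, renormalized over the bins."""
    cdf = lambda x: 0.5 * (1.0 + math.erf((x - y) / (sigma * math.sqrt(2.0))))
    masses = [cdf(hi) - cdf(lo) for lo, hi in bins]
    z = sum(masses)
    return [m / z for m in masses]

bins = [(i / 10, (i + 1) / 10) for i in range(10)]   # 10 equal bins on [0, 1]
target = gaussian_histogram_target(0.55, bins, sigma=0.05)
# Cross-entropy between the network's predicted histogram and `target` is then
# the training loss; the bin containing 0.55 receives the most mass.
```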


What to Do When Your Discrete Optimization Is the Size of a Neural Network?

1 code implementation15 Feb 2024 Hugo Silva, Martha White

Oftentimes, machine learning applications using neural networks involve solving discrete optimization problems, such as in pruning, parameter-isolation-based continual learning and training of binary networks.

Continual Learning Image Classification +1

Compound Returns Reduce Variance in Reinforcement Learning

no code implementations6 Feb 2024 Brett Daley, Martha White, Marlos C. Machado

Multistep returns, such as $n$-step returns and $\lambda$-returns, are commonly used to improve the sample efficiency of reinforcement learning (RL) methods.

reinforcement-learning Reinforcement Learning (RL)
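For context, the multistep returns named in this snippet have standard definitions. The sketch below computes them for a made-up three-step episode with tabular value estimates (this illustrates the standard returns, not the paper's compound returns):

```python
def n_step_return(rewards, values, t, n, gamma):
    """G_t^(n) = r_{t+1} + ... + gamma^{n-1} r_{t+n} + gamma^n V(s_{t+n})."""
    G = 0.0
    for k in range(n):
        G += (gamma ** k) * rewards[t + k]
    return G + (gamma ** n) * values[t + n]

def lambda_return(rewards, values, t, gamma, lam):
    """G_t^lambda: (1 - lam)-weighted mix of n-step returns, truncated at episode end."""
    T = len(rewards)
    total, weight = 0.0, (1.0 - lam)
    for n in range(1, T - t):
        total += weight * n_step_return(rewards, values, t, n, gamma)
        weight *= lam
    # Remaining probability mass goes to the full Monte Carlo return.
    total += (lam ** (T - t - 1)) * n_step_return(rewards, values, t, T - t, gamma)
    return total

rewards = [1.0, 1.0, 1.0]        # three steps, all reward 1 (made up)
values  = [0.0, 0.0, 0.0, 0.0]   # value estimates, terminal value 0
g3 = n_step_return(rewards, values, 0, 3, gamma=1.0)   # = 3.0, the full return
```

With `lam=1` the lambda-return recovers the Monte Carlo return and with `lam=0` the one-step return, which is the spectrum these estimators trade off along.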

GVFs in the Real World: Making Predictions Online for Water Treatment

no code implementations4 Dec 2023 Muhammad Kamran Janjua, Haseeb Shah, Martha White, Erfan Miahi, Marlos C. Machado, Adam White

In this paper we investigate the use of reinforcement-learning based prediction approaches for a real drinking-water treatment plant.

Time Series Prediction

Measuring and Mitigating Interference in Reinforcement Learning

no code implementations10 Jul 2023 Vincent Liu, Han Wang, Ruo Yu Tao, Khurram Javed, Adam White, Martha White

Lastly, we outline a class of algorithms which we call online-aware that are designed to mitigate interference, and show they do reduce interference according to our measure and that they improve stability and performance in several classic control environments.


Coagent Networks: Generalized and Scaled

no code implementations16 May 2023 James E. Kostas, Scott M. Jordan, Yash Chandak, Georgios Theocharous, Dhawal Gupta, Martha White, Bruno Castro da Silva, Philip S. Thomas

However, the coagent framework is not just an alternative to BDL; the two approaches can be blended: BDL can be combined with coagent learning rules to create architectures with the advantages of both approaches.

Reinforcement Learning (RL)

Empirical Design in Reinforcement Learning

no code implementations3 Apr 2023 Andrew Patterson, Samuel Neumann, Martha White, Adam White

The objective of this document is to provide answers on how we can use our unprecedented compute to do good science in reinforcement learning, as well as stay alert to potential pitfalls in our empirical design.


The In-Sample Softmax for Offline Reinforcement Learning

4 code implementations28 Feb 2023 Chenjun Xiao, Han Wang, Yangchen Pan, Adam White, Martha White

We highlight a simple fact: it is more straightforward to approximate an in-sample \emph{softmax} using only actions in the dataset.

Offline RL reinforcement-learning +1
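The "in-sample" idea can be sketched with a soft (log-sum-exp) value computed over a restricted action set; the Q-values below are hypothetical, with one action unsupported by the dataset:

```python
import math

def softmax_value(q, actions, tau=1.0):
    """Soft value tau * log sum_a exp(q[a] / tau), over a given action set."""
    m = max(q[a] for a in actions)
    return m + tau * math.log(sum(math.exp((q[a] - m) / tau) for a in actions))

q = {0: 1.0, 1: 2.0, 2: 10.0}    # hypothetical estimates; action 2 never observed
full      = softmax_value(q, [0, 1, 2])   # pulled up by the unsupported action 2
in_sample = softmax_value(q, [0, 1])      # backs up only actions in the data
```

In offline RL the estimate for the unobserved action is typically unreliable, so restricting the softmax to dataset actions avoids bootstrapping from it.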

Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments

no code implementations23 Feb 2023 Vincent Liu, Yash Chandak, Philip Thomas, Martha White

In this work, we consider the off-policy policy evaluation problem for contextual bandits and finite horizon reinforcement learning in the nonstationary setting.

Multi-Armed Bandits regression +2

Generalized Munchausen Reinforcement Learning using Tsallis KL Divergence

no code implementations27 Jan 2023 Lingwei Zhu, Zheng Chen, Matthew Schlegel, Martha White

Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leibler (KL) divergence to the previous policy, to prevent the policy from changing too quickly.

Atari Games reinforcement-learning +1
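The KL penalty mentioned in the snippet has a simple discrete form; as a rough illustration with made-up two-action policies (the standard KL, not the paper's Tsallis generalization):

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

prev_policy = [0.5, 0.5]
small_step  = [0.6, 0.4]
big_step    = [0.9, 0.1]
# The penalty grows the further the new policy moves from the previous one,
# which is what discourages overly fast policy changes.
```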

Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning

1 code implementation26 Jan 2023 Brett Daley, Martha White, Christopher Amato, Marlos C. Machado

Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, but counteracting off-policy bias without exacerbating variance is challenging.

reinforcement-learning Reinforcement Learning (RL)

Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks

1 code implementation20 Jan 2023 Khurram Javed, Haseeb Shah, Rich Sutton, Martha White

We show that by either decomposing the network into independent modules or learning the network in stages, we can make RTRL scale linearly with the number of parameters.

Atari Games

Goal-Space Planning with Subgoal Models

no code implementations6 Jun 2022 Chunlok Lo, Kevin Roice, Parham Mohammad Panahi, Scott Jordan, Adam White, Gabor Mihucz, Farzane Aminmansour, Martha White

In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models.

Model-based Reinforcement Learning Reinforcement Learning (RL)

Robust Losses for Learning Value Functions

no code implementations17 May 2022 Andrew Patterson, Victor Liao, Martha White

We start from a formalization of robust losses, then derive sound gradient-based approaches to minimize these losses in both the online off-policy prediction and control settings.

A Temporal-Difference Approach to Policy Gradient Estimation

1 code implementation4 Feb 2022 Samuele Tosatto, Andrew Patterson, Martha White, A. Rupam Mahmood

The policy gradient theorem (Sutton et al., 2000) prescribes the usage of a cumulative discounted state distribution under the target policy to approximate the gradient.
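As background, the theorem cited here is usually written as follows (standard form, with $d^{\pi_\theta}$ the cumulative discounted state distribution the snippet mentions; this is context, not the paper's own estimator):

$$\nabla_\theta J(\theta) = \sum_s d^{\pi_\theta}(s) \sum_a \nabla_\theta \pi_\theta(a \mid s)\, q^{\pi_\theta}(s, a), \qquad d^{\pi_\theta}(s) = \sum_{t=0}^{\infty} \gamma^t \Pr(S_t = s \mid s_0, \pi_\theta).$$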

An Alternate Policy Gradient Estimator for Softmax Policies

1 code implementation22 Dec 2021 Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. Rupam Mahmood

Policy gradient (PG) estimators are ineffective in dealing with softmax policies that are sub-optimally saturated, which refers to the situation when the policy concentrates its probability mass on sub-optimal actions.
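Saturation is easy to see numerically: once the logits concentrate on one action, the other actions' probabilities (and hence their gradient signal under sampled PG updates) become vanishingly small. A minimal illustration with made-up logits:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# A policy saturated on a sub-optimal action: almost all mass on action 0.
pi = softmax([10.0, 0.0])
# pi[1] is about 4.5e-5, so the alternative action is almost never sampled and
# the update that would move probability mass toward it is correspondingly tiny.
```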

Representation Alignment in Neural Networks

1 code implementation15 Dec 2021 Ehsan Imani, Wei Hu, Martha White

We then highlight why alignment between the top singular vectors and the targets can speed up learning and show in a classic synthetic transfer problem that representation alignment correlates with positive and negative transfer to similar and dissimilar tasks.

Off-Policy Actor-Critic with Emphatic Weightings

1 code implementation16 Nov 2021 Eric Graves, Ehsan Imani, Raksha Kumaraswamy, Martha White

A variety of theoretically-sound policy gradient algorithms exist for the on-policy setting due to the policy gradient theorem, which provides a simplified form for the gradient.

Offline-Online Reinforcement Learning: Extending Batch and Online RL

no code implementations29 Sep 2021 Maryam Hashemzadeh, Wesley Chung, Martha White

To enable better performance, we investigate the offline-online setting: The agent has access to a batch of data to train on but is also allowed to learn during the evaluation phase in an online manner.

reinforcement-learning Reinforcement Learning (RL)

Resmax: An Alternative Soft-Greedy Operator for Reinforcement Learning

no code implementations29 Sep 2021 Erfan Miahi, Revan MacQueen, Alex Ayoub, Abbas Masoumzadeh, Martha White

Soft-greedy operators, namely $\varepsilon$-greedy and softmax, remain a common choice to induce a basic level of exploration for action-value methods in reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)
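The two baseline operators named in the snippet have standard forms; a minimal sketch over made-up action values (not the paper's proposed resmax operator):

```python
import math

def epsilon_greedy_probs(q, eps):
    """eps/|A| on every action, plus 1 - eps on the (first) greedy action."""
    n = len(q)
    greedy = max(range(n), key=lambda a: q[a])
    probs = [eps / n] * n
    probs[greedy] += 1.0 - eps
    return probs

def softmax_probs(q, tau):
    """Boltzmann distribution over action values with temperature tau."""
    m = max(q)
    exps = [math.exp((v - m) / tau) for v in q]
    z = sum(exps)
    return [e / z for e in exps]

q = [1.0, 2.0, 0.5]
eg = epsilon_greedy_probs(q, eps=0.1)   # greedy action keeps 1 - eps + eps/3
sm = softmax_probs(q, tau=1.0)          # smooth preference ordered by value
```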

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

no code implementations17 Jul 2021 Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White

Approximate Policy Iteration (API) algorithms alternate between (approximate) policy evaluation and (approximate) greedification.

Policy Gradient Methods

Predictive Representation Learning for Language Modeling

no code implementations29 May 2021 Qingfeng Lan, Luke Kumar, Martha White, Alona Fyshe

Correlates of secondary information appear in LSTM representations even though they are not part of an \emph{explicitly} supervised prediction task.

Language Modelling Reinforcement Learning (RL) +1

A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning

no code implementations28 Apr 2021 Andrew Patterson, Adam White, Martha White

Many algorithms have been developed for off-policy value estimation based on the linear mean squared projected Bellman error (MSPBE) and are sound under linear function approximation.

reinforcement-learning Reinforcement Learning (RL)

Scalable Online Recurrent Learning Using Columnar Neural Networks

1 code implementation9 Mar 2021 Khurram Javed, Martha White, Rich Sutton

We empirically show that as long as connections between columns are sparse, our method approximates the true gradient well.


Perspectives on Sim2Real Transfer for Robotics: A Summary of the R:SS 2020 Workshop

no code implementations7 Dec 2020 Sebastian Höfer, Kostas Bekris, Ankur Handa, Juan Camilo Gamboa, Florian Golemo, Melissa Mozifian, Chris Atkeson, Dieter Fox, Ken Goldberg, John Leonard, C. Karen Liu, Jan Peters, Shuran Song, Peter Welinder, Martha White

This report presents the debates, posters, and discussions of the Sim2Real workshop held in conjunction with the 2020 edition of the "Robotics: Science and Systems" conference.

Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities

1 code implementation28 Sep 2020 Jincheng Mei, Yangchen Pan, Martha White, Amir-Massoud Farahmand, Hengshuai Yao

The prioritized Experience Replay (ER) method has attracted great attention; however, there is little theoretical understanding of such prioritization strategies and why they help.

Understanding and Mitigating the Limitations of Prioritized Experience Replay

2 code implementations19 Jul 2020 Yangchen Pan, Jincheng Mei, Amir-Massoud Farahmand, Martha White, Hengshuai Yao, Mohsen Rohani, Jun Luo

Prioritized Experience Replay (ER) has been empirically shown to improve sample efficiency across many domains and attracted great attention; however, there is little theoretical understanding of why such prioritized sampling helps and its limitations.

Autonomous Driving Continuous Control +1

Selective Dyna-style Planning Under Limited Model Capacity

no code implementations ICML 2020 Zaheer Abbas, Samuel Sokota, Erin J. Talvitie, Martha White

We show that heteroscedastic regression can signal predictive uncertainty arising from model inadequacy that is complementary to that which is detected by methods designed for parameter uncertainty, indicating that considering both parameter uncertainty and model inadequacy may be a more promising direction for effective selective planning than either in isolation.

Model-based Reinforcement Learning

Gradient Temporal-Difference Learning with Regularized Corrections

1 code implementation ICML 2020 Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White

It is still common to use Q-learning and temporal difference (TD) learning, even though they have divergence issues and sound Gradient TD alternatives exist, because divergence seems rare and they typically perform well.


Learning Causal Models Online

1 code implementation12 Jun 2020 Khurram Javed, Martha White, Yoshua Bengio

One solution for achieving strong generalization is to incorporate causal structures in the models; such structures constrain learning by ignoring correlations that contradict them.

Continual Learning

Hallucinating Value: A Pitfall of Dyna-style Planning with Imperfect Environment Models

no code implementations8 Jun 2020 Taher Jafferjee, Ehsan Imani, Erin Talvitie, Martha White, Michael Bowling

Dyna-style reinforcement learning (RL) agents improve sample efficiency over model-free RL agents by updating the value function with simulated experience generated by an environment model.

Reinforcement Learning (RL)

Optimizing for the Future in Non-Stationary MDPs

1 code implementation ICML 2020 Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip S. Thomas

Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary.

Maximizing Information Gain in Partially Observable Environments via Prediction Reward

no code implementations11 May 2020 Yash Satsangi, Sungsu Lim, Shimon Whiteson, Frans Oliehoek, Martha White

Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem, where the reward depends on the agent's uncertainty.

Question Answering Reinforcement Learning (RL)

Training Recurrent Neural Networks Online by Learning Explicit State Variables

no code implementations ICLR 2020 Somjit Nath, Vincent Liu, Alan Chan, Xin Li, Adam White, Martha White

Recurrent neural networks (RNNs) allow an agent to construct a state-representation from a stream of experience, which is essential in partially observable problems.

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

1 code implementation ICLR 2020 Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White

Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value.
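The overestimation bias described here, and the effect of a maxmin-style target, can be demonstrated with a small simulation (made-up action values and Gaussian noise; a sketch of the idea, not the paper's full algorithm):

```python
import random

random.seed(0)
true_q = [0.0, 0.0, 0.0]  # three actions, all with true value 0

def noisy_estimate():
    """One Q-estimate per action, corrupted by zero-mean noise."""
    return [q + random.gauss(0, 1) for q in true_q]

# Standard target max_a Q(a) over a single noisy estimate is biased upward:
single = sum(max(noisy_estimate()) for _ in range(10_000)) / 10_000

# Maxmin-style target: elementwise min over N estimates, then max over actions.
def maxmin_target(n_estimators):
    ests = [noisy_estimate() for _ in range(n_estimators)]
    return max(min(e[a] for e in ests) for a in range(len(true_q)))

maxmin = sum(maxmin_target(4) for _ in range(10_000)) / 10_000
# `single` lands well above the true value 0; `maxmin` is pulled back down,
# and the number of estimators controls how strongly the bias is reduced.
```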


An implicit function learning approach for parametric modal regression

no code implementations NeurIPS 2020 Yangchen Pan, Ehsan Imani, Martha White, Amir-Massoud Farahmand

We empirically demonstrate on several synthetic problems that our method (i) can learn multi-valued functions and produce the conditional modes, (ii) scales well to high-dimensional inputs, and (iii) can even be more effective for certain uni-modal problems, particularly for high-frequency functions.


Learning Macroscopic Brain Connectomes via Group-Sparse Factorization

1 code implementation NeurIPS 2019 Farzane Aminmansour, Andrew Patterson, Lei Le, Yisu Peng, Daniel Mitchell, Franco Pestilli, Cesar F. Caiafa, Russell Greiner, Martha White

We develop an efficient optimization strategy for this extremely high-dimensional sparse problem, by reducing the number of parameters using a greedy algorithm designed specifically for the problem.

Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online

1 code implementation ICLR 2021 Yangchen Pan, Kirby Banman, Martha White

Recent work has shown that sparse representations, where only a small percentage of units are active, can significantly reduce interference.

Continual Learning Continuous Control +2

Is Fast Adaptation All You Need?

no code implementations3 Oct 2019 Khurram Javed, Hengshuai Yao, Martha White

Gradient-based meta-learning has proven to be highly effective at learning model initializations, representations, and update rules that allow fast adaptation from a few samples.

Incremental Learning Meta-Learning

Meta-descent for Online, Continual Prediction

no code implementations17 Jul 2019 Andrew Jacobsen, Matthew Schlegel, Cameron Linke, Thomas Degris, Adam White, Martha White

This paper investigates different vector step-size adaptation approaches for non-stationary online, continual prediction problems.

Second-order methods Time Series +1

Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study

no code implementations19 Jun 2019 Cam Linke, Nadia M. Ady, Martha White, Thomas Degris, Adam White

The question we tackle in this paper is how to sculpt the stream of experience, that is, how to adapt the learning system's behavior, to optimize the learning of a collection of value functions.

Active Learning reinforcement-learning +2

Hill Climbing on Value Estimates for Search-control in Dyna

no code implementations18 Jun 2019 Yangchen Pan, Hengshuai Yao, Amir-Massoud Farahmand, Martha White

In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) the current estimate of the value function.

Model-based Reinforcement Learning Reinforcement Learning (RL)

Importance Resampling for Off-policy Prediction

2 code implementations NeurIPS 2019 Matthew Schlegel, Wesley Chung, Daniel Graves, Jian Qian, Martha White

Importance sampling (IS) is a common reweighting strategy for off-policy prediction in reinforcement learning.
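The reweighting the snippet refers to is easy to sketch on a toy two-action bandit (hypothetical target and behaviour policies; this is ordinary IS, the baseline the paper's resampling approach is contrasted with):

```python
import random

random.seed(1)
pi = {0: 0.9, 1: 0.1}     # target policy (made-up probabilities)
b  = {0: 0.5, 1: 0.5}     # behaviour policy that generated the data
reward = {0: 1.0, 1: 0.0}

# Draw 100k actions from the behaviour policy:
samples = [0 if random.random() < b[0] else 1 for _ in range(100_000)]

# Ordinary importance-sampling estimate of the target policy's expected reward,
# reweighting each sample by rho = pi(a) / b(a):
est = sum(pi[a] / b[a] * reward[a] for a in samples) / len(samples)
# True value under pi is 0.9 * 1.0 + 0.1 * 0.0 = 0.9; `est` should be close.
```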

Meta-Learning Representations for Continual Learning

6 code implementations NeurIPS 2019 Khurram Javed, Martha White

We show that it is possible to learn naturally sparse representations that are more effective for online updating.

Continual Learning Meta-Learning

Two-Timescale Networks for Nonlinear Value Function Approximation

no code implementations ICLR 2019 Wesley Chung, Somjit Nath, Ajin Joseph, Martha White

A key component for many reinforcement learning agents is to learn a value function, either for policy evaluation or control.

Q-Learning Vocal Bursts Valence Prediction

Planning with Expectation Models

no code implementations2 Apr 2019 Yi Wan, Zaheer Abbas, Adam White, Martha White, Richard S. Sutton

In particular, we 1) show that planning with an expectation model is equivalent to planning with a distribution model if the state value function is linear in state features, 2) analyze two common parametrization choices for approximating the expectation: linear and non-linear expectation models, 3) propose a sound model-based policy evaluation algorithm and present its convergence results, and 4) empirically demonstrate the effectiveness of the proposed planning algorithm.

Model-based Reinforcement Learning

Accelerating Large Scale Knowledge Distillation via Dynamic Importance Sampling

no code implementations3 Dec 2018 Minghan Li, Tanli Zuo, Ruicheng Li, Martha White, Wei-Shi Zheng

Knowledge distillation is an effective technique that transfers knowledge from a large teacher model to a shallow student.

Knowledge Distillation Machine Translation +2

Supervised autoencoders: Improving generalization performance with unsupervised regularizers

1 code implementation NeurIPS 2018 Lei Le, Andrew Patterson, Martha White

A common strategy to improve generalization has been through the use of regularizers, typically as a norm constraining the parameters.

An Off-policy Policy Gradient Theorem Using Emphatic Weightings

no code implementations NeurIPS 2018 Ehsan Imani, Eric Graves, Martha White

There have been a host of theoretically sound algorithms proposed for the on-policy setting, due to the existence of the policy gradient theorem which provides a simplified form for the gradient.

Policy Gradient Methods

The Barbados 2018 List of Open Issues in Continual Learning

no code implementations16 Nov 2018 Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.

Continual Learning

Context-Dependent Upper-Confidence Bounds for Directed Exploration

no code implementations NeurIPS 2018 Raksha Kumaraswamy, Matthew Schlegel, Adam White, Martha White

Directed exploration strategies for reinforcement learning are critical for learning an optimal policy in a minimal number of interactions with the environment.

Efficient Exploration

Online Off-policy Prediction

no code implementations6 Nov 2018 Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White

The ability to learn behavior-contingent predictions online and off-policy has long been advocated as a key capability of predictive-knowledge learning systems but remained an open algorithmic challenge for decades.

Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement

1 code implementation22 Oct 2018 Samuel Neumann, Sungsu Lim, Ajin Joseph, Yangchen Pan, Adam White, Martha White

We first provide a policy improvement result in an idealized setting, and then prove that our conditional CEM (CCEM) strategy tracks a CEM update per state, even with changing action-values.

Policy Gradient Methods Q-Learning

Importance Resampling for Off-policy Policy Evaluation

no code implementations27 Sep 2018 Matthew Schlegel, Wesley Chung, Daniel Graves, Martha White

We propose Importance Resampling (IR) for off-policy learning, that resamples experience from the replay buffer and applies a standard on-policy update.

High-confidence error estimates for learned value functions

no code implementations28 Aug 2018 Touqir Sajed, Wesley Chung, Martha White

We provide experiments investigating the number of samples required by this offline algorithm in simple benchmark reinforcement learning domains, and highlight that there are still many open questions to be solved for this important problem.

reinforcement-learning Reinforcement Learning (RL) +1

General Value Function Networks

no code implementations18 Jul 2018 Matthew Schlegel, Andrew Jacobsen, Zaheer Abbas, Andrew Patterson, Adam White, Martha White

A general purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation.

Continuous Control Decision Making

Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous State Domains

no code implementations12 Jun 2018 Yangchen Pan, Muhammad Zaheer, Adam White, Andrew Patterson, Martha White

We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states that propagate information in reverse from a state more quickly.

Improving Regression Performance with Distributional Losses

no code implementations ICML 2018 Ehsan Imani, Martha White

We provide theoretical support for this alternative hypothesis, by characterizing the norm of the gradients of this loss.


Discovery of Predictive Representations With a Network of General Value Functions

no code implementations ICLR 2018 Matthew Schlegel, Andrew Patterson, Adam White, Martha White

We investigate a framework for discovery: curating a large collection of predictions, which are used to construct the agent's representation of the world.

Decision Making

Multi-view Matrix Factorization for Linear Dynamical System Estimation

no code implementations NeurIPS 2017 Mahdi Karami, Martha White, Dale Schuurmans, Csaba Szepesvari

In this paper, we instead reconsider likelihood maximization and develop an optimization based strategy for recovering the latent states and transition parameters.

Effective sketching methods for value function approximation

no code implementations3 Aug 2017 Yangchen Pan, Erfan Sadeqi Azer, Martha White

As a remedy, we demonstrate how to use sketching more sparingly, with only a left-sided sketch, that can still enable significant computational gains and the use of these matrix-based learning algorithms that are less sensitive to parameters.

Reinforcement Learning (RL)

Adapting Kernel Representations Online Using Submodular Maximization

no code implementations ICML 2017 Matthew Schlegel, Yangchen Pan, Jiecao Chen, Martha White

In this work, we develop an approximately submodular criterion for this setting, and an efficient online greedy submodular maximization algorithm for optimizing the criterion.

Continual Learning

Learning Sparse Representations in Reinforcement Learning with Sparse Coding

no code implementations26 Jul 2017 Lei Le, Raksha Kumaraswamy, Martha White

Outside of reinforcement learning, sparse coding representations have been widely used, with non-convex objectives that result in discriminative representations.

reinforcement-learning Reinforcement Learning (RL) +1

Recovering True Classifier Performance in Positive-Unlabeled Learning

no code implementations2 Feb 2017 Shantanu Jain, Martha White, Predrag Radivojac

A common approach in positive-unlabeled learning is to train a classification model between labeled and unlabeled data.

Accelerated Gradient Temporal Difference Learning

no code implementations28 Nov 2016 Yangchen Pan, Adam White, Martha White

The family of temporal difference (TD) methods spans a spectrum from computationally frugal linear methods like TD($\lambda$) to data-efficient least-squares methods.
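The frugal end of that spectrum is a one-line update per transition; a minimal linear TD(0) sketch on a made-up two-state chain (tabular, so one weight per state):

```python
# Tiny chain: s0 -> s1 (reward 0), then s1 -> terminal (reward 1).
w = [0.0, 0.0]           # one weight per state (one-hot features = tabular)
gamma, alpha = 0.9, 0.1

for _ in range(500):
    delta = 0.0 + gamma * w[1] - w[0]   # TD error for s0 -> s1
    w[0] += alpha * delta
    delta = 1.0 + gamma * 0.0 - w[1]    # TD error for s1 -> terminal
    w[1] += alpha * delta
# w approaches the true values V(s1) = 1 and V(s0) = gamma * V(s1) = 0.9.
```

Each update touches only the features of the current state, which is what makes the linear methods O(d) per step, versus the O(d^2) or worse cost of least-squares methods.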

A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning

2 code implementations2 Jul 2016 Martha White, Adam White

One of the main obstacles to broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms.

Meta-Learning reinforcement-learning +1

Estimating the class prior and posterior from noisy positives and unlabeled data

no code implementations NeurIPS 2016 Shantanu Jain, Martha White, Predrag Radivojac

We develop a classification algorithm for estimating posterior distributions from positive-unlabeled data, that is robust to noise in the positive labels and effective for high-dimensional data.

Classification Density Estimation +2

Identifying global optimality for dictionary learning

no code implementations17 Apr 2016 Lei Le, Martha White

We then provide an empirical investigation into practical optimization choices for using alternating minimization for induced DLMs, for both batch and stochastic gradient descent.

Dictionary Learning Matrix Completion

Investigating practical linear temporal difference learning

1 code implementation28 Feb 2016 Adam White, Martha White

First, we derive two new hybrid TD policy-evaluation algorithms, which fill a gap in this collection of algorithms.

reinforcement-learning Reinforcement Learning (RL)

Nonparametric semi-supervised learning of class proportions

1 code implementation8 Jan 2016 Shantanu Jain, Martha White, Michael W. Trosset, Predrag Radivojac

This problem can be decomposed into two steps: (i) the development of accurate predictors that discriminate between positive and unlabeled data, and (ii) the accurate estimation of the prior probabilities of positive and negative examples.

Density Estimation

Incremental Truncated LSTD

no code implementations26 Nov 2015 Clement Gehring, Yangchen Pan, Martha White

Balancing between computational efficiency and sample efficiency is an important goal in reinforcement learning.

Computational Efficiency

Emphatic Temporal-Difference Learning

no code implementations6 Jul 2015 A. Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton

Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps.

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

no code implementations14 Mar 2015 Richard S. Sutton, A. Rupam Mahmood, Martha White

In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps.

Convex Multi-view Subspace Learning

no code implementations NeurIPS 2012 Martha White, Xinhua Zhang, Dale Schuurmans, Yao-Liang Yu

Subspace learning seeks a low dimensional representation of data that enables accurate reconstruction.

Off-Policy Actor-Critic

1 code implementation22 May 2012 Thomas Degris, Martha White, Richard S. Sutton

Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning.

reinforcement-learning Reinforcement Learning (RL)

Interval Estimation for Reinforcement-Learning Algorithms in Continuous-State Domains

no code implementations NeurIPS 2010 Martha White, Adam White

The reinforcement learning community has explored many approaches to obtaining value estimates and models to guide decision making; these approaches, however, do not usually provide a measure of confidence in the estimate.

Decision Making reinforcement-learning +1

Relaxed Clipping: A Global Training Method for Robust Regression and Classification

no code implementations NeurIPS 2010 Min Yang, Linli Xu, Martha White, Dale Schuurmans, Yao-Liang Yu

We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems.

Classification General Classification +1
