1 code implementation • 28 Feb 2023 • Chenjun Xiao, Han Wang, Yangchen Pan, Adam White, Martha White
We highlight a simple fact: it is more straightforward to approximate an in-sample \emph{softmax} using only actions in the dataset.
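A minimal sketch of that idea: compute a softmax (log-sum-exp) value using only the actions observed for a state in the dataset, rather than maximizing over all actions. The function and argument names below are illustrative, not taken from the paper's code.

    import numpy as np

    def in_sample_softmax_value(q_values, actions_in_data, tau=1.0):
        """Softmax value restricted to actions that appear in the dataset.

        q_values:        action-value estimates for one state (1-D array)
        actions_in_data: indices of actions observed for this state in the batch
        tau:             temperature (tau > 0); small tau approaches an in-sample max
        """
        q_in = q_values[np.asarray(actions_in_data)]
        # log-sum-exp with a uniform weighting over in-sample actions, for stability
        return tau * np.log(np.mean(np.exp((q_in - q_in.max()) / tau))) + q_in.max()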
no code implementations • 23 Feb 2023 • Vincent Liu, Yash Chandak, Philip Thomas, Martha White
In this work, we consider the off-policy policy evaluation problem for contextual bandits and finite horizon reinforcement learning in the nonstationary setting.
no code implementations • 27 Jan 2023 • Lingwei Zhu, Zheng Chen, Takamitsu Matsubara, Martha White
Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leibler (KL) divergence to the previous policy, to prevent the policy from changing too quickly.
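For reference, a hedged sketch of the generic KL-regularized update for a discrete action set (not this paper's specific method): the new policy maximizes expected value minus a KL penalty to the previous policy, which has the familiar closed form below.

    import numpy as np

    def kl_regularized_policy_update(prev_policy, q_values, beta=1.0):
        """Solve argmax_pi  E_pi[Q] - beta * KL(pi || prev_policy) for one state.

        The closed form is pi(a) proportional to prev_policy(a) * exp(Q(a) / beta);
        prev_policy must assign nonzero probability to every action.
        """
        logits = np.log(prev_policy) + q_values / beta
        logits -= logits.max()                 # numerical stability
        new_policy = np.exp(logits)
        return new_policy / new_policy.sum()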
1 code implementation • 26 Jan 2023 • Brett Daley, Martha White, Christopher Amato, Marlos C. Machado
Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, but counteracting off-policy bias without exacerbating variance is challenging.
no code implementations • 20 Jan 2023 • Khurram Javed, Haseeb Shah, Rich Sutton, Martha White
We show that by either decomposing the network into independent modules or learning a recurrent network incrementally, we can make RTRL scale linearly with the number of parameters.
no code implementations • 6 Jun 2022 • Chunlok Lo, Gabor Mihucz, Adam White, Farzane Aminmansour, Martha White
In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models.
no code implementations • 18 May 2022 • Han Wang, Archit Sakhadeo, Adam White, James Bell, Vincent Liu, Xutong Zhao, Puer Liu, Tadashi Kozuno, Alona Fyshe, Martha White
The performance of reinforcement learning (RL) agents is sensitive to the choice of hyperparameters.
no code implementations • 17 May 2022 • Andrew Patterson, Victor Liao, Martha White
We start from a formalization of robust losses, then derive sound gradient-based approaches to minimize these losses in both the online off-policy prediction and control settings.
no code implementations • 30 Mar 2022 • Han Wang, Erfan Miahi, Martha White, Marlos C. Machado, Zaheer Abbas, Raksha Kumaraswamy, Vincent Liu, Adam White
In this paper we investigate the properties of representations learned by deep reinforcement learning systems.
no code implementations • ICLR 2022 • Kirby Banman, Liam Peet-Pare, Nidhi Hegde, Alona Fyshe, Martha White
In this work, we show that SGDm under covariate shift with a fixed step-size can be unstable and diverge.
no code implementations • NeurIPS 2021 • Matthew McLeod, Chunlok Lo, Matthew Schlegel, Andrew Jacobsen, Raksha Kumaraswamy, Martha White, Adam White
Learning auxiliary tasks, such as multiple predictions about the world, can provide many benefits to reinforcement learning systems.
1 code implementation • 4 Feb 2022 • Samuele Tosatto, Andrew Patterson, Martha White, A. Rupam Mahmood
The policy gradient theorem (Sutton et al., 2000) prescribes the usage of a cumulative discounted state distribution under the target policy to approximate the gradient.
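The standard statement being referenced: the gradient of the objective weights each state by the cumulative discounted state distribution under the target policy,

    \nabla_\theta J(\theta) = \sum_{s} d^{\pi_\theta}_{\gamma}(s) \sum_{a} \nabla_\theta \pi_\theta(a \mid s)\, q^{\pi_\theta}(s,a),
    \qquad
    d^{\pi_\theta}_{\gamma}(s) = \sum_{t=0}^{\infty} \gamma^{t}\, \Pr(S_t = s \mid s_0, \pi_\theta).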
1 code implementation • 22 Dec 2021 • Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. Rupam Mahmood
Policy gradient (PG) estimators are ineffective in dealing with softmax policies that are sub-optimally saturated, that is, when the policy concentrates its probability mass on sub-optimal actions.
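A small numeric illustration of the saturation problem (not the paper's proposed fix): when a softmax policy concentrates almost all of its mass on a sub-optimal action, the exact policy-gradient signal on the logits is vanishingly small.

    import numpy as np

    rewards = np.array([0.0, 1.0])       # action 1 is optimal
    theta = np.array([10.0, 0.0])        # logits saturated on sub-optimal action 0
    policy = np.exp(theta - theta.max()); policy /= policy.sum()

    # Exact bandit policy gradient for a softmax policy: grad_i = pi_i * (r_i - E_pi[r])
    grad = policy * (rewards - policy @ rewards)
    print(policy)   # ~[0.99995, 0.00005]
    print(grad)     # both entries have magnitude ~5e-5, so gradient ascent barely moves the policy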
1 code implementation • 15 Dec 2021 • Ehsan Imani, Wei Hu, Martha White
We then highlight why alignment between the top singular vectors and the targets can speed up learning and show in a classic synthetic transfer problem that representation alignment correlates with positive and negative transfer to similar and dissimilar tasks.
no code implementations • NeurIPS 2021 • Dhawal Gupta, Gabor Mihucz, Matthew Schlegel, James Kostas, Philip S. Thomas, Martha White
In this work, we revisit this approach and investigate if we can leverage other reinforcement learning approaches to improve learning.
1 code implementation • 16 Nov 2021 • Eric Graves, Ehsan Imani, Raksha Kumaraswamy, Martha White
A variety of theoretically-sound policy gradient algorithms exist for the on-policy setting due to the policy gradient theorem, which provides a simplified form for the gradient.
no code implementations • 15 Nov 2021 • Vincent Liu, James R. Wright, Martha White
Offline reinforcement learning -- learning a policy from a batch of data -- is known to be hard for general MDPs.
no code implementations • 29 Sep 2021 • Erfan Miahi, Revan MacQueen, Alex Ayoub, Abbas Masoumzadeh, Martha White
Soft-greedy operators, namely $\varepsilon$-greedy and softmax, remain a common choice to induce a basic level of exploration for action-value methods in reinforcement learning.
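For concreteness, the two soft-greedy operators mentioned here in a minimal form (rng is a NumPy random Generator; names are illustrative):

    import numpy as np

    def epsilon_greedy(q_values, epsilon, rng):
        # With probability epsilon act uniformly at random, otherwise act greedily.
        if rng.random() < epsilon:
            return int(rng.integers(len(q_values)))
        return int(np.argmax(q_values))

    def softmax_action(q_values, tau, rng):
        # Sample an action with probability proportional to exp(Q / tau).
        prefs = (q_values - q_values.max()) / tau
        probs = np.exp(prefs) / np.exp(prefs).sum()
        return int(rng.choice(len(q_values), p=probs))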
no code implementations • 29 Sep 2021 • Maryam Hashemzadeh, Wesley Chung, Martha White
To enable better performance, we investigate the offline-online setting: The agent has access to a batch of data to train on but is also allowed to learn during the evaluation phase in an online manner.
no code implementations • 17 Jul 2021 • Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White
Approximate Policy Iteration (API) algorithms alternate between (approximate) policy evaluation and (approximate) greedification.
no code implementations • 29 May 2021 • Qingfeng Lan, Luke Kumar, Martha White, Alona Fyshe
Correlates of secondary information appear in LSTM representations even though they are not part of an \emph{explicitly} supervised prediction task.
no code implementations • 28 Apr 2021 • Andrew Patterson, Adam White, Martha White
Many algorithms have been developed for off-policy value estimation based on the linear mean squared projected Bellman error (MSPBE) and are sound under linear function approximation.
1 code implementation • 9 Mar 2021 • Khurram Javed, Martha White, Rich Sutton
We empirically show that as long as connections between columns are sparse, our method approximates the true gradient well.
no code implementations • 1 Jan 2021 • Vincent Liu, Adam M White, Hengshuai Yao, Martha White
Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it.
no code implementations • 7 Dec 2020 • Sebastian Höfer, Kostas Bekris, Ankur Handa, Juan Camilo Gamboa, Florian Golemo, Melissa Mozifian, Chris Atkeson, Dieter Fox, Ken Goldberg, John Leonard, C. Karen Liu, Jan Peters, Shuran Song, Peter Welinder, Martha White
This report presents the debates, posters, and discussions of the Sim2Real workshop held in conjunction with the 2020 edition of the "Robotics: Science and Systems" conference.
1 code implementation • NeurIPS 2020 • Yash Chandak, Scott M. Jordan, Georgios Theocharous, Martha White, Philip S. Thomas
Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Maryam Hashemzadeh, Greta Kaufeld, Martha White, Andrea E. Martin, Alona Fyshe
The representations generated by many models of language (word embeddings, recurrent neural networks and transformers) correlate to brain activity recorded while people read.
1 code implementation • 28 Sep 2020 • Jincheng Mei, Yangchen Pan, Martha White, Amir-Massoud Farahmand, Hengshuai Yao
The prioritized Experience Replay (ER) method has attracted great attention; however, there is little theoretical understanding of such prioritization strategies and why they help.
2 code implementations • 19 Jul 2020 • Yangchen Pan, Jincheng Mei, Amir-Massoud Farahmand, Martha White, Hengshuai Yao, Mohsen Rohani, Jun Luo
Prioritized Experience Replay (ER) has been empirically shown to improve sample efficiency across many domains and has attracted great attention; however, there is little theoretical understanding of why such prioritized sampling helps and of its limitations.
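As background for these two entries, a minimal sketch of prioritized sampling from a replay buffer: probabilities proportional to a power of the absolute TD error, with the usual importance weights that correct back toward uniform sampling (a generic formulation, not the papers' specific scheme).

    import numpy as np

    def sample_prioritized(td_errors, batch_size, alpha=0.6, rng=None):
        rng = rng or np.random.default_rng()
        priorities = np.abs(td_errors) ** alpha + 1e-6
        probs = priorities / priorities.sum()
        idx = rng.choice(len(probs), size=batch_size, p=probs)
        weights = 1.0 / (len(probs) * probs[idx])   # corrects the non-uniform sampling
        return idx, weights / weights.max()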
no code implementations • 7 Jul 2020 • Vincent Liu, Adam White, Hengshuai Yao, Martha White
In this work, we provide a definition of interference for control in reinforcement learning.
no code implementations • ICML 2020 • Zaheer Abbas, Samuel Sokota, Erin J. Talvitie, Martha White
We show that heteroscedastic regression can signal predictive uncertainty arising from model inadequacy that is complementary to the uncertainty detected by methods designed for parameter uncertainty. This indicates that considering both parameter uncertainty and model inadequacy may be a more promising direction for effective selective planning than either in isolation.
1 code implementation • ICML 2020 • Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White
It is still common to use Q-learning and temporal difference (TD) learning, even though they have divergence issues and sound Gradient TD alternatives exist, because divergence seems rare and they typically perform well.
1 code implementation • 12 Jun 2020 • Khurram Javed, Martha White, Yoshua Bengio
One solution for achieving strong generalization is to incorporate causal structures in the models; such structures constrain learning by ignoring correlations that contradict them.
no code implementations • 8 Jun 2020 • Taher Jafferjee, Ehsan Imani, Erin Talvitie, Martha White, Michael Bowling
Dyna-style reinforcement learning (RL) agents improve sample efficiency over model-free RL agents by updating the value function with simulated experience generated by an environment model.
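A tabular sketch of the Dyna-style planning loop described here: after each real step, the agent performs extra value updates on transitions simulated from a learned model (assuming a deterministic model stored as a dictionary and a NumPy random Generator rng; all names are illustrative).

    def dyna_planning_steps(Q, model, n_steps, alpha, gamma, rng):
        """Q is a dict of dicts Q[s][a]; model maps (s, a) -> (reward, next_state)."""
        seen = list(model.keys())
        for _ in range(n_steps):
            s, a = seen[rng.integers(len(seen))]      # revisit a previously seen pair
            r, s_next = model[(s, a)]                 # simulated experience
            target = r + gamma * max(Q[s_next].values())
            Q[s][a] += alpha * (target - Q[s][a])     # value update on simulated data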
1 code implementation • ICML 2020 • Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip S. Thomas
Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary.
no code implementations • 11 May 2020 • Yash Satsangi, Sungsu Lim, Shimon Whiteson, Frans Oliehoek, Martha White
Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem, where the reward depends on the agent's uncertainty.
no code implementations • ICLR 2020 • Somjit Nath, Vincent Liu, Alan Chan, Xin Li, Adam White, Martha White
Recurrent neural networks (RNNs) allow an agent to construct a state-representation from a stream of experience, which is essential in partially observable problems.
1 code implementation • ICLR 2020 • Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White
Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value.
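The bias can be seen with a tiny simulation: even when every true action value is zero, the maximum of noisy estimates is positive in expectation, so the Q-learning target overestimates.

    import numpy as np

    rng = np.random.default_rng(0)
    true_values = np.zeros(10)                                   # all actions equally good
    estimates = true_values + rng.normal(size=(100_000, 10))     # noisy value estimates
    print(estimates.max(axis=1).mean())                          # ~1.54 rather than 0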
no code implementations • NeurIPS 2020 • Yangchen Pan, Ehsan Imani, Martha White, Amir-Massoud Farahmand
We empirically demonstrate on several synthetic problems that our method (i) can learn multi-valued functions and produce the conditional modes, (ii) scales well to high-dimensional inputs, and (iii) can even be more effective for certain uni-modal problems, particularly for high-frequency functions.
1 code implementation • NeurIPS 2019 • Farzane Aminmansour, Andrew Patterson, Lei Le, Yisu Peng, Daniel Mitchell, Franco Pestilli, Cesar F. Caiafa, Russell Greiner, Martha White
We develop an efficient optimization strategy for this extremely high-dimensional sparse problem, by reducing the number of parameters using a greedy algorithm designed specifically for the problem.
1 code implementation • ICLR 2021 • Yangchen Pan, Kirby Banman, Martha White
Recent work has shown that sparse representations -- where only a small percentage of units are active -- can significantly reduce interference.
no code implementations • 3 Oct 2019 • Khurram Javed, Hengshuai Yao, Martha White
Gradient-based meta-learning has proven to be highly effective at learning model initializations, representations, and update rules that allow fast adaptation from a few samples.
no code implementations • 17 Jul 2019 • Andrew Jacobsen, Matthew Schlegel, Cameron Linke, Thomas Degris, Adam White, Martha White
This paper investigates different vector step-size adaptation approaches for non-stationary online, continual prediction problems.
no code implementations • 19 Jun 2019 • Cam Linke, Nadia M. Ady, Martha White, Thomas Degris, Adam White
The question we tackle in this paper is how to sculpt the stream of experience -- how to adapt the learning system's behavior -- to optimize the learning of a collection of value functions.
no code implementations • 18 Jun 2019 • Yangchen Pan, Hengshuai Yao, Amir-Massoud Farahmand, Martha White
In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) the current estimate of the value function.
2 code implementations • NeurIPS 2019 • Matthew Schlegel, Wesley Chung, Daniel Graves, Jian Qian, Martha White
Importance sampling (IS) is a common reweighting strategy for off-policy prediction in reinforcement learning.
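The standard reweighting this entry refers to, as a one-step linear TD(0) sketch: each update is scaled by the ratio rho = pi(a|s) / b(a|s) between the target and behaviour policies (names are illustrative; w, x, x_next are NumPy vectors).

    def is_weighted_td0_update(w, x, x_next, r, gamma, alpha, pi_prob, b_prob):
        rho = pi_prob / b_prob                          # importance sampling ratio
        td_error = r + gamma * (w @ x_next) - (w @ x)   # TD error under linear values
        return w + alpha * rho * td_error * x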
6 code implementations • NeurIPS 2019 • Khurram Javed, Martha White
We show that it is possible to learn naturally sparse representations that are more effective for online updating.
no code implementations • ICLR 2019 • Wesley Chung, Somjit Nath, Ajin Joseph, Martha White
A key component for many reinforcement learning agents is to learn a value function, either for policy evaluation or control.
no code implementations • 2 Apr 2019 • Yi Wan, Zaheer Abbas, Adam White, Martha White, Richard S. Sutton
In particular, we 1) show that planning with an expectation model is equivalent to planning with a distribution model if the state value function is linear in state features, 2) analyze two common parametrization choices for approximating the expectation: linear and non-linear expectation models, 3) propose a sound model-based policy evaluation algorithm and present its convergence results, and 4) empirically demonstrate the effectiveness of the proposed planning algorithm.
no code implementations • 3 Dec 2018 • Minghan Li, Tanli Zuo, Ruicheng Li, Martha White, Wei-Shi Zheng
Knowledge distillation is an effective technique that transfers knowledge from a large teacher model to a shallow student.
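A common formulation of the distillation objective mentioned here (a generic PyTorch sketch, not the paper's specific loss): blend the usual cross-entropy on the labels with a KL term matching the student's temperature-softened outputs to the teacher's.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, mix=0.5):
        hard = F.cross_entropy(student_logits, labels)
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits / T, dim=1),
                        reduction="batchmean") * (T * T)       # T^2 keeps gradient scale
        return mix * hard + (1 - mix) * soft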
1 code implementation • NeurIPS 2018 • Lei Le, Andrew Patterson, Martha White
A common strategy to improve generalization has been through the use of regularizers, typically as a norm constraining the parameters.
no code implementations • NeurIPS 2018 • Ehsan Imani, Eric Graves, Martha White
There have been a host of theoretically sound algorithms proposed for the on-policy setting, due to the existence of the policy gradient theorem which provides a simplified form for the gradient.
no code implementations • 16 Nov 2018 • Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup
We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.
no code implementations • 15 Nov 2018 • Vincent Liu, Raksha Kumaraswamy, Lei Le, Martha White
We investigate sparse representations for control in reinforcement learning.
no code implementations • NeurIPS 2018 • Raksha Kumaraswamy, Matthew Schlegel, Adam White, Martha White
Directed exploration strategies for reinforcement learning are critical for learning an optimal policy in a minimal number of interactions with the environment.
no code implementations • 6 Nov 2018 • Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White
The ability to learn behavior-contingent predictions online and off-policy has long been advocated as a key capability of predictive-knowledge learning systems, but has remained an open algorithmic challenge for decades.
1 code implementation • 22 Oct 2018 • Samuel Neumann, Sungsu Lim, Ajin Joseph, Yangchen Pan, Adam White, Martha White
We first provide a policy improvement result in an idealized setting, and then prove that our conditional CEM (CCEM) strategy tracks a CEM update per state, even with changing action-values.
no code implementations • 27 Sep 2018 • Matthew Schlegel, Wesley Chung, Daniel Graves, Martha White
We propose Importance Resampling (IR) for off-policy learning, that resamples experience from the replay buffer and applies a standard on-policy update.
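A minimal sketch of the resampling scheme described in this entry: draw transitions from the buffer with probability proportional to their importance sampling ratios, then apply an ordinary on-policy update to the resampled batch (function and argument names are illustrative).

    import numpy as np

    def importance_resample(buffer, pi_probs, b_probs, batch_size, rng=None):
        """pi_probs and b_probs are per-transition action probabilities under the
        target and behaviour policies for the stored transitions."""
        rng = rng or np.random.default_rng()
        rho = pi_probs / b_probs
        probs = rho / rho.sum()
        idx = rng.choice(len(buffer), size=batch_size, p=probs)
        return [buffer[i] for i in idx]   # update this batch as if it were on-policy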
no code implementations • 28 Aug 2018 • Touqir Sajed, Wesley Chung, Martha White
We provide experiments investigating the number of samples required by this offline algorithm in simple benchmark reinforcement learning domains, and highlight that there are still many open questions to be solved for this important problem.
no code implementations • 18 Jul 2018 • Matthew Schlegel, Andrew Jacobsen, Zaheer Abbas, Andrew Patterson, Adam White, Martha White
A general purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation.
no code implementations • ICML 2018 • Yangchen Pan, Amir-Massoud Farahmand, Martha White, Saleh Nabi, Piyush Grover, Daniel Nikovski
Recent work has shown that reinforcement learning (RL) is a promising approach to control dynamical systems described by partial differential equations (PDEs).
no code implementations • 12 Jun 2018 • Yangchen Pan, Muhammad Zaheer, Adam White, Andrew Patterson, Martha White
We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states that propagate information in reverse from a state more quickly.
no code implementations • ICML 2018 • Ehsan Imani, Martha White
We provide theoretical support for this alternative hypothesis, by characterizing the norm of the gradients of this loss.
no code implementations • 25 Jan 2018 • Craig Sherstan, Brendan Bennett, Kenny Young, Dylan R. Ashley, Adam White, Martha White, Richard S. Sutton
This paper investigates estimating the variance of a temporal-difference learning agent's update target.
no code implementations • ICLR 2018 • Matthew Schlegel, Andrew Patterson, Adam White, Martha White
We investigate a framework for discovery: curating a large collection of predictions, which are used to construct the agent's representation of the world.
no code implementations • NeurIPS 2017 • Mahdi Karami, Martha White, Dale Schuurmans, Csaba Szepesvari
In this paper, we instead reconsider likelihood maximization and develop an optimization based strategy for recovering the latent states and transition parameters.
no code implementations • 3 Aug 2017 • Yangchen Pan, Erfan Sadeqi Azer, Martha White
As a remedy, we demonstrate how to use sketching more sparingly, with only a left-sided sketch, which can still provide significant computational gains and enable the use of these matrix-based learning algorithms that are less sensitive to parameters.
no code implementations • ICML 2017 • Matthew Schlegel, Yangchen Pan, Jiecao Chen, Martha White
In this work, we develop an approximately submodular criterion for this setting, and an efficient online greedy submodular maximization algorithm for optimizing the criterion.
no code implementations • 26 Jul 2017 • Lei Le, Raksha Kumaraswamy, Martha White
Outside of reinforcement learning, sparse coding representations have been widely used, with non-convex objectives that result in discriminative representations.
no code implementations • 2 Feb 2017 • Shantanu Jain, Martha White, Predrag Radivojac
A common approach in positive-unlabeled learning is to train a classification model between labeled and unlabeled data.
no code implementations • 28 Nov 2016 • Yangchen Pan, Adam White, Martha White
The family of temporal difference (TD) methods spans a spectrum from computationally frugal linear methods like TD($\lambda$) to data-efficient least-squares methods.
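The frugal end of that spectrum, as a one-step sketch of linear TD(lambda) with an accumulating eligibility trace; the per-step cost is linear in the number of features (names are illustrative).

    def td_lambda_step(w, z, x, x_next, r, gamma, lam, alpha):
        """w: weights, z: eligibility trace, x/x_next: feature vectors (NumPy arrays)."""
        delta = r + gamma * (w @ x_next) - (w @ x)   # TD error
        z = gamma * lam * z + x                      # accumulating trace
        w = w + alpha * delta * z
        return w, z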
no code implementations • ICML 2017 • Martha White
Reinforcement learning tasks are typically specified as Markov decision processes.
2 code implementations • 2 Jul 2016 • Martha White, Adam White
One of the main obstacles to broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms.
no code implementations • NeurIPS 2016 • Shantanu Jain, Martha White, Predrag Radivojac
We develop a classification algorithm for estimating posterior distributions from positive-unlabeled data that is robust to noise in the positive labels and effective for high-dimensional data.
no code implementations • 17 Apr 2016 • Lei Le, Martha White
We then provide an empirical investigation into practical optimization choices for using alternating minimization for induced DLMs, for both batch and stochastic gradient descent.
1 code implementation • 28 Feb 2016 • Adam White, Martha White
First, we derive two new hybrid TD policy-evaluation algorithms, which fill a gap in this collection of algorithms.
1 code implementation • 8 Jan 2016 • Shantanu Jain, Martha White, Michael W. Trosset, Predrag Radivojac
This problem can be decomposed into two steps: (i) the development of accurate predictors that discriminate between positive and unlabeled data, and (ii) the accurate estimation of the prior probabilities of positive and negative examples.
no code implementations • 26 Nov 2015 • Clement Gehring, Yangchen Pan, Martha White
Balancing between computational efficiency and sample efficiency is an important goal in reinforcement learning.
no code implementations • 6 Jul 2015 • A. Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton
Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps.
no code implementations • 14 Mar 2015 • Richard S. Sutton, A. Rupam Mahmood, Martha White
In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps.
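A hedged sketch of one step of an emphatic TD(lambda) update in the linear prediction setting, using a standard formulation rather than this paper's exact notation: a follow-on trace accumulates how much emphasis earlier states pass forward, and the resulting emphasis scales each update (variable names are illustrative; the on-policy case takes rho = rho_prev = 1).

    def emphatic_td_step(w, z, F, x, x_next, r, gamma, lam, alpha, rho, rho_prev, interest=1.0):
        """w: weights, z: eligibility trace, F: follow-on trace; x, x_next: feature vectors."""
        F = rho_prev * gamma * F + interest        # follow-on trace
        M = lam * interest + (1 - lam) * F         # emphasis for this update
        delta = r + gamma * (w @ x_next) - (w @ x)
        z = rho * (gamma * lam * z + M * x)        # emphatically weighted trace
        w = w + alpha * delta * z
        return w, z, F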
no code implementations • NeurIPS 2012 • Martha White, Xinhua Zhang, Dale Schuurmans, Yao-Liang Yu
Subspace learning seeks a low dimensional representation of data that enables accurate reconstruction.
no code implementations • 22 May 2012 • Thomas Degris, Martha White, Richard S. Sutton
Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning.
no code implementations • NeurIPS 2010 • Min Yang, Linli Xu, Martha White, Dale Schuurmans, Yao-Liang Yu
We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems.
no code implementations • NeurIPS 2010 • Martha White, Adam White
The reinforcement learning community has explored many approaches to obtaining value estimates and models to guide decision making; these approaches, however, do not usually provide a measure of confidence in the estimate.