Search Results for author: A. Rupam Mahmood

Found 27 papers, 18 papers with code

MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning

1 code implementation • 23 Dec 2023 • Bram Grooten, Tristan Tomilin, Gautham Vasan, Matthew E. Taylor, A. Rupam Mahmood, Meng Fang, Mykola Pechenizkiy, Decebal Constantin Mocanu

Our algorithm improves the agent's focus with useful masks, while its efficient Masker network only adds 0.2% more parameters to the original structure, in contrast to previous work.

Data Augmentation

Elephant Neural Networks: Born to Be a Continual Learner

no code implementations • 2 Oct 2023 • Qingfeng Lan, A. Rupam Mahmood

We show that by simply replacing classical activation functions with elephant activation functions, we can significantly improve the resilience of neural networks to catastrophic forgetting.

Class Incremental Learning • Incremental Learning • +1

Correcting discount-factor mismatch in on-policy policy gradient methods

no code implementations • 23 Jun 2023 • Fengdi Che, Gautham Vasan, A. Rupam Mahmood

The policy gradient theorem gives a convenient form of the policy gradient in terms of three factors: an action value, a gradient of the action likelihood, and a state distribution involving discounting called the discounted stationary distribution.

OpenAI Gym • Policy Gradient Methods

Loss of Plasticity in Deep Continual Learning

1 code implementation • 23 Jun 2023 • Shibhansh Dohare, J. Fernando Hernandez-Garcia, Parash Rahman, Richard S. Sutton, A. Rupam Mahmood

If deep-learning systems are applied in a continual learning setting, then it is well known that they may fail to remember earlier examples.

Binary Classification • Continual Learning

Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo

1 code implementation • 29 May 2023 • Haque Ishfaq, Qingfeng Lan, Pan Xu, A. Rupam Mahmood, Doina Precup, Anima Anandkumar, Kamyar Azizzadenesheli

One of the key shortcomings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings.

Efficient Exploration • reinforcement-learning • +2

Reducing the Cost of Cycle-Time Tuning for Real-World Policy Optimization

1 code implementation • 9 May 2023 • Homayoon Farrahi, A. Rupam Mahmood

In this work, we investigate the widely-used baseline hyper-parameter values of two policy gradient algorithms -- PPO and SAC -- across different cycle times.

Utility-based Perturbed Gradient Descent: An Optimizer for Continual Learning

no code implementations • 7 Feb 2023 • Mohamed Elsayed, A. Rupam Mahmood

Modern representation learning methods often struggle to adapt quickly under non-stationarity because they suffer from catastrophic forgetting and decaying plasticity.

Continual Learning • Representation Learning

Learning to Optimize for Reinforcement Learning

1 code implementation • 3 Feb 2023 • Qingfeng Lan, A. Rupam Mahmood, Shuicheng Yan, Zhongwen Xu

Reinforcement learning (RL) is essentially different from supervised learning and in practice these learned optimizers do not work well even in simple RL tasks.

Inductive Bias • Meta-Learning • +2

Dynamic Decision Frequency with Continuous Options

1 code implementation • 6 Dec 2022 • Amirmohammad Karimi, Jun Jin, Jun Luo, A. Rupam Mahmood, Martin Jagersand, Samuele Tosatto

In classic reinforcement learning algorithms, agents make decisions at discrete and fixed time intervals.

Continuous Control

HesScale: Scalable Computation of Hessian Diagonals

1 code implementation • 20 Oct 2022 • Mohamed Elsayed, A. Rupam Mahmood

Second-order optimization uses curvature information about the objective function, which can help in faster convergence.

Real-Time Reinforcement Learning for Vision-Based Robotics Utilizing Local and Remote Computers

2 code implementations • 5 Oct 2022 • Yan Wang, Gautham Vasan, A. Rupam Mahmood

A common setup for a robotic agent is to have two different computers simultaneously: a resource-limited local computer tethered to the robot and a powerful remote computer connected wirelessly.

Reinforcement Learning (RL)

Memory-efficient Reinforcement Learning with Value-based Knowledge Consolidation

1 code implementation • 22 May 2022 • Qingfeng Lan, Yangchen Pan, Jun Luo, A. Rupam Mahmood

The experience replay buffer, a standard component in deep reinforcement learning, is often used to reduce forgetting and improve sample efficiency by storing experiences in a large buffer and using them for training later.

reinforcement-learning • Reinforcement Learning (RL)

Asynchronous Reinforcement Learning for Real-Time Control of Physical Robots

1 code implementation • 23 Mar 2022 • Yufeng Yuan, A. Rupam Mahmood

An oft-ignored challenge of real-world reinforcement learning is that the real world does not pause when agents make learning updates.

reinforcement-learning • Reinforcement Learning (RL)

A Temporal-Difference Approach to Policy Gradient Estimation

1 code implementation • 4 Feb 2022 • Samuele Tosatto, Andrew Patterson, Martha White, A. Rupam Mahmood

The policy gradient theorem (Sutton et al., 2000) prescribes the usage of a cumulative discounted state distribution under the target policy to approximate the gradient.

An Alternate Policy Gradient Estimator for Softmax Policies

1 code implementation • 22 Dec 2021 • Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. Rupam Mahmood

Policy gradient (PG) estimators are ineffective in dealing with softmax policies that are sub-optimally saturated, which refers to the situation when the policy concentrates its probability mass on sub-optimal actions.

Continual Backprop: Stochastic Gradient Descent with Persistent Randomness

1 code implementation • 13 Aug 2021 • Shibhansh Dohare, Richard S. Sutton, A. Rupam Mahmood

The Backprop algorithm for learning in neural networks utilizes two mechanisms: first, stochastic gradient descent and second, initialization with small random weights, where the latter is essential to the effectiveness of the former.

Continual Learning • Reinforcement Learning (RL)

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

no code implementations • 17 Jul 2021 • Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White

Approximate Policy Iteration (API) algorithms alternate between (approximate) policy evaluation and (approximate) greedification.

Policy Gradient Methods

Model-free Policy Learning with Reward Gradients

1 code implementation • 9 Mar 2021 • Qingfeng Lan, Samuele Tosatto, Homayoon Farrahi, A. Rupam Mahmood

As a key component in reinforcement learning, the reward function is usually devised carefully to guide the agent.

Continuous Control • Policy Gradient Methods

Autoregressive Policies for Continuous Control Deep Reinforcement Learning

1 code implementation • 27 Mar 2019 • Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan, James Bergstra

We introduce a family of stationary autoregressive (AR) stochastic processes to facilitate exploration in continuous control domains.

Continuous Control • reinforcement-learning • +1

Benchmarking Reinforcement Learning Algorithms on Real-World Robots

2 code implementations • 20 Sep 2018 • A. Rupam Mahmood, Dmytro Korenkevych, Gautham Vasan, William Ma, James Bergstra

The research community is now able to reproduce, analyze and build quickly on these results due to open source implementations of learning algorithms and simulated benchmark tasks.

Benchmarking • Continuous Control • +2

Setting up a Reinforcement Learning Task with a Real-World Robot

2 code implementations • 19 Mar 2018 • A. Rupam Mahmood, Dmytro Korenkevych, Brent J. Komer, James Bergstra

Reinforcement learning is a promising approach to developing hard-to-engineer adaptive solutions for complex and diverse robotic tasks.

reinforcement-learning • Reinforcement Learning (RL)

On Generalized Bellman Equations and Temporal-Difference Learning

no code implementations • 14 Apr 2017 • Huizhen Yu, A. Rupam Mahmood, Richard S. Sutton

As to its soundness, using Markov chain theory, we prove the ergodicity of the joint state-trace process under nonrestrictive conditions, and we show that associated with our scheme is a generalized Bellman equation (for the policy to be evaluated) that depends on both the evolution of $\lambda$ and the unique invariant probability measure of the state-trace process.

True Online Temporal-Difference Learning

1 code implementation • 13 Dec 2015 • Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton

Our results suggest that the true online methods indeed dominate the regular methods.

Atari Games

Emphatic Temporal-Difference Learning

no code implementations • 6 Jul 2015 • A. Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton

Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps.

An Empirical Evaluation of True Online TD(λ)

no code implementations • 1 Jul 2015 • Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Richard S. Sutton

Our results confirm the strength of true online TD(λ): 1) for sparse feature vectors, the computational overhead with respect to TD(λ) is minimal; for non-sparse features the computation time is at most twice that of TD(λ), 2) across all domains/representations the learning speed of true online TD(λ) is often better, but never worse than that of TD(λ), and 3) true online TD(λ) is easier to use, because it does not require choosing between trace types, and it is generally more stable with respect to the step-size.

Test

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

no code implementations • 14 Mar 2015 • Richard S. Sutton, A. Rupam Mahmood, Martha White

In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps.

Weighted importance sampling for off-policy learning with linear function approximation

no code implementations • NeurIPS 2014 • A. Rupam Mahmood, Hado P. Van Hasselt, Richard S. Sutton

Second, we show that these benefits extend to a new weighted-importance-sampling version of off-policy LSTD(λ).
