1 code implementation • 31 Mar 2024 • Mohamed Elsayed, A. Rupam Mahmood
Deep representation learning methods struggle with continual learning, suffering from both catastrophic forgetting of useful units and loss of plasticity, often due to rigid and unhelpful units.
1 code implementation • 23 Dec 2023 • Bram Grooten, Tristan Tomilin, Gautham Vasan, Matthew E. Taylor, A. Rupam Mahmood, Meng Fang, Mykola Pechenizkiy, Decebal Constantin Mocanu
Our algorithm improves the agent's focus with useful masks, while its efficient Masker network only adds 0.2% more parameters to the original structure, in contrast to previous work.
no code implementations • 2 Oct 2023 • Qingfeng Lan, A. Rupam Mahmood
We show that by simply replacing classical activation functions with elephant activation functions, we can significantly improve the resilience of neural networks to catastrophic forgetting.
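Swapping an activation function is a one-line change in most frameworks. As an illustration only (the exact elephant functions are defined in the paper; the bump-shaped function below is a hypothetical stand-in with the same local-response character), a locally supported unit responds only near zero, so a weight update affects few inputs:

```python
import numpy as np

def gaussian_bump(x, width=1.0):
    # Locally supported activation (a stand-in, NOT the paper's elephant
    # function): the unit responds only near x = 0, so updates stay local.
    return np.exp(-(x / width) ** 2)

def relu(x):
    # Classical activation: output (and gradient reach) grows without bound.
    return np.maximum(x, 0.0)

x = np.array([-3.0, 0.0, 3.0])
print(gaussian_bump(x))  # only the center input responds strongly
print(relu(x))
```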
1 code implementation • 23 Jun 2023 • Shibhansh Dohare, J. Fernando Hernandez-Garcia, Parash Rahman, A. Rupam Mahmood, Richard S. Sutton
If deep-learning systems are applied in a continual learning setting, then it is well known that they may fail to remember earlier examples.
no code implementations • 23 Jun 2023 • Fengdi Che, Gautham Vasan, A. Rupam Mahmood
The policy gradient theorem gives a convenient form of the policy gradient in terms of three factors: an action value, a gradient of the action likelihood, and a state distribution involving discounting called the \emph{discounted stationary distribution}.
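In standard notation (assumed here, not quoted from the paper), the theorem reads:

```latex
\nabla_\theta J(\theta) \;\propto\; \sum_{s} d^{\pi}_{\gamma}(s) \sum_{a} \nabla_\theta \pi_\theta(a \mid s)\, q^{\pi}(s, a),
\qquad
d^{\pi}_{\gamma}(s) \;=\; \sum_{t=0}^{\infty} \gamma^{t} \Pr(S_t = s \mid \pi),
```

where $q^{\pi}$ is the action-value function, $\pi_\theta(a \mid s)$ the action likelihood, and $d^{\pi}_{\gamma}$ the discounted stationary distribution.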
1 code implementation • 29 May 2023 • Haque Ishfaq, Qingfeng Lan, Pan Xu, A. Rupam Mahmood, Doina Precup, Anima Anandkumar, Kamyar Azizzadenesheli
One of the key shortcomings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings.
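In simple settings the posterior can be maintained exactly, which is what deep-RL settings with Gaussian approximations give up. A minimal exact-posterior Thompson sampler for a two-armed Bernoulli bandit (arm means chosen here for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.3, 0.7]                 # illustrative Bernoulli arm means
alpha = np.ones(2); beta = np.ones(2)   # Beta(1, 1) priors on each arm

for _ in range(2000):
    # Thompson sampling: draw one sample from each arm's exact Beta
    # posterior and act greedily with respect to the samples.
    samples = rng.beta(alpha, beta)
    a = int(np.argmax(samples))
    r = rng.random() < true_means[a]    # Bernoulli reward
    alpha[a] += r; beta[a] += 1 - r     # exact conjugate posterior update

pulls = alpha + beta - 2                # pull counts recovered from posteriors
print(pulls)                            # the better arm dominates
```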
1 code implementation • 9 May 2023 • Homayoon Farrahi, A. Rupam Mahmood
In this work, we investigate the widely used baseline hyper-parameter values of two policy gradient algorithms -- PPO and SAC -- across different cycle times.
no code implementations • 7 Feb 2023 • Mohamed Elsayed, A. Rupam Mahmood
Modern representation learning methods often struggle to adapt quickly under non-stationarity because they suffer from catastrophic forgetting and decaying plasticity.
1 code implementation • 3 Feb 2023 • Qingfeng Lan, A. Rupam Mahmood, Shuicheng Yan, Zhongwen Xu
Reinforcement learning (RL) is fundamentally different from supervised learning, and in practice these learned optimizers do not work well even in simple RL tasks.
1 code implementation • 6 Dec 2022 • Amirmohammad Karimi, Jun Jin, Jun Luo, A. Rupam Mahmood, Martin Jagersand, Samuele Tosatto
In classic reinforcement learning algorithms, agents make decisions at discrete and fixed time intervals.
1 code implementation • 20 Oct 2022 • Mohamed Elsayed, A. Rupam Mahmood
Second-order optimization uses curvature information about the objective function, which can help in faster convergence.
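A minimal sketch of why curvature helps: on an ill-conditioned quadratic, one Newton step (which uses the Hessian) lands on the minimum, while gradient descent is throttled by the largest curvature direction:

```python
import numpy as np

# Quadratic objective f(w) = 0.5 * w^T A w with ill-conditioned curvature.
A = np.diag([1.0, 100.0])
w0 = np.array([1.0, 1.0])
grad = A @ w0

# One Newton step uses the curvature (Hessian) and lands on the minimum.
w_newton = w0 - np.linalg.solve(A, grad)

# A gradient step with a stable step size barely moves the flat direction.
lr = 1.0 / 100.0            # step size limited by the largest curvature
w_gd = w0 - lr * grad

print(w_newton)             # [0. 0.] -- converged in one step
print(w_gd)                 # flat direction has barely moved
```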
2 code implementations • 5 Oct 2022 • Yan Wang, Gautham Vasan, A. Rupam Mahmood
A common setup for a robotic agent is to use two computers simultaneously: a resource-limited local computer tethered to the robot and a powerful remote computer connected wirelessly.
1 code implementation • 22 May 2022 • Qingfeng Lan, Yangchen Pan, Jun Luo, A. Rupam Mahmood
The experience replay buffer, a standard component in deep reinforcement learning, is often used to reduce forgetting and improve sample efficiency by storing experiences in a large buffer and using them for training later.
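A minimal sketch of the standard component being described (uniform sampling from a bounded buffer; not the paper's proposed variant):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay: store transitions in a bounded buffer
    and sample uniformly for later training updates."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest items are evicted

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.add((t, t + 1))        # placeholder (state, next_state) pairs
print(len(buf.buffer))         # prints 3: capacity bounds the buffer
```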
1 code implementation • 23 Mar 2022 • Yufeng Yuan, A. Rupam Mahmood
An oft-ignored challenge of real-world reinforcement learning is that the real world does not pause when agents make learning updates.
1 code implementation • 4 Feb 2022 • Samuele Tosatto, Andrew Patterson, Martha White, A. Rupam Mahmood
The policy gradient theorem (Sutton et al., 2000) prescribes the usage of a cumulative discounted state distribution under the target policy to approximate the gradient.
1 code implementation • 22 Dec 2021 • Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. Rupam Mahmood
Policy gradient (PG) estimators are ineffective in dealing with softmax policies that are sub-optimally saturated, which refers to the situation when the policy concentrates its probability mass on sub-optimal actions.
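The ineffectiveness can be seen numerically: for a softmax policy, the exact gradient of the expected reward with respect to a logit is $\pi_i(r_i - \mathbb{E}[r])$, which vanishes when the policy saturates on any action, optimal or not (a small illustration, not the paper's estimator):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # shift for numerical stability
    return e / e.sum()

def pg_wrt_logits(z, r):
    # Exact gradient of E_{a ~ softmax(z)}[r(a)] w.r.t. the logits:
    # pi_i * (r_i - E[r]).
    pi = softmax(z)
    return pi * (r - pi @ r)

r = np.array([1.0, 0.0])               # action 0 is optimal
z_saturated = np.array([-10.0, 10.0])  # mass concentrated on action 1
z_uniform = np.array([0.0, 0.0])

print(np.abs(pg_wrt_logits(z_saturated, r)).max())  # vanishingly small
print(np.abs(pg_wrt_logits(z_uniform, r)).max())    # healthy signal
```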
1 code implementation • 13 Aug 2021 • Shibhansh Dohare, Richard S. Sutton, A. Rupam Mahmood
The Backprop algorithm for learning in neural networks relies on two mechanisms: stochastic gradient descent and initialization with small random weights, where the latter is essential to the effectiveness of the former.
no code implementations • 17 Jul 2021 • Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White
Approximate Policy Iteration (API) algorithms alternate between (approximate) policy evaluation and (approximate) greedification.
1 code implementation • 9 Mar 2021 • Qingfeng Lan, Samuele Tosatto, Homayoon Farrahi, A. Rupam Mahmood
As a key component in reinforcement learning, the reward function is usually devised carefully to guide the agent.
1 code implementation • 27 Mar 2019 • Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan, James Bergstra
We introduce a family of stationary autoregressive (AR) stochastic processes to facilitate exploration in continuous control domains.
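An illustrative member of this family (not necessarily the paper's parameterization) is a stationary AR(1) process: scaling the innovation noise by $\sqrt{1-\varphi^2}$ keeps the marginal variance at 1 for any $|\varphi| < 1$, giving temporally correlated exploration noise whose scale does not drift:

```python
import numpy as np

def ar1_noise(n_steps, phi=0.9, rng=None):
    # Stationary AR(1): x_{t+1} = phi * x_t + sigma * eps_t, with sigma
    # chosen so the marginal variance is 1 for any |phi| < 1.
    if rng is None:
        rng = np.random.default_rng(0)
    sigma = np.sqrt(1.0 - phi ** 2)
    x = np.empty(n_steps)
    x[0] = rng.standard_normal()        # start from the stationary marginal
    for t in range(1, n_steps):
        x[t] = phi * x[t - 1] + sigma * rng.standard_normal()
    return x

noise = ar1_noise(50_000, phi=0.9)
print(round(noise.var(), 2))            # close to 1: stationarity preserved
```

Larger `phi` yields smoother, more persistent action sequences, which is what makes such processes attractive for exploration on physical robots.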
2 code implementations • 20 Sep 2018 • A. Rupam Mahmood, Dmytro Korenkevych, Gautham Vasan, William Ma, James Bergstra
The research community is now able to quickly reproduce, analyze, and build on these results due to open source implementations of learning algorithms and simulated benchmark tasks.
2 code implementations • 19 Mar 2018 • A. Rupam Mahmood, Dmytro Korenkevych, Brent J. Komer, James Bergstra
Reinforcement learning is a promising approach to developing hard-to-engineer adaptive solutions for complex and diverse robotic tasks.
no code implementations • 14 Apr 2017 • Huizhen Yu, A. Rupam Mahmood, Richard S. Sutton
As to its soundness, using Markov chain theory, we prove the ergodicity of the joint state-trace process under nonrestrictive conditions, and we show that associated with our scheme is a generalized Bellman equation (for the policy to be evaluated) that depends on both the evolution of $\lambda$ and the unique invariant probability measure of the state-trace process.
1 code implementation • 13 Dec 2015 • Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton
Our results suggest that the true online methods indeed dominate the regular methods.
no code implementations • 6 Jul 2015 • A. Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton
Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps.
no code implementations • 1 Jul 2015 • Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Richard S. Sutton
Our results confirm the strength of true online TD($\lambda$): 1) for sparse feature vectors, the computational overhead with respect to TD($\lambda$) is minimal, and for non-sparse features the computation time is at most twice that of TD($\lambda$); 2) across all domains and representations, the learning speed of true online TD($\lambda$) is often better than, and never worse than, that of TD($\lambda$); and 3) true online TD($\lambda$) is easier to use, because it does not require choosing between trace types and is generally more stable with respect to the step size.
no code implementations • 14 Mar 2015 • Richard S. Sutton, A. Rupam Mahmood, Martha White
In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps.
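One common form of this emphasis weighting (following Sutton, Mahmood & White, 2016; notation assumed here) is:

```latex
F_t = \rho_{t-1}\,\gamma_t\,F_{t-1} + I_t, \qquad
M_t = \lambda_t I_t + (1 - \lambda_t)\,F_t,
```

where $I_t$ is the user-specified interest in time step $t$, $\rho_t$ is the importance-sampling ratio, and the TD update at step $t$ is scaled by the emphasis $M_t$ rather than applied uniformly.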
no code implementations • NeurIPS 2014 • A. Rupam Mahmood, Hado P. Van Hasselt, Richard S. Sutton
Second, we show that these benefits extend to a new weighted-importance-sampling version of off-policy LSTD(lambda).