Search Results for author: Parameswaran Kamalaruban

Found 23 papers, 7 papers with code

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

no code implementations • 4 Mar 2024 • Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla

Moreover, we extend our analysis to the approximate optimization setting and derive exponentially decaying convergence rates for both RLHF and DPO.
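
For reference, the direct preference optimization (DPO) objective analyzed alongside RLHF is standard in the literature (it is not quoted from this abstract): given a prompt x with preferred and dispreferred responses y_w and y_l,

    \mathcal{L}_{\mathrm{DPO}}(\pi_\theta) = -\mathbb{E}_{(x, y_w, y_l)}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right],

where \pi_{\mathrm{ref}} is a reference policy, \beta > 0 a temperature, and \sigma the logistic function. RLHF instead first fits a reward model to the preferences and then optimizes the policy against it under a KL penalty, which is what makes the two pipelines directly comparable.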

Informativeness of Reward Functions in Reinforcement Learning

1 code implementation • 10 Feb 2024 • Rati Devidze, Parameswaran Kamalaruban, Adish Singla

Reward functions are central in specifying the task we want a reinforcement learning agent to perform.

Informativeness, Reinforcement Learning (RL)

Proximal Curriculum for Reinforcement Learning Agents

1 code implementation • 25 Apr 2023 • Georgios Tzannetos, Bárbara Gomes Ribeiro, Parameswaran Kamalaruban, Adish Singla

We consider the problem of curriculum design for reinforcement learning (RL) agents in contextual multi-task settings.

Reinforcement Learning (RL)
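
A simple curriculum heuristic in this spirit (an illustrative assumption, not necessarily the paper's exact selection criterion) is to prefer tasks of intermediate difficulty, i.e., those where the agent's estimated success probability is closest to 1/2:

    import numpy as np

    def pick_next_task(success_prob):
        """success_prob: per-task estimates of the agent's current success rate.
        Prefer the task closest to 50% success -- hard enough to be informative,
        easy enough to be solvable (a zone-of-proximal-development idea).
        Hypothetical helper for illustration only."""
        success_prob = np.asarray(success_prob)
        return int(np.argmin(np.abs(success_prob - 0.5)))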

Learning Personalized Decision Support Policies

no code implementations • 13 Apr 2023 • Umang Bhatt, Valerie Chen, Katherine M. Collins, Parameswaran Kamalaruban, Emma Kallina, Adrian Weller, Ameet Talwalkar

In this work, we propose learning a decision support policy that, for a given input, chooses which form of support, if any, to provide.

Multi-Armed Bandits
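
Given the Multi-Armed Bandits tag, a minimal bandit sketch of the idea follows (hypothetical arm names; the paper learns an input-dependent policy, whereas this toy version ignores the input entirely):

    import numpy as np

    # Hypothetical forms of decision support; "none" means the human decides alone.
    ARMS = ["none", "show_explanation", "defer_to_expert"]

    class EpsilonGreedySupport:
        """Epsilon-greedy bandit over support forms, one value estimate per arm."""
        def __init__(self, eps=0.1):
            self.eps = eps
            self.counts = np.zeros(len(ARMS))
            self.values = np.zeros(len(ARMS))

        def choose(self):
            # Explore uniformly with probability eps, otherwise exploit.
            if np.random.rand() < self.eps:
                return np.random.randint(len(ARMS))
            return int(np.argmax(self.values))

        def update(self, arm, reward):
            # Incremental mean update of the chosen arm's value estimate.
            self.counts[arm] += 1
            self.values[arm] += (reward - self.values[arm]) / self.counts[arm]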

Robust Learning from Observation with Model Misspecification

1 code implementation • 12 Feb 2022 • Luca Viano, Yu-Ting Huang, Parameswaran Kamalaruban, Craig Innes, Subramanian Ramamoorthy, Adrian Weller

Imitation learning (IL) is a popular paradigm for training policies in robotic systems when specifying the reward function is difficult.

Continuous Control, Imitation Learning +1

Explicable Reward Design for Reinforcement Learning Agents

1 code implementation • NeurIPS 2021 • Rati Devidze, Goran Radanovic, Parameswaran Kamalaruban, Adish Singla

By explicable rewards, we seek to capture two properties: (a) informativeness, so that the rewards speed up the agent's convergence, and (b) sparseness, as a proxy for the interpretability of the rewards.

Informativeness, Reinforcement Learning (RL) +1
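
A schematic way to combine the two properties (an illustration consistent with the abstract, not necessarily the paper's exact program) is to maximize informativeness under a sparsity budget:

    \max_{\widehat{R}} \; \mathrm{Informativeness}(\widehat{R}) \quad \text{s.t.} \quad \|\widehat{R}\|_0 \le B,

where the \ell_0 constraint caps how many states receive non-zero reward and thus serves as the interpretability proxy.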

Interaction-limited Inverse Reinforcement Learning

no code implementations • 1 Jul 2020 • Martin Troussard, Emmanuel Pignat, Parameswaran Kamalaruban, Sylvain Calinon, Volkan Cevher

This paper proposes an inverse reinforcement learning (IRL) framework to accelerate learning when the learner-teacher interaction is limited during training.

Reinforcement Learning (RL)

Environment Shaping in Reinforcement Learning using State Abstraction

no code implementations • 23 Jun 2020 • Parameswaran Kamalaruban, Rati Devidze, Volkan Cevher, Adish Singla

However, the applicability of potential-based reward shaping is limited in settings where (i) the state space is very large, and it is challenging to compute an appropriate potential function, (ii) the feedback signals are noisy, and even with shaped rewards the agent could be trapped in local optima, and (iii) changing the rewards alone is not sufficient, and effective shaping requires changing the dynamics.

Reinforcement Learning (RL)
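
For context, potential-based reward shaping (Ng et al., 1999), whose limitations (i)-(iii) motivate this work, adds a potential difference to the reward and provably preserves the set of optimal policies. A minimal sketch:

    def shaped_reward(reward, potential, s, s_next, gamma=0.99):
        """Potential-based shaping: R'(s, a, s') = R(s, a, s') + gamma * Phi(s') - Phi(s).
        Preserves optimal policies, but requires a good potential function Phi,
        which is exactly what is hard to compute in very large state spaces."""
        return reward + gamma * potential(s_next) - potential(s)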

Optimization for Reinforcement Learning: From Single Agent to Cooperative Agents

no code implementations • 1 Dec 2019 • Donghwan Lee, Niao He, Parameswaran Kamalaruban, Volkan Cevher

This article reviews recent advances in multi-agent reinforcement learning algorithms that learn to communicate and cooperate, with applications to large-scale control systems and communication networks.

Distributed Optimization, Multi-agent Reinforcement Learning +2

Interactive Teaching Algorithms for Inverse Reinforcement Learning

no code implementations • 28 May 2019 • Parameswaran Kamalaruban, Rati Devidze, Volkan Cevher, Adish Singla

We study the problem of inverse reinforcement learning (IRL) with the added twist that the learner is assisted by a helpful teacher.

Reinforcement Learning (RL)

Iterative Classroom Teaching

no code implementations • 8 Nov 2018 • Teresa Yeo, Parameswaran Kamalaruban, Adish Singla, Arpit Merchant, Thibault Asselborn, Louis Faucon, Pierre Dillenbourg, Volkan Cevher

We consider the machine teaching problem in a classroom-like setting wherein the teacher has to deliver the same examples to a diverse group of students.

Transitions, Losses, and Re-parameterizations: Elements of Prediction Games

no code implementations • 20 May 2018 • Parameswaran Kamalaruban

This thesis presents geometric insights into three types of two-player prediction games: the general learning task, prediction with expert advice, and online convex optimization.
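
All three games share the regret-minimization template: over T rounds the learner plays x_t, the adversary reveals a loss f_t, and the learner aims to keep

    \mathrm{Regret}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x} \sum_{t=1}^{T} f_t(x)

sublinear in T.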

Exp-Concavity of Proper Composite Losses

no code implementations • 20 May 2018 • Parameswaran Kamalaruban, Robert C. Williamson, Xinhua Zhang

In special cases like the Aggregating Algorithm (Vovk, 1995) with mixable losses and the Weighted Average Algorithm (Kivinen & Warmuth, 1999) with exp-concave losses, it is possible to achieve O(1) regret bounds.

Computational Efficiency
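
For reference, the central definition: a loss \ell is \alpha-exp-concave (for \alpha > 0) if p \mapsto \exp(-\alpha \ell(p)) is concave, and exp-concavity implies mixability. For \eta-mixable losses, the Aggregating Algorithm guarantees regret at most (\log N)/\eta against N experts, a bound that is constant in the horizon T, hence the O(1) above.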

Minimax Lower Bounds for Cost Sensitive Classification

no code implementations • 20 May 2018 • Parameswaran Kamalaruban, Robert C. Williamson

The cost-sensitive classification problem plays a crucial role in mission-critical machine learning applications, and differs from traditional classification by taking misclassification costs into consideration.

BIG-bench Machine Learning, Binary Classification +2
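
Concretely, with cost c_{10} for a false positive and c_{01} for a false negative, the cost-sensitive Bayes-optimal rule thresholds \eta(x) = P(Y = 1 \mid X = x) at the cost ratio rather than at 1/2:

    \text{predict } 1 \iff \eta(x) \ge \frac{c_{10}}{c_{01} + c_{10}},

which recovers the usual rule when c_{01} = c_{10}.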

Consistent Robust Regression

no code implementations • NeurIPS 2017 • Kush Bhatia, Prateek Jain, Parameswaran Kamalaruban, Purushottam Kar

We present the first efficient and provably consistent estimator for the robust regression problem.

Regression
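
A simplified sketch of the hard-thresholding style of estimator common in this line of work (illustrative only; the paper's estimator is the one carrying the efficiency and consistency guarantees):

    import numpy as np

    def trimmed_least_squares(X, y, n_corrupt, n_iters=50):
        """Alternate between least squares on the points currently deemed clean
        and re-flagging the n_corrupt largest residuals as corruptions."""
        n = len(y)
        clean = np.ones(n, dtype=bool)
        for _ in range(n_iters):
            w, *_ = np.linalg.lstsq(X[clean], y[clean], rcond=None)
            residuals = np.abs(y - X @ w)
            # Keep the n - n_corrupt points with the smallest residuals.
            threshold = np.partition(residuals, n - n_corrupt - 1)[n - n_corrupt - 1]
            clean = residuals <= threshold
        return w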

Improved Optimistic Mirror Descent for Sparsity and Curvature

no code implementations • 8 Sep 2016 • Parameswaran Kamalaruban

Online Convex Optimization plays a key role in large scale machine learning.
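
For context, the optimistic mirror descent template this paper refines (in the style of Rakhlin and Sridharan, 2013) uses a guess m_t of the upcoming gradient:

    x_t = \arg\min_{x} \; \eta \langle m_t, x \rangle + D_\Phi(x, \hat{x}_{t-1}), \qquad \hat{x}_t = \arg\min_{x} \; \eta \langle g_t, x \rangle + D_\Phi(x, \hat{x}_{t-1}),

where g_t is the gradient observed at x_t and D_\Phi is a Bregman divergence; a good guess (e.g., m_t = g_{t-1}) lets the regret scale with how predictable the gradient sequence is rather than with its raw magnitude.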

Efficient and Consistent Robust Time Series Analysis

no code implementations • 1 Jul 2016 • Kush Bhatia, Prateek Jain, Parameswaran Kamalaruban, Purushottam Kar

We illustrate our methods on synthetic datasets and show that our methods indeed are able to consistently recover the optimal parameters despite a large fraction of points being corrupted.

Regression, Time Series +1
