Search Results for author: Pradeep Varakantham

Found 47 papers, 15 papers with code

Optimizing Ride-Pooling Operations with Extended Pickup and Drop-Off Flexibility

no code implementations · 11 Mar 2025 · Hao Jiang, Yixing Xu, Pradeep Varakantham

The Ride-Pool Matching Problem (RMP) is central to on-demand ride-pooling services, where vehicles must be matched with multiple requests while adhering to service constraints such as pickup delays, detour limits, and vehicle capacity.

On Generalization Across Environments In Multi-Objective Reinforcement Learning

1 code implementation · 2 Mar 2025 · Jayden Teoh, Pradeep Varakantham, Peter Vamplew

Despite recent advances, the existing multi-objective reinforcement learning (MORL) literature has focused narrowly on performance within static environments, neglecting the importance of generalizing across diverse settings.

Decision Making · Multi-Objective Reinforcement Learning

Improving Environment Novelty Quantification for Effective Unsupervised Environment Design

no code implementations · 8 Feb 2025 · Jayden Teoh, Wenjun Li, Pradeep Varakantham

Unsupervised Environment Design (UED) formalizes the problem of autocurricula through interactive training between a teacher agent and a student agent.

On Learning Informative Trajectory Embeddings for Imitation, Classification and Regression

1 code implementation · 16 Jan 2025 · Zichang Ge, Changyu Chen, Arunesh Sinha, Pradeep Varakantham

In real-world sequential decision making tasks like autonomous driving, robotics, and healthcare, learning from observed state-action trajectories is critical for tasks like imitation, classification, and clustering.

Autonomous Driving · Clustering

Offline Safe Reinforcement Learning Using Trajectory Classification

1 code implementation · 19 Dec 2024 · Ze Gong, Akshat Kumar, Pradeep Varakantham

In this paper, we propose to learn a policy that generates desirable trajectories and avoids undesirable trajectories.

Classification · Reinforcement Learning

IRL for Restless Multi-Armed Bandits with Applications in Maternal and Child Health

1 code implementation · 11 Dec 2024 · Gauri Jain, Pradeep Varakantham, Haifeng Xu, Aparna Taneja, Prashant Doshi, Milind Tambe

To address this shortcoming, this paper is the first to use inverse reinforcement learning (IRL) to learn desired rewards for restless multi-armed bandits (RMABs), and we demonstrate improved outcomes in a maternal and child health telehealth program.

Multi-Armed Bandits

Semantic Loss Guided Data Efficient Supervised Fine Tuning for Safe Responses in LLMs

no code implementations · 7 Dec 2024 · Yuxiao Lu, Arunesh Sinha, Pradeep Varakantham

In this paper, we aim to take on this problem while overcoming the limitation of requiring large amounts of high-quality human data.

Preserving the Privacy of Reward Functions in MDPs through Deception

1 code implementation · 13 Jul 2024 · Shashank Reddy Chirra, Pradeep Varakantham, Praveen Paruchuri

Experiments on multiple benchmark problems show that our approach outperforms previous methods in preserving reward function privacy.

Decision Making · Sequential Decision Making

Safety through feedback in Constrained RL

1 code implementation · 28 Jun 2024 · Shashank Reddy Chirra, Pradeep Varakantham, Praveen Paruchuri

To address these questions, we introduce novelty-based sampling, which involves the evaluator only when the agent encounters a novel trajectory.

EduQate: Generating Adaptive Curricula through RMABs in Education Settings

no code implementations · 20 Jun 2024 · Sidney Tio, Dexun Li, Pradeep Varakantham

There has been significant interest in the development of personalized and adaptive educational tools that cater to a student's individual learning progress.

Multi-Armed Bandits · Q-Learning

Unlocking Large Language Model's Planning Capabilities with Maximum Diversity Fine-tuning

no code implementations · 15 Jun 2024 · Wenjun Li, Changyu Chen, Pradeep Varakantham

To address this challenge, we propose the Maximum Diversity Fine-Tuning (MDFT) strategy to improve the sample efficiency of fine-tuning in the planning domain.

Diversity · valid

Bootstrapping Language Models with DPO Implicit Rewards

1 code implementation · 14 Jun 2024 · Changyu Chen, Zichen Liu, Chao Du, Tianyu Pang, Qian Liu, Arunesh Sinha, Pradeep Varakantham, Min Lin

In this work, we make a novel observation that this implicit reward model can by itself be used in a bootstrapping fashion to further align the LLM.
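For reference, the implicit reward mentioned here is the one defined by standard Direct Preference Optimization (DPO): for a prompt x and response y, it is the β-scaled log-ratio between the trained policy π_θ and the frozen reference policy π_ref, up to a prompt-only term that cancels when two responses to the same prompt are compared. The ranking rule on the right is a hedged sketch of how such a reward can order candidate responses; the paper's exact bootstrapping procedure is not reproduced here.

```latex
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
\qquad
y_1 \succ y_2 \iff r_\theta(x, y_1) > r_\theta(x, y_2)
```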

Probabilistic Perspectives on Error Minimization in Adversarial Reinforcement Learning

1 code implementation · 7 Jun 2024 · Roman Belaire, Arunesh Sinha, Pradeep Varakantham

Deep Reinforcement Learning (DRL) policies are highly susceptible to adversarial noise in observations, which poses significant risks in safety-critical scenarios.

counterfactual · Deep Reinforcement Learning

Imitating Cost-Constrained Behaviors in Reinforcement Learning

1 code implementation · 26 Mar 2024 · Qian Shao, Pradeep Varakantham, Shih-Fen Cheng

Generally speaking, imitation learning is designed to learn either the reward (or preference) model or directly the behavioral policy by observing the behavior of an expert.

Imitation Learning · Reinforcement Learning

SPRINQL: Sub-optimal Demonstrations driven Offline Imitation Learning

1 code implementation · 20 Feb 2024 · Huy Hoang, Tien Mai, Pradeep Varakantham

In this paper, we propose an offline IL approach that leverages the larger set of sub-optimal demonstrations while effectively mimicking expert trajectories.

Imitation Learning · Q-Learning

Imitate the Good and Avoid the Bad: An Incremental Approach to Safe Reinforcement Learning

1 code implementation · 16 Dec 2023 · Huy Hoang, Tien Mai, Pradeep Varakantham

In an exhaustive set of experiments, we demonstrate that our approach is able to outperform top benchmark approaches for solving Constrained RL problems, with respect to expected cost, CVaR cost, or even unknown cost constraints.

Reinforcement Learning (RL) · Safe Reinforcement Learning

Training Reinforcement Learning Agents and Humans With Difficulty-Conditioned Generators

no code implementations · 4 Dec 2023 · Sidney Tio, Jimmy Ho, Pradeep Varakantham

We adapt the Parameterized Environment Response Model (PERM), a method for training both reinforcement learning (RL) agents and human learners in parameterized environments by directly modeling difficulty and ability.

Reinforcement Learning

Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning

1 code implementation · NeurIPS 2023 · Changyu Chen, Ramesha Karunasena, Thanh Hong Nguyen, Arunesh Sinha, Pradeep Varakantham

Many problems in reinforcement learning (RL) seek an optimal policy over large, discrete, multidimensional, yet unordered action spaces; examples include randomized allocation of resources, such as the placement of multiple security resources or emergency response units.

Reinforcement Learning (RL)

Enhancing the Hierarchical Environment Design via Generative Trajectory Modeling

no code implementations · 30 Sep 2023 · Dexun Li, Pradeep Varakantham

Unsupervised Environment Design (UED) is a paradigm for automatically generating a curriculum of training environments, enabling agents trained in these environments to develop general capabilities, i.e., to achieve good zero-shot transfer performance.

Trajectory Modeling

Transferable Curricula through Difficulty Conditioned Generators

no code implementations · 22 Jun 2023 · Sidney Tio, Pradeep Varakantham

In this paper, we introduce a method named Parameterized Environment Response Model (PERM) that shows promising results in training RL agents in parameterized environments.

Reinforcement Learning (RL) · Starcraft

Handling Long and Richly Constrained Tasks through Constrained Hierarchical Reinforcement Learning

no code implementations · 21 Feb 2023 · Yuxiao Lu, Arunesh Sinha, Pradeep Varakantham

Safety in goal-directed reinforcement learning (RL) settings has typically been handled through constraints over trajectories, an approach that has demonstrated good performance primarily in short-horizon tasks.

Decision Making · Hierarchical Reinforcement Learning

Future Aware Pricing and Matching for Sustainable On-demand Ride Pooling

no code implementations · 21 Feb 2023 · Xianjie Zhang, Pradeep Varakantham, Hao Jiang

Traditionally, these two challenges have been studied separately and with myopic approaches (considering only current requests), without accounting for the impact of current matching on serving future requests.

Regret-Based Defense in Adversarial Reinforcement Learning

1 code implementation · 14 Feb 2023 · Roman Belaire, Pradeep Varakantham, Thanh Nguyen, David Lo

We demonstrate that our approaches provide a significant improvement in performance across a wide variety of benchmarks against leading approaches for robust Deep RL.

Deep Reinforcement Learning · Reinforcement Learning

Diversity Induced Environment Design via Self-Play

no code implementations · 4 Feb 2023 · Dexun Li, Wenjun Li, Pradeep Varakantham

In this paper, we aim to introduce diversity in the Unsupervised Environment Design (UED) framework.

Diversity

Solving Richly Constrained Reinforcement Learning through State Augmentation and Reward Penalties

no code implementations · 27 Jan 2023 · Hao Jiang, Tien Mai, Pradeep Varakantham, Minh Huy Hoang

Constrained reinforcement learning has been employed to enforce safety constraints on policies through the use of expected cost constraints.

Reinforcement Learning (RL)

Generalization through Diversity: Improving Unsupervised Environment Design

no code implementations · 19 Jan 2023 · Wenjun Li, Pradeep Varakantham, Dexun Li

Agent decision making using reinforcement learning (RL) relies heavily on either a model or a simulator of the environment (e.g., moving in an 8x8 maze with three rooms, or playing Chess on an 8x8 board).

Decision Making · Diversity

Learning Individual Policies in Large Multi-agent Systems through Local Variance Minimization

no code implementations · 27 Dec 2022 · Tanvi Verma, Pradeep Varakantham

In multi-agent systems with a large number of agents, the contribution of each agent to the value of other agents is typically minimal (e.g., aggregation systems such as Uber and Deliveroo).

Multi-agent Reinforcement Learning

Towards Soft Fairness in Restless Multi-Armed Bandits

no code implementations · 27 Jul 2022 · Dexun Li, Pradeep Varakantham

To avoid starvation in the executed interventions across individuals/regions/communities, we first provide a soft fairness constraint and then provide an approach to enforce the soft fairness constraint in RMABs.

Fairness · Multi-Armed Bandits

Efficient Resource Allocation with Fairness Constraints in Restless Multi-Armed Bandits

no code implementations · 8 Jun 2022 · Dexun Li, Pradeep Varakantham

In this paper, we are interested in ensuring that RMAB decision making is also fair to different arms while maximizing expected value.

Decision Making · Fairness

Conditional Expectation based Value Decomposition for Scalable On-Demand Ride Pooling

no code implementations · 1 Dec 2021 · Avinandan Bose, Pradeep Varakantham

Owing to the benefits for customers (lower prices), drivers (higher revenues), aggregation companies (higher revenues) and the environment (fewer vehicles), on-demand ride pooling (e.g., Uber Pool, Grab Share) has become quite popular.

Decision Making

Facilitating human-wildlife cohabitation through conflict prediction

no code implementations · 22 Sep 2021 · Susobhan Ghosh, Pradeep Varakantham, Aniket Bhatkhande, Tamanna Ahmad, Anish Andheria, Wenjun Li, Aparna Taneja, Divy Thakkar, Milind Tambe

With increasing world population and expanded use of forests as cohabited regions, interactions and conflicts with wildlife are increasing, leading to large-scale loss of lives (animal and human) and livelihoods (economic).

Prediction

CLAIM: Curriculum Learning Policy for Influence Maximization in Unknown Social Networks

no code implementations · 8 Jul 2021 · Dexun Li, Meghna Lowalekar, Pradeep Varakantham

Influence maximization is the problem of finding a small subset of nodes in a network that can maximize the diffusion of information.

Reinforcement Learning

Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare

no code implementations · 17 May 2021 · Arpita Biswas, Gaurav Aggarwal, Pradeep Varakantham, Milind Tambe

In many public health settings, it is important for patients to adhere to health programs, such as taking medications and attending periodic health checks.

Q-Learning

Competitive Ratios for Online Multi-capacity Ridesharing

no code implementations · 16 Sep 2020 · Meghna Lowalekar, Pradeep Varakantham, Patrick Jaillet

The desired matching between resources and request groups is constrained by the edges between requests and request groups in this tripartite graph (i.e., a request can be part of at most one request group in the final assignment).

Zone pAth Construction (ZAC) based Approaches for Effective Real-Time Ridesharing

no code implementations · 13 Sep 2020 · Meghna Lowalekar, Pradeep Varakantham, Patrick Jaillet

This challenge has been addressed in existing work by: (i) generating as many relevant feasible (with respect to the available delay for customers) combinations of requests as possible in real-time; and then (ii) optimizing assignment of the feasible request combinations to vehicles.

Value Variance Minimization for Learning Approximate Equilibrium in Aggregation Systems

no code implementations · 16 Mar 2020 · Tanvi Verma, Pradeep Varakantham

For effective matching of resources (e.g., taxis, food, bikes, shopping items) to customer demand, aggregation systems have been extremely successful.

Multi-agent Reinforcement Learning · Reinforcement Learning

On Solving Cooperative MARL Problems with a Few Good Experiences

no code implementations · 22 Jan 2020 · Rajiv Ranjan Kumar, Pradeep Varakantham

Unfortunately, neither of these approaches (or their extensions) are able to address the problem of sparse good experiences effectively.

Descriptive · Multi-agent Reinforcement Learning

Solving Online Threat Screening Games using Constrained Action Space Reinforcement Learning

no code implementations · 20 Nov 2019 · Sanket Shah, Arunesh Sinha, Pradeep Varakantham, Andrew Perrault, Milind Tambe

To solve the online problem with a hard bound on risk, we formulate it as a Reinforcement Learning (RL) problem with constraints on the action space (hard bound on risk).

Deep Reinforcement Learning · Reinforcement Learning

Neural Approximate Dynamic Programming for On-Demand Ride-Pooling

1 code implementation · 20 Nov 2019 · Sanket Shah, Meghna Lowalekar, Pradeep Varakantham

This is because even a myopic assignment in ride-pooling involves considering which combinations of passenger requests can be assigned to vehicles, which adds a layer of combinatorial complexity to the ToD problem.

Deep Reinforcement Learning

TuSeRACT: Turn-Sample-Based Real-Time Traffic Signal Control

no code implementations · 13 Dec 2018 · Srishti Dhamija, Pradeep Varakantham

To ensure real-time responsiveness in the presence of turn-induced uncertainty, SURTRAC computes schedules which minimize the delay for the expected turn movements as opposed to minimizing the expected delay under turn-induced uncertainty.

Scheduling · Traffic Signal Control
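The distinction drawn in the TuSeRACT snippet, between minimizing the delay for the expected turn movements and minimizing the expected delay, can be illustrated with a toy calculation. The sketch below assumes a hypothetical delay model (`leftover_delay`: vehicles beyond a fixed per-cycle capacity wait one extra 60 s cycle) and a hypothetical two-outcome distribution of turning vehicles; it is not SURTRAC's or TuSeRACT's actual model.

```python
from typing import Callable, List, Tuple

# Hypothetical distribution of turning vehicles joining one approach per cycle:
# a list of (number of vehicles, probability) pairs.
TurnDistribution = List[Tuple[float, float]]


def leftover_delay(arrivals: float, capacity: float = 6.0, cycle_s: float = 60.0) -> float:
    """Toy delay model (an assumption): vehicles beyond the per-cycle
    capacity each wait one extra signal cycle of `cycle_s` seconds."""
    return max(0.0, arrivals - capacity) * cycle_s


def delay_at_expected_turns(dist: TurnDistribution, delay: Callable[[float], float]) -> float:
    # Plan for the mean turn count, then evaluate the delay at that single point.
    mean_arrivals = sum(n * p for n, p in dist)
    return delay(mean_arrivals)


def expected_delay(dist: TurnDistribution, delay: Callable[[float], float]) -> float:
    # Average the delay over every possible turn outcome.
    return sum(p * delay(n) for n, p in dist)


turn_dist: TurnDistribution = [(2.0, 0.5), (10.0, 0.5)]
print(delay_at_expected_turns(turn_dist, leftover_delay))  # 0.0
print(expected_delay(turn_dist, leftover_delay))           # 120.0
```

Because this toy delay function is convex, Jensen's inequality guarantees that the delay evaluated at the expected turn count never exceeds the expected delay, so a schedule tuned only to expected turns can understate delay under turn-induced uncertainty.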

Resource Constrained Deep Reinforcement Learning

no code implementations · 3 Dec 2018 · Abhinav Bhatia, Pradeep Varakantham, Akshat Kumar

However, existing Deep RL methods are unable to handle combinatorial action spaces and constraints on allocation of resources.

Deep Reinforcement Learning · Management

Entropy based Independent Learning in Anonymous Multi-Agent Settings

no code implementations · 27 Mar 2018 · Tanvi Verma, Pradeep Varakantham, Hoong Chuin Lau

A key characteristic of the domains of interest is that the interactions between individuals are anonymous, i.e., the outcome of an interaction (competing for demand) depends only on the number of agents and not on their identity.

Fairness · Multi-agent Reinforcement Learning

Regret based Robust Solutions for Uncertain Markov Decision Processes

no code implementations · NeurIPS 2013 · Asrar Ahmed, Pradeep Varakantham, Yossiri Adulyasak, Patrick Jaillet

Most robust optimization approaches for these problems have focused on computing maximin policies, which maximize the value corresponding to the worst realization of the uncertainty.
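The difference between a maximin policy and a regret-based one can be seen on a toy example. The sketch below assumes three hypothetical candidate policies (`A`, `B`, `C`) evaluated under two realizations of the uncertain model, with made-up values; it only contrasts the two selection criteria over a finite table and is not the paper's algorithm for uncertain MDPs.

```python
from typing import Dict, List

# Hypothetical values V[policy][realization] for three candidate policies under
# two realizations of the uncertain transition/reward model (illustrative only).
values: Dict[str, List[float]] = {
    "A": [10.0, 2.0],
    "B": [4.0, 4.0],
    "C": [9.0, 3.0],
}


def maximin_policy(values: Dict[str, List[float]]) -> str:
    # Choose the policy whose worst-case value is largest.
    return max(values, key=lambda p: min(values[p]))


def minimax_regret_policy(values: Dict[str, List[float]]) -> str:
    n_realizations = len(next(iter(values.values())))
    # Best achievable value in each realization, over all candidate policies.
    best = [max(v[s] for v in values.values()) for s in range(n_realizations)]

    def max_regret(p: str) -> float:
        # Largest shortfall of policy p relative to the best policy, over realizations.
        return max(best[s] - values[p][s] for s in range(n_realizations))

    return min(values, key=max_regret)


print(maximin_policy(values))         # B
print(minimax_regret_policy(values))  # C
```

Here the maximin criterion picks `B`, which protects the worst case but gives up 6 units of value when the first realization occurs, whereas minimax regret picks `C`, whose value is never more than 1 unit below the best candidate in either realization.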
