Search Results for author: Debmalya Mandal

Found 21 papers, 4 papers with code

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

no code implementations • 4 Mar 2024 • Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla

Moreover, we extend our analysis to the approximate optimization setting and derive exponentially decaying convergence rates for both RLHF and DPO.

Corruption-Robust Offline Two-Player Zero-Sum Markov Games

no code implementations • 4 Mar 2024 • Andi Nika, Debmalya Mandal, Adish Singla, Goran Radanović

We note that we are the first to provide such a characterization of the problem of learning approximate Nash Equilibrium policies in offline two-player zero-sum Markov games under data corruption.

Performative Reinforcement Learning in Gradually Shifting Environments

1 code implementation • 15 Feb 2024 • Ben Rank, Stelios Triantafyllou, Debmalya Mandal, Goran Radanovic

Unlike PRL, our framework allows modeling scenarios where the environment gradually adjusts to a deployed policy.

reinforcement-learning Reinforcement Learning (RL)

Learning the Expected Core of Strictly Convex Stochastic Cooperative Games

no code implementations • 10 Feb 2024 • Nam Phuong Tran, The Anh Ta, Shuqing Shi, Debmalya Mandal, Yali Du, Long Tran-Thanh

Reward allocation, also known as the credit assignment problem, has been an important topic in economics, engineering, and machine learning.

Corruption Robust Offline Reinforcement Learning with Human Feedback

no code implementations • 9 Feb 2024 • Debmalya Mandal, Andi Nika, Parameswaran Kamalaruban, Adish Singla, Goran Radanović

We aim to design algorithms that identify a near-optimal policy from the corrupted data, with provable guarantees.

Adversarial Attack reinforcement-learning

Agent-Specific Effects: A Causal Effect Propagation Analysis in Multi-Agent MDPs

no code implementations • 17 Oct 2023 • Stelios Triantafyllou, Aleksa Sukovic, Debmalya Mandal, Goran Radanovic

These challenges are particularly prominent in the context of multi-agent sequential decision-making, where the causal effect of an agent's action on the outcome depends on how other agents respond to that action.

counterfactual Decision Making +1

Markov Decision Processes with Time-Varying Geometric Discounting

no code implementations • 19 Jul 2023 • Jiarui Gan, Annika Hennes, Rupak Majumdar, Debmalya Mandal, Goran Radanovic

We take a game-theoretic perspective -- whereby each time step is treated as an independent decision maker with their own (fixed) discount factor -- and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems.

Sequential Principal-Agent Problems with Communication: Efficient Computation and Learning

no code implementations • 6 Jun 2023 • Jiarui Gan, Rupak Majumdar, Debmalya Mandal, Goran Radanovic

In this model, the principal and the agent interact in a stochastic environment, and each is privy to observations about the state not available to the other.

Decision Making

Implicit Poisoning Attacks in Two-Agent Reinforcement Learning: Adversarial Policies for Training-Time Attacks

1 code implementation • 27 Feb 2023 • Mohammad Mohammadi, Jonathan Nöther, Debmalya Mandal, Adish Singla, Goran Radanovic

In this paper, we study targeted poisoning attacks in a two-agent setting where an attacker implicitly poisons the effective environment of one of the agents by modifying the policy of its peer.

Online Reinforcement Learning with Uncertain Episode Lengths

no code implementations • 7 Feb 2023 • Debmalya Mandal, Goran Radanovic, Jiarui Gan, Adish Singla, Rupak Majumdar

We show that minimizing regret with this new general discounting is equivalent to minimizing regret with uncertain episode lengths.

reinforcement-learning Reinforcement Learning (RL)

Socially Fair Reinforcement Learning

no code implementations • 26 Aug 2022 • Debmalya Mandal, Jiarui Gan

We consider the problem of minimizing regret with respect to the fair policies maximizing three different fair objectives -- minimum welfare, generalized Gini welfare, and Nash social welfare.

reinforcement-learning Reinforcement Learning (RL)

Performative Reinforcement Learning

no code implementations • 30 Jun 2022 • Debmalya Mandal, Stelios Triantafyllou, Goran Radanovic

We introduce the framework of performative reinforcement learning where the policy chosen by the learner affects the underlying reward and transition dynamics of the environment.

reinforcement-learning Reinforcement Learning (RL)

Learning Tensor Representations for Meta-Learning

no code implementations • 18 Jan 2022 • Samuel Deng, Yilin Guo, Daniel Hsu, Debmalya Mandal

Prior works on learning linear representations for meta-learning assume that there is a common shared representation across different tasks, and do not consider the additional task-specific observable side information.

Meta-Learning

Feature-based Individual Fairness in k-Clustering

no code implementations • 9 Sep 2021 • Debajyoti Kar, Mert Kosan, Debmalya Mandal, Sourav Medya, Arlei Silva, Palash Dey, Swagato Sanyal

Ensuring fairness in machine learning algorithms is a challenging and essential task.

Clustering Fairness

Surprisingly Popular Voting Recovers Rankings, Surprisingly!

no code implementations • 19 May 2021 • Hadi Hosseini, Debmalya Mandal, Nisarg Shah, Kevin Shi

A clever recent approach, surprisingly popular voting, elicits additional information from the individuals, namely their prediction of other individuals' votes, and provably recovers the ground truth even when experts are in the minority.

Meta-Learning with Graph Neural Networks: Methods and Applications

no code implementations • 27 Feb 2021 • Debmalya Mandal, Sourav Medya, Brian Uzzi, Charu Aggarwal

Graph Neural Networks (GNNs), a generalization of deep neural networks on graph data, have been widely used in various domains, ranging from drug discovery to recommender systems.

Drug Discovery Meta-Learning +1

Adversarial Blocking Bandits

no code implementations • NeurIPS 2020 • Nicholas Bishop, Hau Chan, Debmalya Mandal, Long Tran-Thanh

On the other hand, when B_T is not known, we show that the dynamic approximate regret of RGA-META is at most O((K+\tilde{D})^{1/4}\tilde{B}^{1/2}T^{3/4}), where \tilde{B} is the maximal path variation budget within each batch of RGA-META (which is provably of order o(\sqrt{T})).

Blocking

Ensuring Fairness Beyond the Training Data

2 code implementations • NeurIPS 2020 • Debmalya Mandal, Samuel Deng, Suman Jana, Jeannette M. Wing, Daniel Hsu

In this work, we develop classifiers that are fair not only with respect to the training distribution, but also for a class of distributions that are weighted perturbations of the training samples.

Fairness

Efficient and Thrifty Voting by Any Means Necessary

no code implementations • NeurIPS 2019 • Debmalya Mandal, Ariel D. Procaccia, Nisarg Shah, David Woodruff

We take an unorthodox view of voting by expanding the design space to include both the elicitation rule, whereby voters map their (cardinal) preferences to votes, and the aggregation rule, which transforms the reported votes into collective decisions.

Weighted Tensor Completion for Time-Series Causal Inference

1 code implementation • 12 Feb 2019 • Debmalya Mandal, David Parkes

We model the potential outcomes as a three-dimensional tensor of low rank, where the three dimensions correspond to the agents, time periods and the set of possible histories.

Causal Inference Time Series +1

Calibrated Fairness in Bandits

no code implementations • 6 Jul 2017 • Yang Liu, Goran Radanovic, Christos Dimitrakakis, Debmalya Mandal, David C. Parkes

In addition, we define the fairness regret, which corresponds to the degree to which an algorithm is not calibrated, where perfect calibration requires that the probability of selecting an arm is equal to the probability with which the arm has the best quality realization.

Decision Making Fairness +1
