Search Results for author: Debmalya Mandal

Found 21 papers, 4 papers with code

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

no code implementations • 4 Mar 2024 • Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla

Moreover, we extend our analysis to the approximate optimization setting and derive exponentially decaying convergence rates for both RLHF and DPO.

Corruption-Robust Offline Two-Player Zero-Sum Markov Games

no code implementations • 4 Mar 2024 • Andi Nika, Debmalya Mandal, Adish Singla, Goran Radanović

We note that we are the first to provide such a characterization of the problem of learning approximate Nash Equilibrium policies in offline two-player zero-sum Markov games under data corruption.

Learning the Expected Core of Strictly Convex Stochastic Cooperative Games

no code implementations • 10 Feb 2024 • Nam Phuong Tran, The Anh Ta, Shuqing Shi, Debmalya Mandal, Yali Du, Long Tran-Thanh

Reward allocation, also known as the credit assignment problem, has been an important topic in economics, engineering, and machine learning.

Agent-Specific Effects: A Causal Effect Propagation Analysis in Multi-Agent MDPs

no code implementations • 17 Oct 2023 • Stelios Triantafyllou, Aleksa Sukovic, Debmalya Mandal, Goran Radanovic

These challenges are particularly prominent in the context of multi-agent sequential decision-making, where the causal effect of an agent's action on the outcome depends on how other agents respond to that action.

counterfactual, Decision Making

Markov Decision Processes with Time-Varying Geometric Discounting

no code implementations • 19 Jul 2023 • Jiarui Gan, Annika Hennes, Rupak Majumdar, Debmalya Mandal, Goran Radanovic

We take a game-theoretic perspective -- whereby each time step is treated as an independent decision maker with their own (fixed) discount factor -- and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems.

Sequential Principal-Agent Problems with Communication: Efficient Computation and Learning

no code implementations • 6 Jun 2023 • Jiarui Gan, Rupak Majumdar, Debmalya Mandal, Goran Radanovic

In this model, the principal and the agent interact in a stochastic environment, and each is privy to observations about the state not available to the other.

Decision Making

Implicit Poisoning Attacks in Two-Agent Reinforcement Learning: Adversarial Policies for Training-Time Attacks

1 code implementation • 27 Feb 2023 • Mohammad Mohammadi, Jonathan Nöther, Debmalya Mandal, Adish Singla, Goran Radanovic

In this paper, we study targeted poisoning attacks in a two-agent setting where an attacker implicitly poisons the effective environment of one of the agents by modifying the policy of its peer.

Online Reinforcement Learning with Uncertain Episode Lengths

no code implementations • 7 Feb 2023 • Debmalya Mandal, Goran Radanovic, Jiarui Gan, Adish Singla, Rupak Majumdar

We show that minimizing regret with this new general discounting is equivalent to minimizing regret with uncertain episode lengths.

reinforcement-learning, Reinforcement Learning (RL)

Socially Fair Reinforcement Learning

no code implementations • 26 Aug 2022 • Debmalya Mandal, Jiarui Gan

We consider the problem of minimizing regret with respect to the fair policies maximizing three different fair objectives -- minimum welfare, generalized Gini welfare, and Nash social welfare.

reinforcement-learning, Reinforcement Learning (RL)

Performative Reinforcement Learning

no code implementations • 30 Jun 2022 • Debmalya Mandal, Stelios Triantafyllou, Goran Radanovic

We introduce the framework of performative reinforcement learning where the policy chosen by the learner affects the underlying reward and transition dynamics of the environment.

reinforcement-learning, Reinforcement Learning (RL)

Learning Tensor Representations for Meta-Learning

no code implementations • 18 Jan 2022 • Samuel Deng, Yilin Guo, Daniel Hsu, Debmalya Mandal

Prior works on learning linear representations for meta-learning assume that there is a common shared representation across different tasks, and do not consider the additional task-specific observable side information.

Meta-Learning

Surprisingly Popular Voting Recovers Rankings, Surprisingly!

no code implementations • 19 May 2021 • Hadi Hosseini, Debmalya Mandal, Nisarg Shah, Kevin Shi

A clever recent approach, surprisingly popular voting, elicits additional information from the individuals, namely their prediction of other individuals' votes, and provably recovers the ground truth even when experts are in the minority.

Meta-Learning with Graph Neural Networks: Methods and Applications

no code implementations • 27 Feb 2021 • Debmalya Mandal, Sourav Medya, Brian Uzzi, Charu Aggarwal

Graph Neural Networks (GNNs), a generalization of deep neural networks on graph data, have been widely used in various domains, ranging from drug discovery to recommender systems.

Drug Discovery, Meta-Learning

Adversarial Blocking Bandits

no code implementations • NeurIPS 2020 • Nicholas Bishop, Hau Chan, Debmalya Mandal, Long Tran-Thanh

On the other hand, when $B_T$ is not known, we show that the dynamic approximate regret of RGA-META is at most $O((K+\tilde{D})^{1/4}\tilde{B}^{1/2}T^{3/4})$, where $\tilde{B}$ is the maximal path variation budget within each batch of RGA-META (which is provably of order $o(\sqrt{T})$).

Blocking

Ensuring Fairness Beyond the Training Data

2 code implementations • NeurIPS 2020 • Debmalya Mandal, Samuel Deng, Suman Jana, Jeannette M. Wing, Daniel Hsu

In this work, we develop classifiers that are fair not only with respect to the training distribution, but also for a class of distributions that are weighted perturbations of the training samples.

Fairness

Efficient and Thrifty Voting by Any Means Necessary

no code implementations • NeurIPS 2019 • Debmalya Mandal, Ariel D. Procaccia, Nisarg Shah, David Woodruff

We take an unorthodox view of voting by expanding the design space to include both the elicitation rule, whereby voters map their (cardinal) preferences to votes, and the aggregation rule, which transforms the reported votes into collective decisions.

Weighted Tensor Completion for Time-Series Causal Inference

1 code implementation • 12 Feb 2019 • Debmalya Mandal, David Parkes

We model the potential outcomes as a three-dimensional tensor of low rank, where the three dimensions correspond to the agents, time periods and the set of possible histories.

Causal Inference, Time Series

Calibrated Fairness in Bandits

no code implementations • 6 Jul 2017 • Yang Liu, Goran Radanovic, Christos Dimitrakakis, Debmalya Mandal, David C. Parkes

In addition, we define the fairness regret, which corresponds to the degree to which an algorithm is not calibrated, where perfect calibration requires that the probability of selecting an arm is equal to the probability with which the arm has the best quality realization.

Decision Making, Fairness
