no code implementations • 22 May 2024 • Nam Phuong Tran, The Anh Ta, Debmalya Mandal, Long Tran-Thanh

Our algorithm achieves a regret bound of $ O(d_0^{1/3} T^{2/3} \log(d))$, where $d$ is the ambient dimension which is potentially very large, and $d_0$ is the dimension of the true low-dimensional subspace such that $d_0 \ll d$.

no code implementations • 4 Mar 2024 • Andi Nika, Debmalya Mandal, Adish Singla, Goran Radanović

We note that we are the first to provide such a characterization of the problem of learning approximate Nash Equilibrium policies in offline two-player zero-sum Markov games under data corruption.

no code implementations • 4 Mar 2024 • Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla

Moreover, we extend our analysis to the approximate optimization setting and derive exponentially decaying convergence rates for both RLHF and DPO.

2 code implementations • 15 Feb 2024 • Ben Rank, Stelios Triantafyllou, Debmalya Mandal, Goran Radanovic

This makes MDRR particularly suitable for scenarios where the environment's response strongly depends on its previous dynamics, which are common in practice.

no code implementations • 10 Feb 2024 • Nam Phuong Tran, The Anh Ta, Shuqing Shi, Debmalya Mandal, Yali Du, Long Tran-Thanh

Within the class of strictly convex games, we present an algorithm named \texttt{Common-Points-Picking} that returns a point in the expected core given a polynomial number of samples, with high probability.

no code implementations • 9 Feb 2024 • Debmalya Mandal, Andi Nika, Parameswaran Kamalaruban, Adish Singla, Goran Radanović

We aim to design algorithms that identify a near-optimal policy from the corrupted data, with provable guarantees.

1 code implementation • 17 Oct 2023 • Stelios Triantafyllou, Aleksa Sukovic, Debmalya Mandal, Goran Radanovic

These challenges are particularly prominent in the context of multi-agent sequential decision-making, where the causal effect of an agent's action on the outcome depends on how other agents respond to that action.

no code implementations • 19 Jul 2023 • Jiarui Gan, Annika Hennes, Rupak Majumdar, Debmalya Mandal, Goran Radanovic

We take a game-theoretic perspective -- whereby each time step is treated as an independent decision maker with their own (fixed) discount factor -- and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems.

no code implementations • 6 Jun 2023 • Jiarui Gan, Rupak Majumdar, Debmalya Mandal, Goran Radanovic

In this model, the principal and the agent interact in a stochastic environment, and each is privy to observations about the state not available to the other.

1 code implementation • 27 Feb 2023 • Mohammad Mohammadi, Jonathan Nöther, Debmalya Mandal, Adish Singla, Goran Radanovic

In this paper, we study targeted poisoning attacks in a two-agent setting where an attacker implicitly poisons the effective environment of one of the agents by modifying the policy of its peer.

no code implementations • 7 Feb 2023 • Debmalya Mandal, Goran Radanovic, Jiarui Gan, Adish Singla, Rupak Majumdar

We show that minimizing regret with this new general discounting is equivalent to minimizing regret with uncertain episode lengths.

no code implementations • 26 Aug 2022 • Debmalya Mandal, Jiarui Gan

We consider the problem of minimizing regret with respect to the fair policies maximizing three different fair objectives -- minimum welfare, generalized Gini welfare, and Nash social welfare.

no code implementations • 30 Jun 2022 • Debmalya Mandal, Stelios Triantafyllou, Goran Radanovic

We introduce the framework of performative reinforcement learning where the policy chosen by the learner affects the underlying reward and transition dynamics of the environment.

no code implementations • 18 Jan 2022 • Samuel Deng, Yilin Guo, Daniel Hsu, Debmalya Mandal

Prior works on learning linear representations for meta-learning assume that there is a common shared representation across different tasks, and do not consider the additional task-specific observable side information.

no code implementations • 9 Sep 2021 • Debajyoti Kar, Mert Kosan, Debmalya Mandal, Sourav Medya, Arlei Silva, Palash Dey, Swagato Sanyal

Ensuring fairness in machine learning algorithms is a challenging and essential task.

no code implementations • 19 May 2021 • Hadi Hosseini, Debmalya Mandal, Nisarg Shah, Kevin Shi

A clever recent approach, \emph{surprisingly popular voting}, elicits additional information from the individuals, namely their \emph{prediction} of other individuals' votes, and provably recovers the ground truth even when experts are in minority.

no code implementations • 27 Feb 2021 • Debmalya Mandal, Sourav Medya, Brian Uzzi, Charu Aggarwal

Graph Neural Networks (GNNs), a generalization of deep neural networks on graph data have been widely used in various domains, ranging from drug discovery to recommender systems.

no code implementations • NeurIPS 2020 • Nicholas Bishop, Hau Chan, Debmalya Mandal, Long Tran-Thanh

On the other hand, when B_T is not known, we show that the dynamic approximate regret of RGA-META is at most O((K+\tilde{D})^{1/4}\tilde{B}^{1/2}T^{3/4}) where \tilde{B} is the maximal path variation budget within each batch of RGA-META (which is provably in order of o(\sqrt{T}).

2 code implementations • NeurIPS 2020 • Debmalya Mandal, Samuel Deng, Suman Jana, Jeannette M. Wing, Daniel Hsu

In this work, we develop classifiers that are fair not only with respect to the training distribution, but also for a class of distributions that are weighted perturbations of the training samples.

no code implementations • NeurIPS 2019 • Debmalya Mandal, Ariel D. Procaccia, Nisarg Shah, David Woodruff

We take an unorthodox view of voting by expanding the design space to include both the elicitation rule, whereby voters map their (cardinal) preferences to votes, and the aggregation rule, which transforms the reported votes into collective decisions.

1 code implementation • 12 Feb 2019 • Debmalya Mandal, David Parkes

We model the potential outcomes as a three-dimensional tensor of low rank, where the three dimensions correspond to the agents, time periods and the set of possible histories.

no code implementations • 6 Jul 2017 • Yang Liu, Goran Radanovic, Christos Dimitrakakis, Debmalya Mandal, David C. Parkes

In addition, we define the {\em fairness regret}, which corresponds to the degree to which an algorithm is not calibrated, where perfect calibration requires that the probability of selecting an arm is equal to the probability with which the arm has the best quality realization.

Cannot find the paper you are looking for? You can
Submit a new open access paper.

Contact us on:
hello@paperswithcode.com
.
Papers With Code is a free resource with all data licensed under CC-BY-SA.