1 code implementation • 22 Jun 2022 • Pan Xu, Hongkai Zheng, Eric Mazumdar, Kamyar Azizzadenesheli, Anima Anandkumar
Existing Thompson sampling-based algorithms need to construct a Laplace approximation (i.e., a Gaussian distribution) of the posterior distribution, which is inefficient to sample from in high-dimensional applications when the covariance matrix is general.
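To illustrate the bottleneck the abstract refers to: drawing one exact sample from a Gaussian with a general (dense) covariance requires a Cholesky factorization, whose cost grows cubically in the dimension. A minimal sketch, with an illustrative function and setup that are not taken from the paper:

```python
import numpy as np

def sample_laplace_posterior(mean, cov, rng):
    """Draw one exact sample from N(mean, cov).

    For a general (dense) covariance this requires a Cholesky
    factorization, which costs O(d^3) time and O(d^2) memory --
    the scaling that becomes prohibitive in high dimensions.
    """
    L = np.linalg.cholesky(cov)          # O(d^3) factorization
    z = rng.standard_normal(mean.shape)  # O(d) standard normal draw
    return mean + L @ z                  # O(d^2) correlation step

rng = np.random.default_rng(0)
d = 500
A = rng.standard_normal((d, d))
cov = A @ A.T + d * np.eye(d)  # a dense, well-conditioned covariance
theta = sample_laplace_posterior(np.zeros(d), cov, rng)
print(theta.shape)  # (500,)
```

Diagonal or structured covariances avoid the cubic factorization, which is one reason general covariance matrices are singled out as the inefficient case.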
no code implementations • 16 Apr 2018 • Eric Mazumdar, Lillian J. Ratliff, S. Shankar Sastry
We formulate a general framework for competitive gradient-based learning that encompasses a wide breadth of multi-agent learning algorithms, and analyze the limiting behavior of competitive gradient-based learning algorithms using dynamical systems theory.
no code implementations • 29 Mar 2017 • Lillian J. Ratliff, Eric Mazumdar
We address the problem of inverse reinforcement learning in Markov decision processes where the agent is risk-sensitive.
no code implementations • 30 May 2019 • Benjamin Chasnov, Lillian J. Ratliff, Eric Mazumdar, Samuel A. Burden
Considering a class of gradient-based multi-agent learning algorithms in non-cooperative settings, we provide local convergence guarantees to a neighborhood of a stable local Nash equilibrium.
no code implementations • 8 Jul 2019 • Eric Mazumdar, Lillian J. Ratliff, Michael I. Jordan, S. Shankar Sastry
In such games the state and action spaces are continuous and global Nash equilibria can be found by solving coupled Riccati equations.
no code implementations • 29 Oct 2019 • Tyler Westenbroek, David Fridovich-Keil, Eric Mazumdar, Shreyas Arora, Valmik Prabhu, S. Shankar Sastry, Claire J. Tomlin
We present a novel approach to control design for nonlinear systems which leverages model-free policy optimization techniques to learn a linearizing controller for a physical plant with unknown dynamics.
no code implementations • ICML 2020 • Eric Mazumdar, Aldo Pacchiano, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan
The resulting approximate Thompson sampling algorithm has logarithmic regret and its computational complexity does not scale with the time horizon of the algorithm.
no code implementations • 6 Apr 2020 • Tyler Westenbroek, Eric Mazumdar, David Fridovich-Keil, Valmik Prabhu, Claire J. Tomlin, S. Shankar Sastry
This paper proposes a framework for adaptively learning a feedback linearization-based tracking controller for an unknown system using discrete-time model-free policy-gradient parameter update rules.
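For context, a feedback-linearizing tracking controller cancels the system's nonlinearity and imposes linear error dynamics; the paper's contribution is learning that cancellation when the dynamics are unknown. A minimal known-model sketch on a pendulum, where all constants and gains are illustrative and not from the paper:

```python
import numpy as np

# Pendulum model: theta_ddot = -(g/l) sin(theta) + u / (m l^2)
g, l, m = 9.81, 1.0, 1.0

def fb_linearizing_control(theta, theta_dot, theta_ref, k1=16.0, k2=8.0):
    """Cancel the known nonlinearity and impose the linear error
    dynamics e_ddot + k2 e_dot + k1 e = 0. Here the model is assumed
    known, purely to show what a linearizing controller computes.
    """
    e = theta - theta_ref
    v = -k1 * e - k2 * theta_dot                      # virtual linear input
    return m * l**2 * (v + (g / l) * np.sin(theta))   # invert the dynamics

# Simulate tracking a fixed setpoint with Euler integration.
dt, T = 0.001, 5.0
theta, theta_dot, theta_ref = 1.0, 0.0, 0.3
for _ in range(int(T / dt)):
    u = fb_linearizing_control(theta, theta_dot, theta_ref)
    theta_ddot = -(g / l) * np.sin(theta) + u / (m * l**2)
    theta += dt * theta_dot
    theta_dot += dt * theta_ddot
print(round(theta, 3))  # settles at the reference 0.3
```

When the dynamics are unknown, the cancellation term cannot be written down in closed form, which is where the model-free policy-gradient updates in the abstract come in.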
no code implementations • 26 Oct 2020 • Vicenc Rubies-Royo, Eric Mazumdar, Roy Dong, Claire Tomlin, S. Shankar Sastry
In this work we present a multi-armed bandit framework for online expert selection in Markov decision processes and demonstrate its use in high-dimensional settings.
no code implementations • 18 Jul 2017 • Eric Mazumdar, Roy Dong, Vicenç Rúbies Royo, Claire Tomlin, S. Shankar Sastry
We formulate a multi-armed bandit (MAB) approach to choosing expert policies online in Markov decision processes (MDPs).
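A minimal sketch of treating expert policies as bandit arms, using standard UCB1; the interfaces and the toy environment below are illustrative assumptions, not the paper's algorithm:

```python
import math
import random

def ucb_expert_selection(experts, env_step, horizon):
    """UCB1 over a finite set of expert policies.

    `experts` is a list of policies and `env_step(expert, state)`
    returns (reward, next_state). Names and interfaces here are
    hypothetical, chosen only to make the sketch self-contained.
    """
    n = len(experts)
    counts = [0] * n
    means = [0.0] * n
    state = None
    for t in range(1, horizon + 1):
        if t <= n:
            k = t - 1  # play each expert once to initialize
        else:
            # pick the expert with the highest upper confidence bound
            k = max(range(n),
                    key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward, state = env_step(experts[k], state)
        counts[k] += 1
        means[k] += (reward - means[k]) / counts[k]  # running mean update
    return means, counts

# Toy usage: three "experts" whose episodes yield Bernoulli rewards.
random.seed(0)
probs = [0.2, 0.8, 0.5]
step = lambda expert, s: (float(random.random() < probs[expert]), s)
means, counts = ucb_expert_selection([0, 1, 2], step, 2000)
print(counts)  # the best expert accumulates most of the pulls
```

In the MDP setting of the paper, the per-round reward would come from running the chosen expert in the environment rather than from an i.i.d. Bernoulli draw.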
no code implementations • 27 Apr 2021 • Yaodong Yu, Tianyi Lin, Eric Mazumdar, Michael I. Jordan
Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for building reliable machine learning systems for real-world applications -- reflecting the need for classifiers and predictive models that are robust to the distribution shifts that arise from phenomena such as selection bias or nonstationarity.
no code implementations • 16 Jun 2021 • Chinmay Maheshwari, Chih-Yuan Chiu, Eric Mazumdar, S. Shankar Sastry, Lillian J. Ratliff
Min-max optimization is emerging as a key framework for analyzing problems of robustness to strategically and adversarially generated data.
no code implementations • NeurIPS 2021 • Tijana Zrnic, Eric Mazumdar, S. Shankar Sastry, Michael I. Jordan
In particular, by generalizing the standard model to allow both players to learn over time, we show that a decision-maker that makes updates faster than the agents can reverse the order of play, meaning that the agents lead and the decision-maker follows.
no code implementations • NeurIPS 2021 • Tanner Fiez, Lillian Ratliff, Eric Mazumdar, Evan Faulkner, Adhyyan Narang
For the class of nonconvex-PL zero-sum games, we exploit timescale separation to construct a potential function that, when combined with the stability characterization and an asymptotic saddle-avoidance result, yields a global asymptotic, almost-sure convergence guarantee to the set of strict local minmax equilibria.
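Timescale separation in its simplest form is gradient descent-ascent in which the max-player takes much larger steps, so it effectively tracks a best response to the slow player. A sketch on a toy strongly-convex/strongly-concave quadratic, which is deliberately simpler than the nonconvex-PL setting the paper analyzes:

```python
def two_timescale_gda(grad_x, grad_y, x0, y0, eta_x, eta_y, steps):
    """Deterministic gradient descent-ascent with timescale separation:
    the max-player (y) uses a much larger step size than the
    min-player (x), so y approximately tracks its best response.
    """
    x, y = float(x0), float(y0)
    for _ in range(steps):
        x -= eta_x * grad_x(x, y)  # slow descent step for the min-player
        y += eta_y * grad_y(x, y)  # fast ascent step for the max-player
    return x, y

# Toy objective f(x, y) = x^2/2 + x*y - y^2/2, equilibrium at (0, 0).
gx = lambda x, y: x + y   # df/dx
gy = lambda x, y: x - y   # df/dy
x, y = two_timescale_gda(gx, gy, x0=2.0, y0=-1.0,
                         eta_x=0.01, eta_y=0.1, steps=5000)
print(x, y)  # both iterates approach the equilibrium at the origin
```

The ratio `eta_y / eta_x` is the separation; sending it to infinity recovers the idealized setting in which the fast player plays an exact best response at every step.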
no code implementations • 6 Jun 2022 • Chinmay Maheshwari, Eric Mazumdar, Shankar Sastry
We study the problem of online learning in competitive settings in the context of two-sided matching markets.
no code implementations • 2 Aug 2022 • Tijana Zrnic, Eric Mazumdar
The proposed estimator queries the simplex only.
no code implementations • 2 Feb 2023 • Chinmay Maheshwari, James Cheng, S. Shankar Sastry, Lillian Ratliff, Eric Mazumdar
In this paper, we present an efficient algorithm for solving online Stackelberg games with multiple followers in a follower-agnostic manner.
no code implementations • 8 Feb 2023 • Moritz Hardt, Eric Mazumdar, Celestine Mendler-Dünner, Tijana Zrnic
We initiate a principled study of algorithmic collective action on digital platforms that deploy machine learning algorithms.
no code implementations • 8 Dec 2023 • Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman
Specifically, through a change of variable, we show that the update equation of the slow-timescale iterates resembles the classical smoothed best-response dynamics, where the regularized Nash gap serves as a valid Lyapunov function.
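The classical smoothed best-response dynamics the abstract refers to can be sketched on a zero-sum matrix game: each player moves toward the entropy-regularized (softmax) best response to the opponent's current strategy. The game, temperature, and step size below are illustrative choices, not the paper's setting:

```python
import numpy as np

def softmax(v, tau):
    """Entropy-regularized best response: softmax of payoffs at temperature tau."""
    z = v / tau
    z -= z.max()  # numerical stabilization
    e = np.exp(z)
    return e / e.sum()

def smoothed_br_dynamics(A, tau=0.1, lr=0.05, steps=4000):
    """Smoothed best-response dynamics for the zero-sum matrix game
    max_x min_y x^T A y: both players step toward their softmax
    best response to the opponent's current mixed strategy.
    """
    m, n = A.shape
    x = np.ones(m) / m
    y = np.ones(n) / n
    for _ in range(steps):
        br_x = softmax(A @ y, tau)        # row player's smoothed BR
        br_y = softmax(-(A.T @ x), tau)   # column player's smoothed BR
        x += lr * (br_x - x)              # updates stay on the simplex
        y += lr * (br_y - y)
    return x, y

A = np.array([[0.0, 1.0], [1.0, 0.0]])  # matching-pennies-like payoffs
x, y = smoothed_br_dynamics(A)
print(np.round(x, 2), np.round(y, 2))  # both converge near the uniform equilibrium
```

In this symmetric game the entropy-regularized equilibrium is uniform, and the gap between the current strategies and their smoothed best responses shrinks along the trajectory, which is the Lyapunov-style behavior the abstract alludes to.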
no code implementations • 12 Feb 2024 • Tinashe Handina, Eric Mazumdar
We find that strategic interactions can break the conventional view of scaling laws: performance does not necessarily improve monotonically as models become larger and/or more expressive (even with infinite data).