no code implementations • 19 Jun 2024 • Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson, Noah Golowich, Robert Kleinberg, Princewill Okoroafor
We then give an improved upper bound for the SPR game, which implies, via our equivalence, a forecasting algorithm with calibration error $O(T^{2/3 - \varepsilon})$ for some $\varepsilon > 0$, improving Foster & Vohra's upper bound for the first time.
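For readers unfamiliar with the quantity being bounded, the following is a minimal sketch of the standard $\ell_1$ calibration error of a binary forecaster; the exact error measure analyzed in the paper may differ in its details, and the function and variable names are illustrative only.

```python
# A minimal sketch of the standard L1 calibration error of a binary
# forecaster; the precise variant used in the paper may differ.
from collections import defaultdict

def l1_calibration_error(predictions, outcomes):
    """predictions: forecast probabilities in [0, 1]; outcomes: 0/1 labels."""
    counts = defaultdict(int)   # number of rounds each forecast value was used
    ones = defaultdict(int)     # number of those rounds with outcome 1
    for p, y in zip(predictions, outcomes):
        counts[p] += 1
        ones[p] += y
    # Sum, over forecast values, of the number of rounds using that value
    # times the gap between the forecast and the empirical frequency.
    return sum(abs(ones[p] - p * counts[p]) for p in counts)

# Example: always predicting 0.5 on a fair coin is perfectly calibrated
# even though it conveys no information.
print(l1_calibration_error([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.0
```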
no code implementations • 17 Jun 2024 • Noah Golowich, Ankur Moitra
One of the most natural approaches to reinforcement learning (RL) with function approximation is value iteration, which inductively generates approximations to the optimal value function by solving a sequence of regression problems.
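As a rough illustration of this approach, the sketch below runs fitted value iteration with a linear function class on synthetic data, performing one least-squares regression per backup; the data-generating process, dimensions, and horizon are assumptions made purely for illustration and do not reflect the paper's setting or guarantees.

```python
# A minimal sketch of value iteration with linear function approximation:
# each Bellman backup is a least-squares regression of sampled targets onto
# state-action features. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
H, n, d, n_actions = 5, 200, 8, 3            # horizon, samples, feature dim, actions
phi = rng.normal(size=(H, n, n_actions, d))  # features phi(x, a) at each step
rewards = rng.uniform(size=(H, n, n_actions))
next_idx = rng.integers(0, n, size=(H, n, n_actions))  # sampled next states

theta = [np.zeros(d) for _ in range(H + 1)]  # value-function weights per step
for h in reversed(range(H)):
    # Bellman targets: reward plus the estimated value of the sampled next
    # state, where the next-state value is a max over actions of the linear Q.
    next_q = phi[min(h + 1, H - 1)] @ theta[h + 1]          # (n, n_actions)
    targets = rewards[h] + next_q[next_idx[h]].max(axis=-1)
    # Regress the targets onto the features of all (state, action) samples.
    X = phi[h].reshape(-1, d)
    y = targets.reshape(-1)
    theta[h], *_ = np.linalg.lstsq(X, y, rcond=None)
```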
no code implementations • 17 Jun 2024 • Constantinos Daskalakis, Noah Golowich
In this paper, we investigate the question of whether the ability to perform ERM, which computes a hypothesis minimizing empirical risk on a given dataset, is necessary for efficient learning: in particular, is there a weaker oracle than ERM which can nevertheless enable learnability?
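For concreteness, here is a minimal sketch of what an ERM oracle computes; the finite threshold class and the 0-1 loss are illustrative assumptions, not the paper's setting.

```python
# A minimal sketch of an ERM oracle: given a dataset, return a hypothesis
# from the class minimizing empirical (0-1) risk.
def erm_oracle(hypotheses, data):
    """hypotheses: iterable of callables x -> label; data: list of (x, y)."""
    def empirical_risk(h):
        return sum(h(x) != y for x, y in data) / len(data)
    return min(hypotheses, key=empirical_risk)

# Example: threshold classifiers on the real line.
thresholds = [lambda x, t=t: int(x >= t) for t in (0.0, 0.5, 1.0)]
data = [(0.2, 0), (0.6, 1), (0.9, 1)]
best = erm_oracle(thresholds, data)   # picks the threshold at 0.5
```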
no code implementations • 17 Jun 2024 • Noah Golowich, Ankur Moitra
Our main structural assumption is that the MDP has low inherent Bellman error, which stipulates that linear value functions have linear Bellman backups with respect to the greedy policy.
no code implementations • 12 Jun 2024 • Fan Chen, Constantinos Daskalakis, Noah Golowich, Alexander Rakhlin
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
no code implementations • 4 Jun 2024 • Noah Golowich, Ankur Moitra
We aim for watermarks which satisfy: (a) undetectability, a cryptographic notion introduced by Christ, Gunn & Zamir (2024) which stipulates that it is computationally hard to distinguish watermarked language model outputs from the model's actual output distribution; and (b) robustness to channels which introduce a constant fraction of adversarial insertions, substitutions, and deletions to the watermarked text.
no code implementations • 3 Jun 2024 • Noah Golowich, Elad Hazan, Zhou Lu, Dhruv Rohatgi, Y. Jennifer Sun
The study of population dynamics originated with early sociological works but has since extended into many fields, including biology, epidemiology, evolutionary game theory, and economics.
no code implementations • 4 Apr 2024 • Noah Golowich, Ankur Moitra, Dhruv Rohatgi
We also show that there is no computationally efficient algorithm for reward-directed RL in block MDPs, even when given access to an oracle for this regression problem.
no code implementations • 30 Oct 2023 • Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson, Noah Golowich
We provide a novel reduction from swap-regret minimization to external-regret minimization, which improves upon the classical reductions of Blum-Mansour [BM07] and Stolz-Lugosi [SL05] in that it does not require finiteness of the space of actions.
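For context, the sketch below is a rough rendering of the classical Blum-Mansour style reduction that this work improves upon: one external-regret learner (Hedge here) is maintained per action, their recommendations are combined through a stationary distribution, and each learner is charged a share of the loss. The parameters and random losses are illustrative assumptions.

```python
# A sketch of the classical swap-to-external-regret reduction: one Hedge
# learner per action, combined via the stationary distribution of the
# row-stochastic matrix of their recommendations.
import numpy as np

def hedge_weights(cum_loss, eta):
    w = np.exp(-eta * (cum_loss - cum_loss.min()))
    return w / w.sum()

def stationary_distribution(Q):
    # Left fixed point p = p Q of the row-stochastic matrix Q.
    vals, vecs = np.linalg.eig(Q.T)
    p = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return p / p.sum()

rng = np.random.default_rng(0)
n_actions, T, eta = 4, 100, 0.3
cum_loss = np.zeros((n_actions, n_actions))  # one Hedge learner per action

for t in range(T):
    Q = np.array([hedge_weights(cum_loss[i], eta) for i in range(n_actions)])
    p = stationary_distribution(Q)            # distribution actually played
    loss = rng.uniform(size=n_actions)        # stand-in for the adversary's loss
    # Learner i is responsible for the rounds "routed through" action i,
    # so it sees the loss vector scaled by p[i].
    cum_loss += np.outer(p, loss)
```

In the classical analysis the swap regret of the combined player is bounded by the sum of the external regrets of the per-action learners; maintaining one learner per action is precisely what ties those reductions to finite action spaces.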
no code implementations • 21 Sep 2023 • Constantinos Daskalakis, Noah Golowich, Nika Haghtalab, Abhishek Shetty
We show that both weak and strong $\sigma$-smooth Nash equilibria have superior computational properties to Nash equilibria: when $\sigma$ as well as an approximation parameter $\epsilon$ and the number of players are all constants, there is a constant-time randomized algorithm to find a weak $\epsilon$-approximate $\sigma$-smooth Nash equilibrium in normal-form games.
no code implementations • 18 Sep 2023 • Noah Golowich, Ankur Moitra, Dhruv Rohatgi
The key assumption underlying linear Markov Decision Processes (MDPs) is that the learner has access to a known feature map $\phi(x, a)$ that maps state-action pairs to $d$-dimensional vectors, and that the rewards and transitions are linear functions in this representation.
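As a sanity check on what this assumption means, the sketch below constructs the simplest example of a linear MDP: a tabular MDP with one-hot features, for which rewards and transitions are trivially linear in $\phi(x, a)$. The construction is purely illustrative; the regime of interest is a feature dimension much smaller than the number of states.

```python
# A minimal sketch of the linear MDP assumption: rewards and transitions are
# linear in a known feature map phi(x, a). With one-hot features, any tabular
# MDP satisfies the assumption with d = |X| * |A|.
import numpy as np

n_states, n_actions = 3, 2
d = n_states * n_actions

def phi(x, a):
    v = np.zeros(d)
    v[x * n_actions + a] = 1.0    # one-hot feature for the pair (x, a)
    return v

rng = np.random.default_rng(0)
R = rng.uniform(size=(n_states, n_actions))                        # rewards r(x, a)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # transitions

theta = R.reshape(d)            # r(x, a) = <phi(x, a), theta>
mu = P.reshape(d, n_states)     # P(x' | x, a) = <phi(x, a), mu[:, x']>

assert np.isclose(phi(1, 0) @ theta, R[1, 0])
assert np.allclose(phi(1, 0) @ mu, P[1, 0])
```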
no code implementations • 1 May 2023 • Dylan J. Foster, Dean P. Foster, Noah Golowich, Alexander Rakhlin
Compared to the best results for the single-agent setting, our bounds have additional gaps.
no code implementations • 22 Mar 2023 • Dylan J. Foster, Noah Golowich, Sham M. Kakade
They are proven via lower bounds for a simpler problem we refer to as SparseCCE, in which the goal is to compute a coarse correlated equilibrium that is sparse in the sense that it can be represented as a mixture of a small number of product policies.
no code implementations • 19 Jan 2023 • Dylan J. Foster, Noah Golowich, Yanjun Han
Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient (DEC), a measure of statistical complexity which leads to upper and lower bounds on the optimal sample complexity for a general class of problems encompassing bandits and reinforcement learning with function approximation.
no code implementations • 18 Oct 2022 • Constantinos Daskalakis, Noah Golowich, Stratis Skoulakis, Manolis Zampetakis
In particular, our method is not designed to decrease some potential function, such as the distance of its iterate from the set of local min-max equilibria or the projected gradient of the objective, but is designed to satisfy a topological property that guarantees the avoidance of cycles and implies its convergence.
no code implementations • 7 Jun 2022 • Noah Golowich, Ankur Moitra, Dhruv Rohatgi
Much of reinforcement learning theory is built on top of oracles that are computationally hard to implement.
no code implementations • 8 Apr 2022 • Constantinos Daskalakis, Noah Golowich, Kaiqing Zhang
Previous works for learning Markov CCE policies all required exponential time and sample complexity in the number of players.
no code implementations • 9 Feb 2022 • Adam Block, Yuval Dagan, Noah Golowich, Alexander Rakhlin
We then prove a lower bound on the oracle complexity of any proper learning algorithm, which matches the oracle-efficient upper bounds up to a polynomial factor, thus demonstrating the existence of a statistical-computational gap in smooth online learning.
no code implementations • 12 Jan 2022 • Noah Golowich, Ankur Moitra, Dhruv Rohatgi
Our main result is a quasipolynomial-time algorithm for planning in (one-step) observable POMDPs.
no code implementations • 24 Nov 2021 • Noah Golowich
Inspired by recent results for the related setting of binary classification (Alon et al., 2019; Bun et al., 2020), where it was shown that online learnability of a binary class is necessary and sufficient for its private learnability, Jung et al. (2020) showed that in the setting of regression, online learnability of $\mathcal{H}$ is necessary for private learnability.
no code implementations • 17 Nov 2021 • Constantinos Daskalakis, Noah Golowich
Our contributions are two-fold. In the realizable setting of nonparametric online regression with the absolute loss, we propose a randomized proper learning algorithm which achieves near-optimal cumulative loss in terms of the sequential fat-shattering dimension of the hypothesis class.
no code implementations • 11 Nov 2021 • Ioannis Anagnostides, Constantinos Daskalakis, Gabriele Farina, Maxwell Fishelson, Noah Golowich, Tuomas Sandholm
Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS'21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game.
no code implementations • 25 Oct 2021 • Noah Golowich, Ankur Moitra
In this paper we address the question of whether worst-case lower bounds for regret in online learning of Markov decision processes (MDPs) can be circumvented when information about the MDP, in the form of predictions about its optimal $Q$-value function, is given to the algorithm.
no code implementations • NeurIPS 2021 • Constantinos Daskalakis, Maxwell Fishelson, Noah Golowich
We show that Optimistic Hedge -- a common variant of multiplicative-weights-updates with recency bias -- attains ${\rm poly}(\log T)$ regret in multi-player general-sum games.
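For reference, a minimal sketch of the Optimistic Hedge update follows: ordinary multiplicative weights, except that the most recent loss vector is counted twice, encoding the prediction that the next loss will resemble the last one. The random losses and parameters here are placeholders for the game-play setting studied in the paper.

```python
# A minimal sketch of Optimistic Hedge: multiplicative weights with the most
# recent loss vector counted twice (recency bias).
import numpy as np

rng = np.random.default_rng(0)
n_actions, T, eta = 5, 1000, 0.1
cum_loss = np.zeros(n_actions)
last_loss = np.zeros(n_actions)

for t in range(T):
    # Optimistic weights: exponential in (cumulative loss + most recent loss).
    scores = cum_loss + last_loss
    w = np.exp(-eta * (scores - scores.min()))
    play = w / w.sum()                    # distribution played this round
    loss = rng.uniform(size=n_actions)    # stand-in for the game's loss vector
    cum_loss += loss
    last_loss = loss
```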
no code implementations • NeurIPS 2021 • Noah Golowich, Roi Livni
Specifically, we show that if the class $\mathcal{H}$ has constant Littlestone dimension then, given an oblivious sequence of labelled examples, there is a private learner that makes in expectation at most $O(\log T)$ mistakes -- comparable to the optimal mistake bound in the non-private case, up to a logarithmic factor.
no code implementations • NeurIPS 2021 • Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi, Chiyuan Zhang
The Randomized Response (RR) algorithm is a classical technique to improve robustness in survey aggregation, and has been widely adopted in applications with differential privacy guarantees.
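As background, here is a minimal sketch of k-ary Randomized Response as usually stated; the labels and parameters are illustrative, and the paper builds on and refines this basic mechanism rather than using it verbatim.

```python
# A minimal sketch of k-ary Randomized Response: report the true label with a
# probability set by epsilon, otherwise report a uniformly random other label.
# Each individual response is epsilon-differentially private.
import math, random

def randomized_response(true_label, k, epsilon):
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_truth:
        return true_label
    other = [j for j in range(k) if j != true_label]
    return random.choice(other)

# Example: privatize labels from a 10-class problem.
noisy = [randomized_response(y, k=10, epsilon=1.0) for y in [3, 7, 7, 1]]
```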
no code implementations • NeurIPS 2020 • Constantinos Daskalakis, Dylan J. Foster, Noah Golowich
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games).
no code implementations • 7 Dec 2020 • Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi
In this paper we prove that the sample complexity of properly learning a class of Littlestone dimension $d$ with approximate differential privacy is $\tilde O(d^6)$, ignoring privacy and accuracy parameters.
no code implementations • NeurIPS 2020 • Noah Golowich, Sarath Pattathil, Constantinos Daskalakis
We also show that the $O(1/\sqrt{T})$ rate is tight for all $p$-SCLI algorithms, which includes OG as a special case.
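For readers unfamiliar with OG, the sketch below applies the optimistic gradient update to the simple bilinear saddle-point problem $f(x, y) = \langle x, y \rangle$; the problem instance and step size are illustrative assumptions, not the paper's setting.

```python
# A minimal sketch of the optimistic gradient (OG) update on f(x, y) = <x, y>:
# the correction term "- (g_t - g_{t-1})" is what distinguishes OG from plain
# gradient descent-ascent and yields last-iterate convergence here.
import numpy as np

rng = np.random.default_rng(0)
d, eta, T = 3, 0.1, 2000
x, y = rng.normal(size=d), rng.normal(size=d)
prev_gx, prev_gy = np.zeros(d), np.zeros(d)

for _ in range(T):
    gx, gy = y, x                         # gradients of f(x, y) = <x, y>
    x = x - eta * (2 * gx - prev_gx)      # descent in x, with the optimistic correction
    y = y + eta * (2 * gy - prev_gy)      # ascent in y, with the optimistic correction
    prev_gx, prev_gy = gx, gy

print(np.linalg.norm(x), np.linalg.norm(y))  # both shrink toward the equilibrium at 0
```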
no code implementations • 7 Jul 2020 • Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi
We study closure properties for the Littlestone and threshold dimensions of binary hypothesis classes.
no code implementations • 31 Jan 2020 • Noah Golowich, Sarath Pattathil, Constantinos Daskalakis, Asuman Ozdaglar
In this paper we study the smooth convex-concave saddle point problem.
no code implementations • 29 Aug 2019 • Badih Ghazi, Noah Golowich, Ravi Kumar, Rasmus Pagh, Ameya Velingker
We give protocols in the multi-message shuffled model with $\mathrm{poly}(\log B, \log n)$ bits of communication per user and $\mathrm{polylog}(B)$ error, which provide an exponential improvement on the error compared to what is possible with single-message algorithms.
no code implementations • ICLR 2019 • Sanjeev Arora, Nadav Cohen, Noah Golowich, Wei Hu
We analyze speed of convergence to global optimum for gradient descent training a deep linear neural network (parameterized as $x \mapsto W_N W_{N-1} \cdots W_1 x$) by minimizing the $\ell_2$ loss over whitened data.
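To make the setup concrete, the following is a minimal sketch of gradient descent on the $\ell_2$ loss of a deep linear network over whitened synthetic data; the initialization, step size, and dimensions are ad hoc choices for illustration and are not the conditions required by the paper's analysis.

```python
# A minimal sketch of gradient descent on a deep linear network
# x -> W_N ... W_1 x with l2 loss over whitened synthetic data.
import numpy as np

rng = np.random.default_rng(0)
d, depth, n, lr, steps = 4, 3, 200, 0.01, 3000

def chain(mats):
    # Ordered product of the matrices in `mats`; identity for an empty list.
    out = np.eye(d)
    for M in mats:
        out = out @ M
    return out

X = rng.normal(size=(d, n))
X = np.linalg.cholesky(np.linalg.inv(X @ X.T / n)).T @ X   # whiten so X X^T / n = I
W_star = rng.normal(size=(d, d)) / np.sqrt(d)
Y = W_star @ X                                             # realizable linear targets

Ws = [np.eye(d) + 0.1 * rng.normal(size=(d, d)) for _ in range(depth)]  # W_1 .. W_N

for _ in range(steps):
    E = chain(Ws[::-1])                 # end-to-end map W_N ... W_1
    G = (E @ X - Y) @ X.T / n           # gradient of the l2 loss w.r.t. E
    # Chain rule: grad of W_i is (W_N ... W_{i+1})^T G (W_{i-1} ... W_1)^T.
    grads = [chain(Ws[i + 1:][::-1]).T @ G @ chain(Ws[:i][::-1]).T
             for i in range(depth)]
    Ws = [W - lr * g for W, g in zip(Ws, grads)]

print(0.5 * np.linalg.norm(chain(Ws[::-1]) @ X - Y) ** 2 / n)  # final training loss
```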
no code implementations • 7 Jan 2018 • Chiyuan Zhang, Qianli Liao, Alexander Rakhlin, Brando Miranda, Noah Golowich, Tomaso Poggio
In Theory IIb we characterize with a mix of theory and experiments the optimization of deep convolutional networks by Stochastic Gradient Descent.
no code implementations • 18 Dec 2017 • Noah Golowich, Alexander Rakhlin, Ohad Shamir
We study the sample complexity of learning neural networks, by providing new bounds on their Rademacher complexity assuming norm constraints on the parameter matrix of each layer.