no code implementations • 19 Jun 2024 • Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson, Noah Golowich, Robert Kleinberg, Princewill Okoroafor

We then give an improved upper bound for the SPR game, which implies, via our equivalence, a forecasting algorithm with calibration error $O(T^{2/3 - \varepsilon})$ for some $\varepsilon > 0$, improving Foster & Vohra's upper bound for the first time.

no code implementations • 17 Jun 2024 • Noah Golowich, Ankur Moitra

One of the most natural approaches to reinforcement learning (RL) with function approximation is value iteration, which inductively generates approximations to the optimal value function by solving a sequence of regression problems.

no code implementations • 17 Jun 2024 • Constantinos Daskalakis, Noah Golowich

In this paper, we investigate the question of whether the ability to perform ERM, which computes a hypothesis minimizing empirical risk on a given dataset, is necessary for efficient learning: in particular, is there a weaker oracle than ERM which can nevertheless enable learnability?

no code implementations • 17 Jun 2024 • Noah Golowich, Ankur Moitra

Our main structural assumption is that the MDP has low inherent Bellman error, which stipulates that linear value functions have linear Bellman backups with respect to the greedy policy.

no code implementations • 12 Jun 2024 • Fan Chen, Constantinos Daskalakis, Noah Golowich, Alexander Rakhlin

We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).

no code implementations • 4 Jun 2024 • Noah Golowich, Ankur Moitra

We aim for watermarks which satisfy: (a) undetectability, a cryptographic notion introduced by Christ, Gunn & Zamir (2024) which stipulates that it is computationally hard to distinguish watermarked language model outputs from the model's actual output distribution; and (b) robustness to channels which introduce a constant fraction of adversarial insertions, substitutions, and deletions to the watermarked text.

no code implementations • 3 Jun 2024 • Noah Golowich, Elad Hazan, Zhou Lu, Dhruv Rohatgi, Y. Jennifer Sun

The study of population dynamics originated with early sociological works but has since extended into many fields, including biology, epidemiology, evolutionary game theory, and economics.

no code implementations • 4 Apr 2024 • Noah Golowich, Ankur Moitra, Dhruv Rohatgi

We also show that there is no computationally efficient algorithm for reward-directed RL in block MDPs, even when given access to an oracle for this regression problem.

no code implementations • 30 Oct 2023 • Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson, Noah Golowich

We provide a novel reduction from swap-regret minimization to external-regret minimization, which improves upon the classical reductions of Blum-Mansour [BM07] and Stoltz-Lugosi [SL05] in that it does not require finiteness of the space of actions.

no code implementations • 21 Sep 2023 • Constantinos Daskalakis, Noah Golowich, Nika Haghtalab, Abhishek Shetty

We show that both weak and strong $\sigma$-smooth Nash equilibria have superior computational properties to Nash equilibria: when $\sigma$ as well as an approximation parameter $\epsilon$ and the number of players are all constants, there is a constant-time randomized algorithm to find a weak $\epsilon$-approximate $\sigma$-smooth Nash equilibrium in normal-form games.

no code implementations • 18 Sep 2023 • Noah Golowich, Ankur Moitra, Dhruv Rohatgi

The key assumption underlying linear Markov Decision Processes (MDPs) is that the learner has access to a known feature map $\phi(x, a)$ that maps state-action pairs to $d$-dimensional vectors, and that the rewards and transitions are linear functions in this representation.

no code implementations • 1 May 2023 • Dylan J. Foster, Dean P. Foster, Noah Golowich, Alexander Rakhlin

Compared to the best results for the single-agent setting, our bounds have additional gaps.

no code implementations • 22 Mar 2023 • Dylan J. Foster, Noah Golowich, Sham M. Kakade

They are proven via lower bounds for a simpler problem we refer to as SparseCCE, in which the goal is to compute a coarse correlated equilibrium that is sparse in the sense that it can be represented as a mixture of a small number of product policies.

no code implementations • 19 Jan 2023 • Dylan J. Foster, Noah Golowich, Yanjun Han

Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient (DEC), a measure of statistical complexity which leads to upper and lower bounds on the optimal sample complexity for a general class of problems encompassing bandits and reinforcement learning with function approximation.

no code implementations • 18 Oct 2022 • Constantinos Daskalakis, Noah Golowich, Stratis Skoulakis, Manolis Zampetakis

In particular, our method is not designed to decrease some potential function, such as the distance of its iterate from the set of local min-max equilibria or the projected gradient of the objective, but is designed to satisfy a topological property that guarantees the avoidance of cycles and implies its convergence.

no code implementations • 7 Jun 2022 • Noah Golowich, Ankur Moitra, Dhruv Rohatgi

Much of reinforcement learning theory is built on top of oracles that are computationally hard to implement.

no code implementations • 8 Apr 2022 • Constantinos Daskalakis, Noah Golowich, Kaiqing Zhang

Previous work for learning Markov CCE policies all required exponential time and sample complexity in the number of players.

no code implementations • 9 Feb 2022 • Adam Block, Yuval Dagan, Noah Golowich, Alexander Rakhlin

We then prove a lower bound on the oracle complexity of any proper learning algorithm, which matches the oracle-efficient upper bounds up to a polynomial factor, thus demonstrating the existence of a statistical-computational gap in smooth online learning.

no code implementations • 12 Jan 2022 • Noah Golowich, Ankur Moitra, Dhruv Rohatgi

Our main result is a quasipolynomial-time algorithm for planning in (one-step) observable POMDPs.

no code implementations • 24 Nov 2021 • Noah Golowich

Inspired by recent results for the related setting of binary classification (Alon et al., 2019; Bun et al., 2020), where it was shown that online learnability of a binary class is necessary and sufficient for its private learnability, Jung et al. (2020) showed that in the setting of regression, online learnability of $\mathcal{H}$ is necessary for private learnability.

no code implementations • 17 Nov 2021 • Constantinos Daskalakis, Noah Golowich

Our contributions are two-fold:
- In the realizable setting of nonparametric online regression with the absolute loss, we propose a randomized proper learning algorithm which gets a near-optimal cumulative loss in terms of the sequential fat-shattering dimension of the hypothesis class.

no code implementations • 11 Nov 2021 • Ioannis Anagnostides, Constantinos Daskalakis, Gabriele Farina, Maxwell Fishelson, Noah Golowich, Tuomas Sandholm

Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS '21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\mathrm{polylog}(T))$ after $T$ repetitions of the game.

no code implementations • 25 Oct 2021 • Noah Golowich, Ankur Moitra

In this paper we address the question of whether worst-case lower bounds for regret in online learning of Markov decision processes (MDPs) can be circumvented when information about the MDP, in the form of predictions about its optimal $Q$-value function, is given to the algorithm.

no code implementations • NeurIPS 2021 • Constantinos Daskalakis, Maxwell Fishelson, Noah Golowich

We show that Optimistic Hedge -- a common variant of multiplicative-weights-updates with recency bias -- attains ${\rm poly}(\log T)$ regret in multi-player general-sum games.
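As a rough illustration of what "multiplicative weights with recency bias" can look like for a single player, the sketch below counts the most recent loss vector twice when forming the next mixed strategy. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `optimistic_hedge`, the step size `eta`, and the fixed loss sequence are all hypothetical choices made for the example.

```python
import math


def optimistic_hedge(losses, eta=0.1):
    """Return the mixed strategies played on a fixed sequence of loss vectors.

    losses: list of per-round loss vectors, one entry per action.
    eta: step size (an illustrative default, not a value from the paper).
    """
    n = len(losses[0])
    cumulative = [0.0] * n  # sum of all loss vectors observed so far
    last = [0.0] * n        # most recent loss vector (the recency-bias term)
    plays = []
    for loss in losses:
        # Score = cumulative loss plus the latest loss counted one extra time.
        scores = [-eta * (c + l) for c, l in zip(cumulative, last)]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        plays.append([w / total for w in weights])
        # The loss is revealed only after the strategy has been chosen.
        cumulative = [c + l for c, l in zip(cumulative, loss)]
        last = list(loss)
    return plays
```

Setting `last` to zero in every round recovers plain Hedge, so the only difference in this sketch is the extra weight on the latest observation.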

no code implementations • NeurIPS 2021 • Noah Golowich, Roi Livni

Specifically, we show that if the class $\mathcal{H}$ has constant Littlestone dimension then, given an oblivious sequence of labelled examples, there is a private learner that makes in expectation at most $O(\log T)$ mistakes -- comparable to the optimal mistake bound in the non-private case, up to a logarithmic factor.

no code implementations • NeurIPS 2021 • Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi, Chiyuan Zhang

The Randomized Response (RR) algorithm is a classical technique to improve robustness in survey aggregation, and has been widely adopted in applications with differential privacy guarantees.
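For readers unfamiliar with the technique, the classical k-ary form of Randomized Response can be sketched as follows: each respondent reports the true value with probability $e^{\varepsilon}/(e^{\varepsilon} + k - 1)$ and otherwise a uniformly random different value, and the aggregator debiases the observed counts. This is a textbook-style sketch, not the variant studied in the paper; the function names and parameters (`randomized_response`, `estimate_frequencies`, `k`, `epsilon`) are assumptions introduced for illustration.

```python
import math
import random


def randomized_response(value, k, epsilon, rng=random):
    """Report `value` in {0, ..., k-1} through k-ary randomized response."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_truth:
        return value
    # Otherwise pick uniformly among the k - 1 values different from `value`.
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def estimate_frequencies(reports, k, epsilon):
    """Debias the observed report frequencies into estimates of the true ones."""
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)  # P(report = truth)
    q = 1.0 / (math.exp(epsilon) + k - 1)                # P(report = a given other value)
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    # E[count_j / n] = q + (p - q) * f_j, so invert that affine map.
    return [(c / n - q) / (p - q) for c in counts]
```

The estimates always sum to one but individual entries can be negative for small samples; clipping or projection is a common post-processing step that is omitted here.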

no code implementations • NeurIPS 2020 • Constantinos Daskalakis, Dylan J. Foster, Noah Golowich

We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games).

no code implementations • 7 Dec 2020 • Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi

In this paper we prove that the sample complexity of properly learning a class of Littlestone dimension $d$ with approximate differential privacy is $\tilde O(d^6)$, ignoring privacy and accuracy parameters.

no code implementations • NeurIPS 2020 • Noah Golowich, Sarath Pattathil, Constantinos Daskalakis

We also show that the $O(1/\sqrt{T})$ rate is tight for all $p$-SCLI algorithms, which includes OG as a special case.

no code implementations • 7 Jul 2020 • Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi

We study closure properties for the Littlestone and threshold dimensions of binary hypothesis classes.

no code implementations • 31 Jan 2020 • Noah Golowich, Sarath Pattathil, Constantinos Daskalakis, Asuman Ozdaglar

In this paper we study the smooth convex-concave saddle point problem.

no code implementations • 29 Aug 2019 • Badih Ghazi, Noah Golowich, Ravi Kumar, Rasmus Pagh, Ameya Velingker

- Protocols in the multi-message shuffled model with $\mathrm{poly}(\log B, \log n)$ bits of communication per user and $\mathrm{polylog}(B)$ error, which provide an exponential improvement on the error compared to what is possible with single-message algorithms.

no code implementations • ICLR 2019 • Sanjeev Arora, Nadav Cohen, Noah Golowich, Wei Hu

We analyze speed of convergence to global optimum for gradient descent training a deep linear neural network (parameterized as $x \mapsto W_N W_{N-1} \cdots W_1 x$) by minimizing the $\ell_2$ loss over whitened data.

no code implementations • 7 Jan 2018 • Chiyuan Zhang, Qianli Liao, Alexander Rakhlin, Brando Miranda, Noah Golowich, Tomaso Poggio

In Theory IIb we characterize with a mix of theory and experiments the optimization of deep convolutional networks by Stochastic Gradient Descent.

no code implementations • 18 Dec 2017 • Noah Golowich, Alexander Rakhlin, Ohad Shamir

We study the sample complexity of learning neural networks, by providing new bounds on their Rademacher complexity assuming norm constraints on the parameter matrix of each layer.
