no code implementations • 22 Jan 2024 • Matan Schliserman, Uri Sherman, Tomer Koren
Our bound translates to a lower bound of $\Omega (\sqrt{d})$ on the number of training examples required for standard GD to reach a non-trivial test error, answering an open question raised by Feldman (2016) and Amir, Koren, and Livni (2021b) and showing that a non-trivial dimension dependence is unavoidable.
no code implementations • 28 Aug 2023 • Uri Sherman, Alon Cohen, Tomer Koren, Yishay Mansour
We study regret minimization in online episodic linear Markov Decision Processes, and obtain rate-optimal $\widetilde O (\sqrt K)$ regret, where $K$ denotes the number of episodes.
no code implementations • 30 Jan 2023 • Uri Sherman, Tomer Koren, Yishay Mansour
We study reinforcement learning with linear function approximation and adversarially changing cost functions, a setup that has mostly been considered under simplifying assumptions such as full information feedback or exploratory conditions. We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback, featuring a combination of mirror descent and least-squares policy evaluation in an auxiliary MDP used to compute exploration bonuses. Our algorithm obtains an $\widetilde O(K^{6/7})$ regret bound, improving significantly over the previous state-of-the-art of $\widetilde O (K^{14/15})$ in this setting.
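The mirror-descent component of such a policy optimization scheme can be illustrated with a minimal sketch. This is a schematic tabular example, not the authors' algorithm: `q_hat` and `bonus` are hypothetical stand-ins for the least-squares value estimates and exploration bonuses described above, which the actual method computes in an auxiliary MDP.

```python
import numpy as np

def mirror_descent_policy_step(policy, q_hat, bonus, eta=0.1):
    """One entropy-regularized mirror descent (multiplicative weights) update
    of a tabular policy: shift probability mass toward actions with low
    estimated cost, optimistically reduced by an exploration bonus."""
    logits = np.log(policy) - eta * (q_hat - bonus)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum(axis=1, keepdims=True)

# Toy example: 3 states, 2 actions, uniform initial policy.
rng = np.random.default_rng(0)
policy = np.full((3, 2), 0.5)
q_hat = rng.uniform(size=(3, 2))   # estimated cost-to-go (stand-in)
bonus = 0.05 * np.ones((3, 2))     # exploration bonus (stand-in)
policy = mirror_descent_policy_step(policy, q_hat, bonus)
```

The exponential-weights form keeps each state's action distribution valid after every update, which is the standard reason mirror descent over the simplex is used in policy optimization.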
no code implementations • 28 Jul 2022 • Liad Erez, Tal Lancewicki, Uri Sherman, Tomer Koren, Yishay Mansour
Our key observation is that online learning via policy optimization in Markov games essentially reduces to a form of weighted regret minimization, with unknown weights determined by the path length of the agents' policy sequence.
no code implementations • 27 Feb 2022 • Tomer Koren, Roi Livni, Yishay Mansour, Uri Sherman
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves its generalization performance by obtaining a good fit to the training data.
no code implementations • NeurIPS 2021 • Uri Sherman, Tomer Koren, Yishay Mansour
We study online convex optimization in the random order model, recently proposed by Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented to the online algorithm in a uniformly random order.
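A minimal simulation of the random order model, under simple illustrative assumptions (one-dimensional linear losses over an interval, projected online gradient descent as the learner; this is a sketch of the setting, not the algorithm analyzed in the paper):

```python
import numpy as np

def ogd_random_order(grads, lr, radius=1.0, seed=0):
    """Run projected online gradient descent on a uniformly shuffled
    sequence of linear losses f_t(x) = g_t * x over [-radius, radius],
    and return regret against the best fixed decision in hindsight."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(grads))  # adversary picks the losses,
                                         # nature picks a random order
    x, total = 0.0, 0.0
    for t in order:
        total += grads[t] * x                          # incur loss
        x = float(np.clip(x - lr * grads[t], -radius, radius))  # OGD step
    # Best fixed point for a linear objective lies at an endpoint.
    best = min(np.sum(grads) * u for u in (-radius, radius))
    return total - best

T = 1000
grads = np.where(np.arange(T) % 2 == 0, 1.0, -1.0)  # adversarial-looking losses
regret = ogd_random_order(grads, lr=1.0 / np.sqrt(T))
```

The point of the model is that the same multiset of losses can be much harder in an adversarial order than in a random one; here the shuffle is what the learner gets to exploit.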
no code implementations • 7 Feb 2021 • Uri Sherman, Tomer Koren
We study a variant of online convex optimization where the player is permitted to switch decisions at most $S$ times in expectation throughout $T$ rounds.
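One standard way to respect such a switching budget, shown here purely as an illustration of the constraint and not as the paper's method, is blocked online gradient descent: hold each decision fixed for roughly $T/S$ consecutive rounds, so at most $S$ switches occur.

```python
import numpy as np

def blocked_ogd(grads, num_switches, lr, radius=1.0):
    """Projected OGD that only updates its decision at block boundaries,
    guaranteeing at most `num_switches` decision changes over the horizon."""
    T = len(grads)
    block = max(1, T // num_switches)  # rounds per block
    x, plays, switches = 0.0, [], 0
    pending = 0.0  # gradient accumulated within the current block
    for t in range(T):
        plays.append(x)
        pending += grads[t]
        if (t + 1) % block == 0:  # block boundary: allowed to switch
            new_x = float(np.clip(x - lr * pending, -radius, radius))
            if new_x != x:
                switches += 1
            x, pending = new_x, 0.0
    return plays, switches

grads = np.sin(np.arange(200))  # toy sequence of loss gradients
plays, switches = blocked_ogd(grads, num_switches=10, lr=0.05)
```

Playing each decision for a full block trades a longer effective step against fewer switches, which is the basic tension the switching-constrained setting formalizes.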