no code implementations • 23 Nov 2023 • Gilad Yehudai, Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain
We introduce a novel, theoretically grounded dynamic learning-rate scheduling scheme aimed at simplifying the manual, time-consuming tuning of schedules in practice.
no code implementations • 28 Aug 2023 • Uri Sherman, Alon Cohen, Tomer Koren, Yishay Mansour
We study regret minimization in online episodic linear Markov Decision Processes, and obtain rate-optimal $\widetilde O (\sqrt K)$ regret where $K$ denotes the number of episodes.
no code implementations • 24 Aug 2023 • Hadar Schreiber Galler, Tom Zahavy, Guillaume Desjardins, Alon Cohen
This problem is formulated as mutual training of skills using an intrinsic reward and a discriminator trained to predict a skill given its trajectory.
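The intrinsic-reward idea above can be illustrated with a minimal sketch: a discriminator scores how identifiable the active skill is from the current trajectory, and the agent is rewarded by the discriminator's log-probability of that skill. The function name and the softmax-over-logits setup here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def intrinsic_reward(disc_logits, skill):
    """Intrinsic reward = discriminator's log-probability of the active
    skill, given its logits for the current trajectory (a common form of
    discriminator-based skill discovery; a sketch, not the paper's code)."""
    # numerically stable log-softmax over skills
    z = disc_logits - disc_logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return log_probs[skill]
```

A trajectory that makes its skill easy to recognize yields a log-probability near zero (high reward), while an ambiguous one yields a large negative reward, pushing skills to become mutually distinguishable.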
no code implementations • 2 Mar 2023 • Orin Levy, Alon Cohen, Asaf Cassel, Yishay Mansour
To the best of our knowledge, our algorithm is the first efficient rate-optimal regret minimization algorithm for adversarial CMDPs that operates under the minimal standard assumption of online function approximation.
no code implementations • 27 Nov 2022 • Orin Levy, Asaf Cassel, Alon Cohen, Yishay Mansour
To the best of our knowledge, our algorithm is the first efficient and rate-optimal regret minimization algorithm for CMDPs that operates under the general offline function approximation setting.
no code implementations • 3 Jun 2022 • Asaf Cassel, Alon Cohen, Tomer Koren
We consider the problem of controlling an unknown linear dynamical system under adversarially changing convex costs and full feedback of both the state and cost function.
no code implementations • 2 Mar 2022 • Asaf Cassel, Alon Cohen, Tomer Koren
We consider the problem of controlling an unknown linear dynamical system under a stochastic convex cost and full feedback of both the state and cost function.
no code implementations • NeurIPS 2021 • Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain
In the general non-convex smooth optimization setting, we give a simple and efficient algorithm that requires $O( \sigma^2/\epsilon^4 + \tau/\epsilon^2 )$ steps for finding an $\epsilon$-stationary point $x$, where $\tau$ is the \emph{average} delay $\smash{\frac{1}{T}\sum_{t=1}^T d_t}$ and $\sigma^2$ is the variance of the stochastic gradients.
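The delay model above can be sketched as SGD in which the gradient applied at step $t$ was computed at an iterate $d_t$ steps stale. The following toy loop (my own illustration, not the paper's algorithm) makes the role of the per-step delays $d_t$ concrete.

```python
import numpy as np
from collections import deque

def delayed_sgd(grad, x0, lr, delays, steps):
    """SGD with stale gradients: the update at step t uses the gradient
    evaluated at the iterate from delays[t] steps ago, modeling the
    asynchronous-delay setting (a sketch under simplifying assumptions)."""
    x = np.array(x0, dtype=float)
    history = deque([x.copy()])           # past iterates, oldest first
    for t in range(steps):
        d = min(delays[t], len(history) - 1)
        stale = history[-1 - d]           # iterate from d steps back
        x = x - lr * grad(stale)
        history.append(x.copy())
    return x
```

For a well-conditioned objective and a small enough step size, the iterates still converge despite moderate delays, which is the regime the average-delay bound speaks to.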
no code implementations • NeurIPS 2021 • Alon Cohen, Yonathan Efroni, Yishay Mansour, Aviv Rosenberg
In this work we show that the minimax regret for this setting is $\widetilde O(\sqrt{ (B_\star^2 + B_\star) |S| |A| K})$ where $B_\star$ is a bound on the expected cost of the optimal policy from any state, $S$ is the state space, and $A$ is the action space.
no code implementations • 31 Jan 2021 • Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
no code implementations • ICML 2020 • Alon Cohen, Haim Kaplan, Yishay Mansour, Aviv Rosenberg
In this work we remove this dependence on the minimum cost---we give an algorithm that guarantees a regret bound of $\widetilde{O}(B_\star |S| \sqrt{|A| K})$, where $B_\star$ is an upper bound on the expected cost of the optimal policy, $S$ is the set of states, $A$ is the set of actions and $K$ is the number of episodes.
no code implementations • ICML 2020 • Asaf Cassel, Alon Cohen, Tomer Koren
We consider the problem of learning in Linear Quadratic Control systems whose transition parameters are initially unknown.
no code implementations • 5 Nov 2019 • Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour
Specifically, we show that a variation of the FW method that is based on taking "away steps" achieves a linear rate of convergence when applied to apprenticeship learning (AL), and that a stochastic version of the FW algorithm can be used to avoid precise estimation of feature expectations.
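The away-step mechanism can be sketched on a toy problem: minimizing $\|x - b\|^2$ over the probability simplex. Besides the usual toward-vertex step, the method may move *away* from the worst active vertex, which is what restores linear convergence for strongly convex objectives. This is a generic away-step FW sketch with exact line search, not the paper's algorithm; the problem instance is an assumption for illustration.

```python
import numpy as np

def fw_away(b, n, iters=500, tol=1e-12):
    """Frank-Wolfe with away steps for min_x ||x - b||^2 over the simplex.
    Vertices are the standard basis vectors, so the active-vertex weights
    are the coordinates of x itself."""
    x = np.full(n, 1.0 / n)                    # start at the barycenter
    for _ in range(iters):
        g = 2.0 * (x - b)                      # gradient
        s = int(np.argmin(g))                  # toward vertex e_s
        active = np.where(x > tol)[0]
        a = int(active[np.argmax(g[active])])  # away vertex e_a
        if g @ x - g[s] >= g[a] - g @ x:       # toward step wins
            d = -x.copy(); d[s] += 1.0
            gmax = 1.0
        else:                                  # away step
            d = x.copy(); d[a] -= 1.0
            gmax = x[a] / (1.0 - x[a]) if x[a] < 1.0 else 1e12
        denom = d @ d
        if denom < tol:
            break
        # exact line search for the quadratic, clipped to stay feasible
        gamma = min(max(-(g @ d) / (2.0 * denom), 0.0), gmax)
        x = x + gamma * d
    return x
```

When the away step is taken at its maximal step size, the away vertex's weight drops to exactly zero, which is how the method sheds bad vertices instead of zigzagging toward them.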
no code implementations • 23 May 2019 • Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour
We derive and analyze learning algorithms for apprenticeship learning, policy evaluation, and policy gradient for average reward criteria.
no code implementations • 17 Feb 2019 • Alon Cohen, Tomer Koren, Yishay Mansour
We present the first computationally-efficient algorithm with $\widetilde O(\sqrt{T})$ regret for learning in Linear Quadratic Control systems with unknown dynamics.
no code implementations • NeurIPS 2019 • Alon Cohen, Avinatan Hassidim, Haim Kaplan, Yishay Mansour, Shay Moran
(ii) In the second variant it is assumed that before the process starts, the algorithm has access to a training set of $n$ items drawn independently from the same unknown distribution (e.g., data of candidates from previous recruitment seasons).
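A classic baseline for this sample-based variant, useful for intuition, sets a threshold from the offline training set and accepts the first online item that beats it. This is a textbook rule for illustration only, not the paper's policy.

```python
def sample_threshold_secretary(train, stream):
    """Accept the first online item whose value exceeds the maximum of the
    offline training sample; if none does, take the last item.
    Returns (index, value) of the accepted item."""
    thresh = max(train)                    # threshold learned offline
    for i, v in enumerate(stream):
        if v > thresh:
            return i, v
    return len(stream) - 1, stream[-1]     # forced to take the last item
```

With training and online items drawn i.i.d. from the same distribution, the training maximum is a competitive benchmark, so stopping at the first value that clears it succeeds with constant probability.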
no code implementations • 16 Nov 2018 • Alon Cohen, Moran Koren, Argyrios Deligkas
Furthermore, we show that when there are only two possible outcomes or the agent is risk-neutral, the algorithm's outcome approximates the optimal contract described in the classical theory.
no code implementations • ICML 2018 • Alon Cohen, Avinatan Hassidim, Tomer Koren, Nevena Lazic, Yishay Mansour, Kunal Talwar
We study the problem of controlling linear time-invariant systems with known noisy dynamics and adversarially chosen quadratic losses.
no code implementations • 7 May 2018 • Craig Boutilier, Alon Cohen, Amit Daniely, Avinatan Hassidim, Yishay Mansour, Ofer Meshi, Martin Mladenov, Dale Schuurmans
From an RL perspective, we show that Q-learning with sampled action sets is sound.
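The soundness claim can be made concrete with a toy tabular sketch: at each step only a random subset of $k$ actions is available, and both the acted-on maximum and the bootstrap target maximize over the sampled set rather than all actions. The environment interface and hyperparameters here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def q_learning_sampled(env_step, n_states, n_actions, k, episodes,
                       alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning with sampled action sets: each decision and each
    bootstrap target maximizes only over a freshly sampled subset of k
    actions (a sketch of the idea, not the paper's algorithm).
    env_step(s, a) -> (next_state, reward, done)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(episodes):
        avail = rng.choice(n_actions, size=k, replace=False)   # sampled set
        a = int(avail[np.argmax(Q[s, avail])])                 # greedy in set
        s2, r, done = env_step(s, a)
        avail2 = rng.choice(n_actions, size=k, replace=False)
        target = r + (0.0 if done else gamma * Q[s2, avail2].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = 0 if done else s2
    return Q
```

Because the sampled sets vary across steps, every action is tried infinitely often in the limit, which is the kind of condition under which such updates remain sound.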
no code implementations • 25 Feb 2017 • Alon Cohen, Shie Mannor
We study the problem of prediction with expert advice when the number of experts in question may be extremely large or even infinite.
no code implementations • 24 Feb 2017 • Alon Cohen, Tamir Hazan, Tomer Koren
We revisit the study of optimal regret rates in bandit combinatorial optimization---a fundamental framework for sequential decision making under uncertainty that abstracts numerous combinatorial prediction problems.
no code implementations • 23 May 2016 • Alon Cohen, Tamir Hazan, Tomer Koren
We study an online learning framework introduced by Mannor and Shamir (2011) in which the feedback is specified by a graph, in a setting where the graph may vary from round to round and is \emph{never fully revealed} to the learner.
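The graph-feedback mechanism can be sketched with an Exp3-style learner: playing arm $i$ reveals the loss of every out-neighbor $j$, and each observed loss is importance-weighted by its probability of being observed. This is a sketch of the general Exp3-SET technique for this feedback model, not the paper's exact algorithm, and the per-round graph encoding is an assumption.

```python
import numpy as np

def exp3_set(losses, graphs, eta, seed=0):
    """Exponential-weights learner under graph feedback.
    losses: (T, n) array of per-round arm losses.
    graphs: (T, n, n) arrays with graphs[t][i, j] = 1 if playing arm i
    at round t reveals the loss of arm j. Returns total incurred loss."""
    T, n = losses.shape
    w = np.zeros(n)                        # log-weights
    rng = np.random.default_rng(seed)
    total = 0.0
    for t in range(T):
        p = np.exp(w - w.max()); p /= p.sum()
        i = rng.choice(n, p=p)
        total += losses[t, i]
        G = graphs[t]
        observed = np.where(G[i] > 0)[0]   # arms whose losses we see
        q = p @ G                          # q[j] = Pr[loss of j observed]
        est = np.zeros(n)
        est[observed] = losses[t, observed] / q[observed]
        w -= eta * est                     # unbiased loss estimates
    return total
```

With the complete graph this reduces to full-information exponential weights, and with the edgeless-plus-self-loops graph it reduces to ordinary Exp3, which is why the graph's structure governs the attainable regret.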