Search Results for author: Aviv Rosenberg

Found 15 papers, 1 paper with code

Policy Optimization for Stochastic Shortest Path

no code implementations 7 Feb 2022 Liyu Chen, Haipeng Luo, Aviv Rosenberg

Policy optimization is among the most popular and successful reinforcement learning algorithms, and there is increasing interest in understanding its theoretical guarantees.

Reinforcement Learning (RL)

Cooperative Online Learning in Stochastic and Adversarial MDPs

no code implementations 31 Jan 2022 Tal Lancewicki, Aviv Rosenberg, Yishay Mansour

We study cooperative online learning in stochastic and adversarial Markov decision processes (MDPs).

Reinforcement Learning (RL)

Planning and Learning with Adaptive Lookahead

no code implementations28 Jan 2022 Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal

Some of the most powerful reinforcement learning frameworks use planning for action selection.
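The planning-for-action-selection idea can be sketched minimally as fixed-depth lookahead search in a known model; the adaptive-lookahead question studied in the paper is how deep to search from each state. The toy chain MDP and all names below are invented for illustration and are not taken from the paper:

```python
# Generic illustration, not the paper's algorithm: fixed-depth lookahead
# planning for action selection in a toy MDP with known dynamics.

N_STATES = 5          # chain MDP with states 0..4; state 4 is rewarding
ACTIONS = (-1, +1)    # move left / move right
GAMMA = 0.9           # discount factor

def step(s, a):
    """Deterministic transition: reward 1 whenever the next state is 4."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def lookahead_value(s, depth, leaf_v):
    """Best discounted return over action sequences of length `depth`,
    bootstrapping from the estimate `leaf_v` at the search horizon."""
    if depth == 0:
        return leaf_v[s]
    return max(r + GAMMA * lookahead_value(s2, depth - 1, leaf_v)
               for s2, r in (step(s, a) for a in ACTIONS))

def select_action(s, depth, leaf_v):
    """Greedy action with respect to the depth-limited search."""
    def q(a):
        s2, r = step(s, a)
        return r + GAMMA * lookahead_value(s2, depth - 1, leaf_v)
    return max(ACTIONS, key=q)

# With a zero leaf estimate, depth 4 is just enough for the search from
# state 0 to see the reward at state 4.
print(select_action(0, 4, [0.0] * N_STATES))  # -> 1
```

With a shallower search (depth 3 from state 0), no rewarding transition is reachable and the zero leaf estimate makes all actions look equally good, which is exactly why choosing the depth per state matters.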

Minimax Regret for Stochastic Shortest Path

no code implementations NeurIPS 2021 Alon Cohen, Yonathan Efroni, Yishay Mansour, Aviv Rosenberg

In this work we show that the minimax regret for this setting is $\widetilde O(\sqrt{ (B_\star^2 + B_\star) |S| |A| K})$ where $B_\star$ is a bound on the expected cost of the optimal policy from any state, $S$ is the state space, and $A$ is the action space.

Learning Adversarial Markov Decision Processes with Delayed Feedback

no code implementations 29 Dec 2020 Tal Lancewicki, Aviv Rosenberg, Yishay Mansour

We present novel algorithms based on policy optimization that achieve near-optimal high-probability regret of $\widetilde O ( \sqrt{K} + \sqrt{D} )$ under full-information feedback, where $K$ is the number of episodes and $D = \sum_{k} d^k$ is the total delay.

Recommendation Systems

Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure

1 code implementation NeurIPS 2021 Aviv Rosenberg, Yishay Mansour

We study regret minimization in non-episodic factored Markov decision processes (FMDPs), where all existing algorithms make the strong assumption that the factored structure of the FMDP is known to the learner in advance.

Stochastic Shortest Path with Adversarially Changing Costs

no code implementations 20 Jun 2020 Aviv Rosenberg, Yishay Mansour

Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost.
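The SSP objective (reach the goal at minimum total expected cost) can be illustrated with a small planning-side sketch, assuming fully known dynamics. The instance, action names ("fast", "slow"), and costs below are invented for illustration and are not from the paper:

```python
# Illustrative sketch only: value iteration for a tiny stochastic shortest
# path instance with known dynamics. States 0..3; state 3 is the goal.
# "fast" costs 0.5 and advances with probability 0.9 (else stays put);
# "slow" costs 1.0 and advances with probability 1.

GOAL = 3
STATES = range(4)
COSTS = {"fast": 0.5, "slow": 1.0}

def transitions(s, a):
    """Return a list of (probability, next_state) pairs."""
    if s == GOAL:
        return [(1.0, GOAL)]
    if a == "fast":
        return [(0.9, s + 1), (0.1, s)]
    return [(1.0, s + 1)]  # "slow"

def ssp_value_iteration(iters=200):
    """Minimum expected total cost-to-goal from each state."""
    V = [0.0] * 4
    for _ in range(iters):
        newV = [0.0] * 4
        for s in STATES:
            if s == GOAL:
                continue  # the goal is absorbing and cost-free
            newV[s] = min(
                COSTS[a] + sum(p * v2 for p, s2 in transitions(s, a)
                               for v2 in [V[s2]])
                for a in ("fast", "slow")
            )
        V = newV
    return V

print(ssp_value_iteration())
```

At the fixed point the "fast" action is optimal everywhere: solving V(s) = 0.5 + 0.9 V(s+1) + 0.1 V(s) gives V(s) = V(s+1) + 5/9, cheaper than the "slow" alternative V(s+1) + 1.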

Near-optimal Regret Bounds for Stochastic Shortest Path

no code implementations ICML 2020 Alon Cohen, Haim Kaplan, Yishay Mansour, Aviv Rosenberg

In this work we remove this dependence on the minimum cost---we give an algorithm that guarantees a regret bound of $\widetilde{O}(B_\star |S| \sqrt{|A| K})$, where $B_\star$ is an upper bound on the expected cost of the optimal policy, $S$ is the set of states, $A$ is the set of actions and $K$ is the number of episodes.

Reinforcement Learning (RL)

Optimistic Policy Optimization with Bandit Feedback

no code implementations ICML 2020 Yonathan Efroni, Lior Shani, Aviv Rosenberg, Shie Mannor

To the best of our knowledge, the two results are the first sub-linear regret bounds obtained for policy optimization algorithms with unknown transitions and bandit feedback.

Reinforcement Learning (RL)

Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function

no code implementations NeurIPS 2019 Aviv Rosenberg, Yishay Mansour

We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes.

Online Convex Optimization in Adversarial Markov Decision Processes

no code implementations 19 May 2019 Aviv Rosenberg, Yishay Mansour

We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes, and the transition function is not known to the learner.
