no code implementations • 15 May 2023 • Dirk van der Hoeven, Lukas Zierahn, Tal Lancewicki, Aviv Rosenberg, Nicolò Cesa-Bianchi
We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback.
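To make the setting concrete, here is a minimal sketch of FTRL with a negative-entropy regularizer (i.e., exponential weights) under delayed bandit feedback. This is not the paper's algorithm or analysis; the learning rate, exploration mixing, fixed delay, and arm loss means are all illustrative assumptions. Loss estimates for a round are only folded into the cumulative estimates once their delay has elapsed.

```python
import numpy as np

# Illustrative sketch only: eta, gamma, delay, and true_means are assumptions,
# not values or choices from the paper.
rng = np.random.default_rng(0)
K, T, delay = 3, 2000, 5            # arms, rounds, fixed feedback delay
eta, gamma = 0.01, 0.1              # learning rate, uniform exploration mix
true_means = np.array([0.5, 0.3, 0.7])  # hypothetical Bernoulli loss means

cum_est = np.zeros(K)               # cumulative importance-weighted loss estimates
pending = []                        # (arrival_round, arm, loss, play_prob)

for t in range(T):
    # Fold in feedback whose delay has elapsed; keep the rest buffered.
    arrived = [f for f in pending if f[0] <= t]
    pending = [f for f in pending if f[0] > t]
    for _, a, loss, p in arrived:
        cum_est[a] += loss / p      # importance-weighted loss estimator

    # FTRL with negative-entropy regularizer reduces to exponential weights,
    # mixed with uniform exploration to keep importance weights bounded.
    w = np.exp(-eta * (cum_est - cum_est.min()))
    probs = (1 - gamma) * w / w.sum() + gamma / K
    arm = rng.choice(K, p=probs)
    loss = float(rng.random() < true_means[arm])
    pending.append((t + delay, arm, loss, probs[arm]))
```

With delayed feedback the learner acts on stale estimates for `delay` rounds, which is exactly the regime whose regret such an analysis controls.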
no code implementations • 13 May 2023 • Tal Lancewicki, Aviv Rosenberg, Dmitry Sotnikov
Policy Optimization (PO) is one of the most popular methods in Reinforcement Learning (RL).
no code implementations • 28 Jul 2022 • Liad Erez, Tal Lancewicki, Uri Sherman, Tomer Koren, Yishay Mansour
Our key observation is that online learning via policy optimization in Markov games essentially reduces to a form of weighted regret minimization, with unknown weights determined by the path length of the agents' policy sequence.
no code implementations • 31 Jan 2022 • Tiancheng Jin, Tal Lancewicki, Haipeng Luo, Yishay Mansour, Aviv Rosenberg
The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately.
no code implementations • 31 Jan 2022 • Tal Lancewicki, Aviv Rosenberg, Yishay Mansour
We study cooperative online learning in stochastic and adversarial Markov decision processes (MDPs).
no code implementations • 4 Jun 2021 • Tal Lancewicki, Shahar Segal, Tomer Koren, Yishay Mansour
We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm.
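For intuition, here is a minimal sketch of a UCB-style learner in a stochastic MAB where each reward arrives after a random delay. This is a generic illustration, not the paper's algorithm; the delay distribution, horizon, and reward means are assumptions. The key point is that confidence bounds are computed only from feedback that has actually arrived.

```python
import numpy as np

# Illustrative sketch only: means, delay range, and horizon are assumptions.
rng = np.random.default_rng(1)
K, T = 3, 3000
means = np.array([0.2, 0.7, 0.45])      # hypothetical Bernoulli reward means
delays = rng.integers(0, 50, size=T)    # random per-round feedback delays

sums = np.zeros(K)                      # reward sums from *received* feedback
counts = np.zeros(K)                    # observation counts from received feedback
pending = []                            # (arrival_round, arm, reward)

for t in range(T):
    # Absorb feedback whose delay has elapsed.
    still_pending = []
    for arrival, a, r in pending:
        if arrival <= t:
            sums[a] += r
            counts[a] += 1
        else:
            still_pending.append((arrival, a, r))
    pending = still_pending

    if counts.min() == 0:               # no feedback yet for some arm
        arm = int(np.argmin(counts))
    else:                               # UCB index on received observations only
        ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
        arm = int(np.argmax(ucb))
    reward = float(rng.random() < means[arm])
    pending.append((t + 1 + int(delays[t]), arm, reward))
```

Because the counts lag the true number of plays by the in-flight feedback, the confidence bounds are looser than in the undelayed setting; quantifying that gap is what a delay-aware regret analysis does.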
no code implementations • 29 Dec 2020 • Tal Lancewicki, Aviv Rosenberg, Yishay Mansour
We present novel algorithms based on policy optimization that achieve near-optimal high-probability regret of $\widetilde O ( \sqrt{K} + \sqrt{D} )$ under full-information feedback, where $K$ is the number of episodes and $D = \sum_{k} d^k$ is the total delay.