Search Results for author: Mohammad Sadegh Talebi

Found 6 papers, 0 papers with code

Adversarial Bandits with Corruptions: Regret Lower Bound and No-regret Algorithm

no code implementations • NeurIPS 2020 • Lin Yang, Mohammad Hajiesmaili, Mohammad Sadegh Talebi, John C. S. Lui, Wing Shing Wong

We characterize the regret of ExpRb as a function of the corruption budget and show that for the case of a known corruption budget, the regret of ExpRb is tight.

Improved Exploration in Factored Average-Reward MDPs

no code implementations • 9 Sep 2020 • Mohammad Sadegh Talebi, Anders Jonsson, Odalric-Ambrym Maillard

We consider a regret minimization task under the average-reward criterion in an unknown Factored Markov Decision Process (FMDP).

Tightening Exploration in Upper Confidence Reinforcement Learning

no code implementations • ICML 2020 • Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi

In pursuit of practical efficiency, we present UCRL3, following the lines of UCRL2, but with two key modifications: First, it uses state-of-the-art time-uniform concentration inequalities to compute confidence sets on the reward and (component-wise) transition distributions for each state-action pair.
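To make the idea of per state-action confidence sets concrete, here is a minimal Python sketch of UCRL-style reward confidence intervals using a time-uniform (any-time valid) width from the Laplace/mixture method. This is an illustration of the general technique only; the exact concentration inequalities used by UCRL3 differ, and the class and function names are hypothetical.

```python
import numpy as np

def time_uniform_bonus(n, delta):
    """Illustrative time-uniform confidence width for a [0, 1]-bounded mean
    (Laplace/mixture-method style); NOT the exact inequality from UCRL3."""
    n = max(n, 1)
    return np.sqrt((1.0 + 1.0 / n) * np.log(np.sqrt(n + 1) / delta) / (2.0 * n))

class RewardConfidenceSets:
    """Per state-action confidence intervals on mean rewards,
    in the spirit of UCRL-style optimistic algorithms (hypothetical helper)."""

    def __init__(self, n_states, n_actions, delta=0.05):
        self.counts = np.zeros((n_states, n_actions))
        self.reward_sums = np.zeros((n_states, n_actions))
        self.delta = delta

    def update(self, s, a, r):
        # Record one observed reward for state-action pair (s, a).
        self.counts[s, a] += 1
        self.reward_sums[s, a] += r

    def reward_interval(self, s, a):
        # Empirical mean plus/minus a time-uniform bonus, clipped to [0, 1].
        n = self.counts[s, a]
        mean = self.reward_sums[s, a] / max(n, 1)
        w = time_uniform_bonus(n, self.delta)
        return max(0.0, mean - w), min(1.0, mean + w)
```

Because the width is time-uniform, the interval remains valid simultaneously over all sample counts, so it can be queried after every interaction step; the interval shrinks as the visit count for a state-action pair grows.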

Reinforcement Learning (RL)

Model-Based Reinforcement Learning Exploiting State-Action Equivalence

no code implementations • 9 Oct 2019 • Mahsa Asadi, Mohammad Sadegh Talebi, Hippolyte Bourel, Odalric-Ambrym Maillard

In the case of an unknown equivalence structure, we show through numerical experiments that C-UCRL combined with ApproxEquivalence outperforms UCRL2 in ergodic MDPs.

Model-based Reinforcement Learning • Reinforcement Learning +1

Learning Multiple Markov Chains via Adaptive Allocation

no code implementations • NeurIPS 2019 • Mohammad Sadegh Talebi, Odalric-Ambrym Maillard

We study the problem of learning the transition matrices of a set of Markov chains from a single stream of observations on each chain.
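As a baseline for this estimation problem, the following Python sketch computes the empirical (maximum-likelihood) transition matrix of one chain from a single stream of observations. It is a hypothetical illustration of the estimation target only, not the paper's adaptive allocation strategy for dividing samples among multiple chains.

```python
import numpy as np

def estimate_transition_matrix(stream, n_states):
    """Empirical transition matrix from a single observation stream
    (hypothetical helper; the paper's contribution is how to allocate
    observations adaptively across several chains, not this estimator)."""
    counts = np.zeros((n_states, n_states))
    # Count observed transitions s -> s' along the stream.
    for s, s_next in zip(stream[:-1], stream[1:]):
        counts[s, s_next] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of unvisited states fall back to uniform to avoid division by zero.
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1), 1.0 / n_states)
```

For example, the stream `[0, 1, 0, 1, 1]` over two states yields estimated rows `[0, 1]` for state 0 and `[0.5, 0.5]` for state 1; an adaptive scheme would then steer further observations toward the chains whose rows are least accurately estimated.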

Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs

no code implementations • 5 Mar 2018 • Mohammad Sadegh Talebi, Odalric-Ambrym Maillard

We consider the problem of reinforcement learning in an unknown, discrete Markov Decision Process (MDP) under the average-reward criterion, where the learner interacts with the system in a single stream of observations, starting from an initial state and without any reset.

LEMMA • Reinforcement Learning +1
