Learning with Good Feature Representations in Bandits and in RL with a Generative Model

ICML 2020 Tor Lattimore, Csaba Szepesvari, Gellert Weisz

The construction by Du et al. (2019) implies that even if a learner is given linear features in $\mathbb R^d$ that approximate the rewards in a bandit with a uniform error of $\epsilon$, searching for an action that is optimal up to $O(\epsilon)$ requires examining essentially all actions.

Exploration-Enhanced POLITEX

27 Aug 2019 Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvari, Gellert Weisz

POLITEX has sublinear regret guarantees in uniformly-mixing MDPs when the value estimation error can be controlled, a condition that can be satisfied if all policies sufficiently explore the environment.

LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration

ICML 2018 Gellert Weisz, Andras Gyorgy, Csaba Szepesvari

We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution.

