no code implementations • 27 Sep 2023 • Germano Gabbianelli, Gergely Neu, Matteo Papini
These improvements are made possible by the observation that the upper and lower tails of importance-weighted estimators behave very differently from each other, and their careful control can massively improve on previous results that were all based on symmetric two-sided concentration inequalities.
no code implementations • 22 May 2023 • Germano Gabbianelli, Gergely Neu, Nneka Okolo, Matteo Papini
Offline Reinforcement Learning (RL) aims to learn a near-optimal policy from a fixed dataset of transitions collected by another policy.
no code implementations • 18 Jul 2022 • Germano Gabbianelli, Matteo Papini, Gergely Neu
We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback.