Best-of-Both-Worlds Algorithms for Partial Monitoring

29 Jul 2022  ·  , , ·

This paper considers the partial monitoring problem with $k$-actions and $d$-outcomes and provides the first best-of-both-worlds algorithms, whose regrets are bounded poly-logarithmically in the stochastic regime and near-optimally in the adversarial regime. To be more specific, we show that for non-degenerate locally observable games, the regret in the stochastic regime is bounded by $O(k^3 m^2 \log(T) \log(k_{\Pi} T) / \Delta_{\mathrm{\min}})$ and in the adversarial regime by $O(k^{2/3} m \sqrt{T \log(T) \log k_{\Pi}})$, where $T$ is the number of rounds, $m$ is the maximum number of distinct observations per action, $\Delta_{\min}$ is the minimum optimality gap, and $k_{\Pi}$ is the number of Pareto optimal actions. Moreover, we show that for non-degenerate globally observable games, the regret in the stochastic regime is bounded by $O(\max\{c_{\mathcal{G}}^2 / k,\, c_{\mathcal{G}}\} \log(T) \log(k_{\Pi} T) / \Delta_{\min}^2)$ and in the adversarial regime by $O((\max\{c_{\mathcal{G}}^2 / k,\, c_{\mathcal{G}}\} \log(T) \log(k_{\Pi} T)))^{1/3} T^{2/3})$, where $c_{\mathcal{G}}$ is a game-dependent constant. Our algorithms are based on the follow-the-regularized-leader framework that takes into account the nature of the partial monitoring problem, inspired by algorithms in the field of online learning with feedback graphs.

PDF Abstract

No code implementations yet. Submit your code now

Datasets

Add Datasets introduced or used in this paper

Results from the Paper Add Remove

Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.