no code implementations • 28 Mar 2023 • Zaiwei Chen, Siva Theja Maguluri, Martin Zubeldia
To demonstrate the applicability of our theoretical results, we use them to provide maximal concentration bounds for a large class of reinforcement learning algorithms, including but not limited to on-policy TD-learning with linear function approximation, off-policy TD-learning with generalized importance sampling factors, and $Q$-learning.
no code implementations • 22 Feb 2018 • Lili Su, Martin Zubeldia, Nancy Lynch
We say an individual learns the best option if eventually (as $t \to \infty$) it pulls only the arm with the highest average reward.
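One way to write this definition symbolically is sketched below. The notation is an assumption for illustration and is not taken from the paper: $K$ arms with mean rewards $\mu_1, \dots, \mu_K$, a unique best arm $k^*$, and $a_t$ for the arm the individual pulls at time $t$.

$$
k^* = \arg\max_{1 \le k \le K} \mu_k, \qquad \text{the individual learns the best option} \iff \exists\, T < \infty \ \text{such that} \ a_t = k^* \ \text{for all} \ t \ge T.
$$

Under this reading, $T$ may be random and differ between individuals; the condition only requires that suboptimal arms are pulled finitely many times.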