10 May 2022 • Ativ Joshi, Abhishek Sinha
In learning theory, the performance of an online policy is commonly measured in terms of the static regret metric, which compares the cumulative loss of an online policy to that of an optimal benchmark in hindsight.
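To make the static regret definition concrete, here is a minimal Python sketch. It assumes a finite action set and a fully known loss table, and takes the benchmark to be the single best fixed action in hindsight, which is the usual reading of static regret. The function name `static_regret` and the toy data are hypothetical illustrations, not taken from the paper or its code.

```python
from typing import Sequence


def static_regret(losses: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """Static regret of a played action sequence (illustrative sketch).

    losses[t][a] is the loss of action a in round t (assumed finite action set).
    actions[t] is the action the online policy played in round t.
    """
    if len(losses) != len(actions):
        raise ValueError("losses and actions must cover the same number of rounds")
    if not losses:
        return 0.0

    n_actions = len(losses[0])

    # Cumulative loss actually incurred by the online policy.
    policy_loss = sum(losses[t][a] for t, a in enumerate(actions))

    # Cumulative loss of the best single action, chosen with full hindsight.
    best_fixed_loss = min(
        sum(round_losses[a] for round_losses in losses) for a in range(n_actions)
    )

    return policy_loss - best_fixed_loss


# Toy example (assumed data): two actions, three rounds.
toy_losses = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
toy_actions = [0, 1, 0]  # this policy picks the worse action every round
print(static_regret(toy_losses, toy_actions))
```

Because the benchmark is restricted to one fixed action, a policy that switches actions at the right moments can incur less loss than the benchmark, so the value returned can be negative.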