no code implementations • 26 Oct 2023 • Henry H. H. Chen, Jiaming Lu
The prevailing principle of "Optimism in the Face of Uncertainty" advocates for the incorporation of an exploration bonus, generally assumed to be proportional to the inverse square root of the visit count ($1/\sqrt{n}$), where $n$ is the number of visits to a particular state-action pair.