no code implementations • NeurIPS 2020 • Nived Rajaraman, Lin F. Yang, Jiantao Jiao, Kannan Ramachandran
Here, we show that the policy which mimics the expert whenever possible is in expectation $\lesssim \frac{|\mathcal{S}| H^2 \log (N)}{N}$ suboptimal compared to the value of the expert, even when the expert follows an arbitrary stochastic policy.
no code implementations • 25 Jul 2015 • Vijay Kamble, Nihar Shah, David Marn, Abhay Parekh, Kannan Ramachandran
This paper proposes the Square Root Agreement Rule (SRA): a simple reward mechanism that incentivizes truthful responses to objective evaluations on such platforms.