1 code implementation • 10 Sep 2021 • Shubham Anand Jain, Rohan Shah, Sanit Gupta, Denil Mehta, Inderjeet Jayakumar Nair, Jian Vora, Sushil Khyalia, Sourav Das, Vinay J. Ribeiro, Shivaram Kalyanakrishnan
This problem reduces to the estimation of a single parameter when $\mathcal{P}$ has a support set of size $K = 2$.
no code implementations • 7 Feb 2021 • Shivaram Kalyanakrishnan, Siddharth Aravindan, Vishwajeet Bagdawat, Varun Bhatt, Harshith Goka, Archit Gupta, Kalpesh Krishna, Vihari Piratla
In this paper, we investigate the role of the parameter $d$ in RL; $d$ is called the "frame-skip" parameter, since states in the Atari domain are images.
no code implementations • 16 Sep 2020 • Kumar Ashutosh, Sarthak Consul, Bhishma Dedhia, Parthasarathi Khirwadkar, Sahil Shah, Shivaram Kalyanakrishnan
An important theoretical question is how many iterations a specified PI variant will take to terminate as a function of the number of states $n$ and the number of actions $k$ in the input MDP.
no code implementations • 24 Jan 2019 • Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan
The problem of identifying $k > 1$ distinct arms from the best $\rho$ fraction is not always well-defined; for a special class of this problem, we present lower and upper bounds.
no code implementations • 24 Jan 2019 • Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan
We present a conceptually simple and efficient algorithm that needs to remember statistics of at most $M$ arms, and for any $K$-armed finite bandit instance it enjoys an $O(KM + K^{1.5}\sqrt{T\log (T/MK)}/M)$ upper bound on regret.