Search Results for author: Shivaram Kalyanakrishnan

Found 5 papers, 1 paper with code

An Analysis of Frame-skipping in Reinforcement Learning

no code implementations • 7 Feb 2021 • Shivaram Kalyanakrishnan, Siddharth Aravindan, Vishwajeet Bagdawat, Varun Bhatt, Harshith Goka, Archit Gupta, Kalpesh Krishna, Vihari Piratla

In this paper, we investigate the role of the parameter $d$ in RL; $d$ is called the "frame-skip" parameter, since states in the Atari domain are images.

Decision Making · Frame +1
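The frame-skip parameter $d$ studied in this paper can be illustrated with a minimal environment wrapper that repeats each chosen action for $d$ consecutive frames. This is a toy sketch, not the paper's experimental setup; `ToyEnv` and its `step` interface are assumptions for illustration.

```python
class ToyEnv:
    """Minimal environment: reward 1 per frame, episode ends after 10 frames."""
    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 10  # (state, reward, done)


class FrameSkip:
    """Repeat each chosen action for d consecutive frames, summing rewards.

    The agent makes one decision per d frames, so a larger d means fewer,
    coarser decisions over the same number of underlying frames.
    """
    def __init__(self, env, d):
        self.env, self.d = env, d

    def step(self, action):
        total = 0.0
        for _ in range(self.d):
            state, reward, done = self.env.step(action)
            total += reward
            if done:
                break
        return state, total, done


wrapped = FrameSkip(ToyEnv(), d=4)
state, reward, done = wrapped.step(0)
# one decision covers 4 underlying frames: state == 4, reward == 4.0
```

With $d = 1$ the agent decides every frame; the wrapper above shows why $d$ trades decision granularity for a shorter effective horizon.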

Lower Bounds for Policy Iteration on Multi-action MDPs

no code implementations • 16 Sep 2020 • Kumar Ashutosh, Sarthak Consul, Bhishma Dedhia, Parthasarathi Khirwadkar, Sahil Shah, Shivaram Kalyanakrishnan

An important theoretical question is how many iterations a specified PI variant will take to terminate as a function of the number of states $n$ and the number of actions $k$ in the input MDP.
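The iteration count the abstract asks about can be made concrete with a short sketch of Howard's Policy Iteration on a toy MDP that returns the number of iterations to termination. The array layout and the discount factor $\gamma$ are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def howards_pi(P, R, gamma=0.9):
    """Howard's Policy Iteration on an MDP with n states and k actions.

    P has shape (k, n, n): P[a] is the transition matrix under action a.
    R has shape (k, n):    R[a, s] is the reward for taking a in state s.
    Returns the final policy and the number of iterations to terminate.
    """
    k, n = R.shape
    policy = np.zeros(n, dtype=int)
    iterations = 0
    while True:
        iterations += 1
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, np.arange(n)]   # (n, n) rows under the current policy
        R_pi = R[policy, np.arange(n)]   # (n,)
        V = np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)
        # Greedy improvement: switch every state to its best action.
        Q = R + gamma * (P @ V)          # (k, n)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, iterations
        policy = new_policy
```

Running this on small random MDPs with varying $n$ and $k$ gives an empirical feel for the iteration counts whose worst-case growth the paper lower-bounds.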

PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits

no code implementations • 24 Jan 2019 • Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan

The problem of identifying $k > 1$ distinct arms from the best $\rho$ fraction is not always well-defined; for a special class of this problem, we present lower and upper bounds.

Multi-Armed Bandits

Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory

no code implementations • 24 Jan 2019 • Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan

We present a conceptually simple and efficient algorithm that needs to remember statistics of at most $M$ arms, and for any $K$-armed finite bandit instance it enjoys a $O(KM + K^{1.5}\sqrt{T\log(T/MK)}/M)$ upper bound on regret.

Multi-Armed Bandits
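The bounded arm-memory constraint can be illustrated with a toy streaming routine that scans the $K$ arms in batches, keeping empirical statistics for at most $M$ arms at any time. This only illustrates the memory constraint, not the paper's actual algorithm or its regret guarantee; Bernoulli rewards and all function names here are assumptions.

```python
import random

def bounded_memory_best_arm(arm_means, M, pulls_per_arm=200, seed=0):
    """Toy sketch: find a good arm while storing stats for at most M arms.

    Arms arrive in batches of M; each is sampled pulls_per_arm times
    (Bernoulli rewards with the given means), and only the best empirical
    arm seen so far survives between batches.
    """
    rng = random.Random(seed)
    best_arm, best_mean = None, -1.0
    K = len(arm_means)
    for start in range(0, K, M):
        stats = {}  # at most M entries live at any time
        for arm in range(start, min(start + M, K)):
            rewards = [rng.random() < arm_means[arm] for _ in range(pulls_per_arm)]
            stats[arm] = sum(rewards) / pulls_per_arm
        # Retain only the single best empirical arm across batches.
        for arm, mean in stats.items():
            if mean > best_mean:
                best_arm, best_mean = arm, mean
    return best_arm
```

The point of the sketch is the memory pattern: regardless of $K$, the dictionary of live statistics never exceeds $M$ entries, which is the regime the paper's regret bound addresses.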
