Search Results for author: Jalaj Bhandari

Found 5 papers, 1 paper with code

Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning

no code implementations • 23 May 2023 • Ruiyang Xu, Jalaj Bhandari, Dmytro Korenkevych, Fan Liu, Yuchen He, Alex Nikulkov, Zheqing Zhu

Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics, neglecting the downstream effects of recommendations on user behavior.

Recommendation Systems, reinforcement-learning

On Linear Convergence of Policy Gradient Methods for Finite MDPs

no code implementations • 21 Jul 2020 • Jalaj Bhandari, Daniel Russo

We revisit the finite-time analysis of policy gradient methods in one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations.

Policy Gradient Methods

Global Optimality Guarantees For Policy Gradient Methods

no code implementations • 5 Jun 2019 • Jalaj Bhandari, Daniel Russo

Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies.

Policy Gradient Methods

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

no code implementations • 6 Jun 2018 • Jalaj Bhandari, Daniel Russo, Raghav Singal

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process.

Q-Learning
