Search Results for author: Jalaj Bhandari

Found 5 papers, 1 paper with code

Pearl: A Production-ready Reinforcement Learning Agent

1 code implementation • 6 Dec 2023 • Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu

Reinforcement Learning (RL) offers a versatile framework for achieving long-term goals.

reinforcement-learning Reinforcement Learning (RL)

2,366

Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning

no code implementations • 23 May 2023 • Ruiyang Xu, Jalaj Bhandari, Dmytro Korenkevych, Fan Liu, Yuchen He, Alex Nikulkov, Zheqing Zhu

Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics, neglecting the downstream effects of recommendations on user behavior.

Recommendation Systems reinforcement-learning

On Linear Convergence of Policy Gradient Methods for Finite MDPs

no code implementations • 21 Jul 2020 • Jalaj Bhandari, Daniel Russo

We revisit the finite time analysis of policy gradient methods in one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations.

Policy Gradient Methods

Global Optimality Guarantees For Policy Gradient Methods

no code implementations • 5 Jun 2019 • Jalaj Bhandari, Daniel Russo

Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies.

Policy Gradient Methods

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

no code implementations • 6 Jun 2018 • Jalaj Bhandari, Daniel Russo, Raghav Singal

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process.

Q-Learning
