What are the Statistical Limits of Batch RL with Linear Function Approximation?

ICLR 2021  ·  Ruosong Wang, Dean Foster, Sham M. Kakade

Function approximation methods coupled with batch reinforcement learning (or off-policy reinforcement learning) provide an increasingly important framework for alleviating the excessive sample complexity burden in modern reinforcement learning problems. However, the extent to which function approximation can be effective when coupled with off-policy data is not well understood; the literature to date largely consists of \emph{sufficient} conditions. This work focuses on the basic question: what are the \emph{necessary} representational and distributional conditions that permit provably sample-efficient off-policy RL? Perhaps surprisingly, our main result shows that even if 1) we have \emph{realizability}, in that the true value function of our target policy has a linear representation in a given set of features, and 2) our off-policy data has good \emph{coverage} over all these features (in a precisely defined and strong sense), any algorithm still information-theoretically requires an exponential number of off-policy samples to non-trivially estimate the value of the target policy. Our results highlight that sample-efficient batch RL is not guaranteed unless significantly stronger conditions are met, such as the distribution shift being sufficiently mild (which we precisely characterize) or representation conditions far stronger than realizability.
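To make the two assumptions in the abstract concrete, one common way to state them formally is sketched below; the symbols $\phi$, $\theta$, $d$, $\mu$, and the constant $c$ are notation assumed here for illustration, not taken from this page.

Realizability: there exist known features $\phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$ and an unknown vector $\theta \in \mathbb{R}^d$ such that the value function of the target policy $\pi$ satisfies
$$Q^{\pi}(s, a) = \phi(s, a)^{\top} \theta \quad \text{for all } (s, a).$$

Coverage: the off-policy data distribution $\mu$ excites every feature direction, e.g. its feature second-moment matrix is well conditioned,
$$\lambda_{\min}\!\left( \mathbb{E}_{(s,a) \sim \mu}\!\left[ \phi(s, a)\, \phi(s, a)^{\top} \right] \right) \ge c > 0.$$

Even under both of these conditions, the lower bound stated in the abstract says that any algorithm needs exponentially many off-policy samples to non-trivially estimate the value of the target policy.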
