GradientDICE

Introduced by Zhang et al. in GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values

GradientDICE is a density ratio learning method for off-policy reinforcement learning: it estimates the ratio between the state distribution of the target policy and the sampling distribution. It replaces GenDICE’s objective with a different one, derived using the Perron-Frobenius theorem and without GenDICE’s divergence term. As a result, GradientDICE does not need a nonlinear parameterization, and it is provably convergent under linear function approximation.
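To make the quantity being estimated concrete, here is a minimal tabular sketch, not the authors' implementation. It assumes a hypothetical 3-state chain with a known transition matrix `P_pi`, a made-up initial distribution `mu0` and sampling distribution `d_mu`, a discount `gamma`, and a penalty weight `lam`. The quadratic loss below (a residual of the discounted stationarity condition plus a normalization penalty) is written in the spirit of the paper's objective; its exact weighting is an assumption. The actual method works from sampled transitions with stochastic primal-dual updates, whereas this sketch solves the known-model problem in closed form.

```python
import numpy as np

# Hypothetical 3-state chain: P_pi[s, s2] is the probability of moving
# from state s to state s2 under the target policy (assumed values).
P_pi = np.array([
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
    [0.9, 0.0, 0.1],
])
mu0 = np.array([1.0, 0.0, 0.0])   # initial state distribution (assumed)
d_mu = np.array([0.5, 0.3, 0.2])  # sampling distribution of the data (assumed)
gamma = 0.9                       # discount factor (assumed)
lam = 1.0                         # weight of the normalization penalty (assumed)

n = len(mu0)
D = np.diag(d_mu)
D_inv = np.diag(1.0 / d_mu)

# Residual of the stationarity condition for a candidate ratio tau:
#   delta(tau) = D tau - (1 - gamma) mu0 - gamma P_pi^T D tau = A tau - b
A = (np.eye(n) - gamma * P_pi.T) @ D
b = (1.0 - gamma) * mu0


def loss(tau):
    """Quadratic loss: weighted residual plus penalty on d_mu^T tau != 1."""
    delta = A @ tau - b
    return 0.5 * delta @ (delta / d_mu) + 0.5 * lam * (d_mu @ tau - 1.0) ** 2


# The loss is a convex quadratic in tau, so its minimizer solves the
# linear system obtained by setting the gradient to zero.
H = A.T @ D_inv @ A + lam * np.outer(d_mu, d_mu)
g = A.T @ D_inv @ b + lam * d_mu
tau = np.linalg.solve(H, g)

# Reference answer: discounted state distribution of the target policy,
# divided elementwise by the sampling distribution.
d_pi = np.linalg.solve(np.eye(n) - gamma * P_pi.T, b)
tau_ref = d_pi / d_mu

print("estimated ratio:", tau)
print("reference ratio:", tau_ref)
```

A tabular ratio is the special case of a linear parameterization with one-hot features, which is why no positivity-enforcing nonlinearity appears anywhere in the sketch: the minimizer of the quadratic loss already coincides with the true ratio.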

Source: GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values

Categories

Density Ratio Learning
