Retrace is an off-policy Q-value estimation algorithm with guaranteed convergence for any target and behavior policy pair $\left(\pi, \beta\right)$. When learning from off-policy rollouts with TD learning, we must correct the update with importance sampling (here $\delta_{t}$ is the TD error at step $t$):
$$ \Delta{Q}^{\text{imp}}\left(S_{t}, A_{t}\right) = \gamma^{t}\prod_{1\leq{\tau}\leq{t}}\frac{\pi\left(A_{\tau}\mid{S_{\tau}}\right)}{\beta\left(A_{\tau}\mid{S_{\tau}}\right)}\delta_{t} $$
This product of ratios can have very high variance, so Retrace modifies $\Delta{Q}$ by truncating each importance weight at a constant $c$:
$$ \Delta{Q}^{\text{ret}}\left(S_{t}, A_{t}\right) = \gamma^{t}\prod_{1\leq{\tau}\leq{t}}\min\left(c, \frac{\pi\left(A_{\tau}\mid{S_{\tau}}\right)}{\beta\left(A_{\tau}\mid{S_{\tau}}\right)}\right)\delta_{t} $$
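A minimal NumPy sketch of the truncated update above, assuming we are given per-step action probabilities under $\pi$ and $\beta$ and per-step TD errors (the function names `retrace_weights` and `retrace_update` are illustrative, not from the paper):

```python
import numpy as np

def retrace_weights(pi_probs, beta_probs, c=1.0):
    """Cumulative truncated importance weights: prod_{1<=tau<=t} min(c, pi/beta)."""
    ratios = np.minimum(c, np.asarray(pi_probs) / np.asarray(beta_probs))
    ratios = ratios.copy()
    ratios[0] = 1.0  # the product over 1 <= tau <= t is empty at t = 0
    return np.cumprod(ratios)

def retrace_update(td_errors, pi_probs, beta_probs, gamma=0.99, c=1.0):
    """Per-step corrections gamma^t * (truncated weight) * delta_t."""
    td_errors = np.asarray(td_errors)
    weights = retrace_weights(pi_probs, beta_probs, c)
    t = np.arange(len(td_errors))
    return (gamma ** t) * weights * td_errors
```

Because each factor is capped at $c$, the cumulative weight can no longer explode as the trajectory grows, which is what keeps the variance bounded compared to plain importance sampling.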
Source: *Safe and Efficient Off-Policy Reinforcement Learning* (Munos et al., 2016)
