Target Policy Smoothing is a regularization strategy for the value function in reinforcement learning. Deterministic policies can overfit to narrow peaks in the value estimate, which makes them highly susceptible to function approximation error and increases the variance of the target. To reduce this variance, target policy smoothing adds a small amount of random noise to the target policy's action and averages over mini-batches, approximating a SARSA-like expectation (integral) over nearby actions.
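To illustrate why averaging over noisy actions counters overfitting to narrow peaks, the sketch below uses a hypothetical one-dimensional critic, `q_estimate`, for a single fixed state. It has a broad "true" peak at a = 0 plus a narrow spurious spike at a = 0.5 that stands in for approximation error. The helper names, the shape of the critic, and the values σ = 0.2 and c = 0.5 are illustrative assumptions, not part of the original method description.

```python
import math
import random
from typing import Callable


def q_estimate(a: float) -> float:
    """Hypothetical 1-D critic output for one fixed state (illustration only)."""
    broad = math.exp(-(a ** 2) / 0.5)                 # broad "true" peak at a = 0
    spike = 2.0 * math.exp(-((a - 0.5) ** 2) / 1e-4)  # narrow spurious peak at a = 0.5
    return broad + spike


def clipped_noise(sigma: float, c: float, rng: random.Random) -> float:
    """Sample eps ~ clip(N(0, sigma), -c, c)."""
    return max(-c, min(c, rng.gauss(0.0, sigma)))


def smoothed_value(q: Callable[[float], float], a: float, sigma: float = 0.2,
                   c: float = 0.5, n: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E_eps[q(a + eps)] under clipped Gaussian noise."""
    rng = random.Random(seed)
    return sum(q(a + clipped_noise(sigma, c, rng)) for _ in range(n)) / n


for a in (0.0, 0.5):
    print(f"a={a}: raw={q_estimate(a):.3f}  smoothed={smoothed_value(q_estimate, a):.3f}")
```

The raw estimate is highest at the spike (a = 0.5), so a greedy deterministic policy would be drawn to it. Once it is averaged over clipped noise, the spike contributes very little, and the broad peak at a = 0 scores higher.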
The modified target update is:
$$ y = r + \gamma{Q}_{\theta'}\left(s', \pi_{\theta'}\left(s'\right) + \epsilon \right) $$
$$ \epsilon \sim \text{clip}\left(\mathcal{N}\left(0, \sigma\right), -c, c \right) $$
where the added noise is clipped to keep the target action close to the original action. The outcome is an algorithm reminiscent of Expected SARSA, except that the value estimate is learned off-policy and the noise added to the target policy is chosen independently of the exploration policy. The learned value estimate is with respect to a noisy policy defined by the parameter $\sigma$.
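The target update above can be sketched for a mini-batch as follows. This is a minimal, stdlib-only illustration assuming scalar states and actions. `q_target` and `pi_target` are hypothetical callables standing in for $Q_{\theta'}$ and $\pi_{\theta'}$, and the toy linear networks at the end are likewise made up for the example. Fresh noise $\epsilon$ is drawn for every transition.

```python
import random
from typing import Callable, List, Sequence


def clipped_gaussian(sigma: float, c: float, rng: random.Random) -> float:
    """Sample epsilon ~ clip(N(0, sigma), -c, c)."""
    return max(-c, min(c, rng.gauss(0.0, sigma)))


def smoothed_targets(
    rewards: Sequence[float],
    next_states: Sequence[float],
    q_target: Callable[[float, float], float],  # hypothetical stand-in for Q_{theta'}(s', a')
    pi_target: Callable[[float], float],        # hypothetical stand-in for pi_{theta'}(s')
    gamma: float = 0.99,
    sigma: float = 0.2,
    c: float = 0.5,
    seed: int = 0,
) -> List[float]:
    """Compute y = r + gamma * Q'(s', pi'(s') + eps) for each transition."""
    rng = random.Random(seed)
    targets = []
    for r, s_next in zip(rewards, next_states):
        eps = clipped_gaussian(sigma, c, rng)  # independent noise per transition
        a_next = pi_target(s_next) + eps       # smoothed target action
        targets.append(r + gamma * q_target(s_next, a_next))
    return targets


def toy_q(s: float, a: float) -> float:
    """Hypothetical linear target critic."""
    return s + a


def toy_pi(s: float) -> float:
    """Hypothetical linear target policy."""
    return 0.5 * s


print(smoothed_targets([1.0, 0.0], [2.0, -1.0], toy_q, toy_pi, gamma=0.5))
```

Regressing the critic toward these targets over a mini-batch averages the squared error across many independent draws of $\epsilon$, which is what approximates the expectation over nearby target actions. Practical implementations typically also clip the smoothed action to the valid action range and mask the bootstrap term at terminal states. Both are omitted here to match the formula as written.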
Source: Addressing Function Approximation Error in Actor-Critic Methods

| Task | Papers | Share |
| --- | --- | --- |
| Continuous Control | 19 | 39.58% |
| OpenAI Gym | 6 | 12.50% |
| Autonomous Driving | 4 | 8.33% |
| Decision Making | 4 | 8.33% |
| Meta-Learning | 3 | 6.25% |
| Atari Games | 2 | 4.17% |
| Energy Management | 1 | 2.08% |
| Imitation Learning | 1 | 2.08% |
| Feature Engineering | 1 | 2.08% |