A Dueling Network is a type of Q-Network with two streams that separately estimate the (scalar) state value and the advantage of each action. Both streams share a common convolutional feature-learning module and are combined by an aggregating layer that produces the estimate of the state-action value function Q, as shown in the figure to the right.
The last module uses the following mapping:
$$ Q\left(s, a; \theta, \alpha, \beta\right) = V\left(s; \theta, \beta\right) + \left(A\left(s, a; \theta, \alpha\right) - \frac{1}{|\mathcal{A}|}\sum_{a'}A\left(s, a'; \theta, \alpha\right)\right) $$
This formulation is chosen for identifiability: naively adding $V$ and $A$ leaves them unidentifiable, since a constant can be shifted between the two streams without changing $Q$. Subtracting the maximum advantage would force zero advantage for the chosen (greedy) action; using an average operator instead loses that exact semantics but increases the stability of the optimization, because the advantages only need to change as fast as their mean.
Source: Dueling Network Architectures for Deep Reinforcement Learning
Task | Papers | Share |
---|---|---|
Reinforcement Learning (RL) | 17 | 38.64% |
Atari Games | 7 | 15.91% |
Decision Making | 3 | 6.82% |
Multi-agent Reinforcement Learning | 2 | 4.55% |
OpenAI Gym | 2 | 4.55% |
Starcraft | 2 | 4.55% |
Efficient Exploration | 2 | 4.55% |
Distributional Reinforcement Learning | 1 | 2.27% |
Energy Management | 1 | 2.27% |
Component | Type |
---|---|
Convolution | Convolutions |
Dense Connections | Feedforward Networks |
Double Q-learning | Off-Policy TD Control |