A3C, Asynchronous Advantage Actor-Critic, is a policy gradient algorithm in reinforcement learning that maintains a policy $\pi\left(a_{t}\mid{s}_{t}; \theta\right)$ and an estimate of the value function $V\left(s_{t}; \theta_{v}\right)$. It operates in the forward view and uses a mix of $n$-step returns to update both the policy and the value function. The policy and the value function are updated after every $t_{\text{max}}$ actions or when a terminal state is reached. The update performed by the algorithm can be seen as $\nabla_{\theta{'}}\log\pi\left(a_{t}\mid{s_{t}}; \theta{'}\right)A\left(s_{t}, a_{t}; \theta, \theta_{v}\right)$, where $A\left(s_{t}, a_{t}; \theta, \theta_{v}\right)$ is an estimate of the advantage function given by:
$$\sum^{k-1}_{i=0}\gamma^{i}r_{t+i} + \gamma^{k}V\left(s_{t+k}; \theta_{v}\right) - V\left(s_{t}; \theta_{v}\right)$$
where $k$ can vary from state to state and is upper-bounded by $t_{\text{max}}$.
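The advantage estimate above can be computed for a whole rollout segment with a single backward pass. The following is a minimal sketch in plain Python, not the reference implementation: the function name `n_step_advantages` and its argument names are hypothetical, and it assumes each step uses the longest return available within the segment, so $k$ shrinks from the segment length down to 1.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Advantage estimates for one rollout segment of at most t_max steps.

    rewards[i]      -- r_{t+i}, the reward received at step t+i
    values[i]       -- V(s_{t+i}; theta_v), the critic's estimate at step t+i
    bootstrap_value -- V(s_{t+k}; theta_v) for the state that follows the
                       segment, or 0.0 if the segment ended in a terminal state
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")

    advantages = [0.0] * len(rewards)
    ret = bootstrap_value
    # Walk backwards so each step reuses the discounted return of its successor.
    for i in reversed(range(len(rewards))):
        ret = rewards[i] + gamma * ret        # n-step return from step t+i
        advantages[i] = ret - values[i]       # subtract the baseline V(s_{t+i})
    return advantages


# Example: a 3-step segment that did not end in a terminal state.
example = n_step_advantages([1.0, 1.0, 1.0], [0.5, 0.5, 0.5],
                            bootstrap_value=2.0, gamma=0.5)
```

With these inputs every $n$-step return equals 2.0, so each advantage is 1.5.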
The critics in A3C learn the value function while multiple actors are trained in parallel and are periodically synchronized with the global parameters. Gradients are accumulated over several steps before being applied, which improves training stability; this is similar to parallelized stochastic gradient descent.
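The sync, accumulate, apply cycle can be illustrated on a toy problem. The sketch below is an assumption-laden stand-in rather than A3C itself: the actor-critic loss is replaced by a noisy quadratic loss on a single scalar parameter, the names `run_async_workers` and `optimum` are hypothetical, and a lock guards the shared parameter purely to keep the example simple.

```python
import random
import threading


def run_async_workers(num_workers=4, updates_per_worker=200, t_max=5,
                      lr=0.02, optimum=3.0, seed=0):
    """Several threads minimise 0.5 * (theta - sample)^2 on a shared theta."""
    global_params = {"theta": 0.0}
    lock = threading.Lock()

    def worker(worker_id):
        rng = random.Random(seed + worker_id)
        for _ in range(updates_per_worker):
            # 1. Sync: copy the current global parameter into the worker.
            with lock:
                local_theta = global_params["theta"]
            # 2. Accumulate gradients over t_max steps using the local copy.
            grad = 0.0
            for _ in range(t_max):
                sample = optimum + rng.gauss(0.0, 0.1)   # noisy toy target
                grad += local_theta - sample             # d/dtheta of the loss
            # 3. Apply the accumulated gradient to the global parameter.
            with lock:
                global_params["theta"] -= lr * grad

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return global_params["theta"]


final_theta = run_async_workers()
```

Because workers compute gradients from slightly stale copies, the exact trajectory depends on thread scheduling, but `final_theta` should end up close to `optimum`.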
Note that while the parameters $\theta$ of the policy and $\theta_{v}$ of the value function are shown as being separate for generality, we always share some of the parameters in practice. We typically use a convolutional neural network that has one softmax output for the policy $\pi\left(a_{t}\mid{s}_{t}; \theta\right)$ and one linear output for the value function $V\left(s_{t}; \theta_{v}\right)$, with all non-output layers shared.
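The parameter sharing described above can be sketched as a network with one shared trunk and two output heads. This is a simplified illustration under stated assumptions: a single dense ReLU layer stands in for the convolutional layers, biases are omitted, and the names `make_params` and `forward` and the layer sizes are hypothetical.

```python
import math
import random


def make_params(n_inputs, n_hidden, n_actions, seed=0):
    """Randomly initialise a shared layer plus separate policy and value heads."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    return {
        "shared": matrix(n_hidden, n_inputs),        # used by both heads
        "policy_head": matrix(n_actions, n_hidden),  # softmax output
        "value_head": matrix(1, n_hidden),           # single linear output
    }


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    peak = max(logits)                               # for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(params, state):
    """Return (action probabilities, state value) from one shared forward pass."""
    hidden = [max(0.0, h) for h in matvec(params["shared"], state)]  # ReLU
    policy = softmax(matvec(params["policy_head"], hidden))
    value = matvec(params["value_head"], hidden)[0]
    return policy, value


params = make_params(n_inputs=4, n_hidden=8, n_actions=3)
policy, value = forward(params, [0.1, -0.2, 0.3, 0.4])
```

Here `policy` is a probability vector over the three actions and `value` is a scalar, both computed from the same hidden activations.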
Source: Asynchronous Methods for Deep Reinforcement Learning

| Task | Papers | Share |
| --- | --- | --- |
| Reinforcement Learning (RL) | 38 | 42.22% |
| Atari Games | 12 | 13.33% |
| Decision Making | 4 | 4.44% |
| Autonomous Driving | 3 | 3.33% |
| Continuous Control | 2 | 2.22% |
| Multi-agent Reinforcement Learning | 2 | 2.22% |
| OpenAI Gym | 2 | 2.22% |
| Problem Decomposition | 2 | 2.22% |
| Thompson Sampling | 1 | 1.11% |


| Component | Type |
| --- | --- |
| Convolution | Convolutions |
| Dense Connections | Feedforward Networks |
| Entropy Regularization | Regularization |
| RMSProp | Stochastic Optimization (optional) |
| Softmax | Output Functions |