Data Parallel Methods


This section contains a compilation of distributed data parallel methods for deep learning. In data parallelism, every node holds an identical copy of the model parameters but receives a different mini-batch of data; each node runs the forward and backward passes locally and sends its gradient back to the main node. Once all gradients have arrived, the main node averages them (weighted by batch size when the shards differ) and uses the result to update the model parameters, which are then shared with all nodes so they stay in sync.

Image credit: Jordi Torres.
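
To make the averaging step concrete, here is a minimal single-process sketch in PyTorch that simulates the scheme described above: the nodes are a plain loop, and the toy linear model, shard sizes, and learning rate are illustrative assumptions, not part of any specific method listed here.

```python
import torch

# Simulated data parallelism in one process: every "node" shares the same
# parameters, sees a different data shard, computes its gradient locally,
# and the shard-size-weighted average of those gradients updates the model.

torch.manual_seed(0)

model = torch.nn.Linear(10, 1)   # shared parameters (hypothetical toy model)
loss_fn = torch.nn.MSELoss()
lr = 0.1

# One full batch, split unevenly across 3 simulated nodes.
x, y = torch.randn(12, 10), torch.randn(12, 1)
shards = list(zip(torch.split(x, [5, 4, 3]), torch.split(y, [5, 4, 3])))

grads, sizes = [], []
for xi, yi in shards:
    model.zero_grad()
    loss_fn(model(xi), yi).backward()   # local forward/backward on this shard
    grads.append([p.grad.clone() for p in model.parameters()])
    sizes.append(len(xi))

# Weighted average of the per-node gradients (weights = shard sizes),
# followed by a single SGD step on the shared parameters.
total = sum(sizes)
with torch.no_grad():
    for j, p in enumerate(model.parameters()):
        avg = sum(w * g[j] for w, g in zip(sizes, grads)) / total
        p -= lr * avg
```

In a real multi-machine setup the loop is replaced by collective communication (e.g. an all-reduce over the gradients), but the arithmetic is the same as in this sketch.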

Subcategories