Distributed Methods

Herring

Herring is a parameter-server-based distributed training method. It combines AWS's Elastic Fabric Adapter (EFA) with a novel parameter sharding technique to make better use of the available network bandwidth. Using EFA together with a balanced fusion buffer, Herring aims to use the total bandwidth available across all nodes in the cluster. Gradients are reduced hierarchically: first among the GPUs inside each node, and then across nodes. This makes more efficient use of PCIe bandwidth within a node and keeps the gradient-averaging load on the GPUs low.
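The two-level reduction and the even sharding described above can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not Herring's implementation: the names `intra_node_reduce`, `shard_bounds`, and `hierarchical_average` are hypothetical, gradients are plain Python lists, and the "balanced" buffer is assumed to mean that the flattened gradient buffer is split into near-equal contiguous shards, one per server. No EFA or real networking is involved.

```python
from typing import List, Tuple

Vector = List[float]


def intra_node_reduce(gpu_grads: List[Vector]) -> Vector:
    """Sum the gradient vectors of all GPUs inside one node (first level)."""
    return [sum(vals) for vals in zip(*gpu_grads)]


def shard_bounds(length: int, num_servers: int) -> List[Tuple[int, int]]:
    """Split [0, length) into num_servers contiguous, near-equal ranges.

    Assumption: this stands in for a balanced fusion buffer, where every
    server receives roughly the same number of gradient elements.
    """
    base, extra = divmod(length, num_servers)
    bounds, start = [], 0
    for s in range(num_servers):
        end = start + base + (1 if s < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def hierarchical_average(cluster: List[List[Vector]], num_servers: int) -> Vector:
    """Average gradients over all GPUs: reduce inside nodes, then across nodes."""
    total_gpus = sum(len(node) for node in cluster)
    # Level 1: one partial sum per node.
    node_sums = [intra_node_reduce(node) for node in cluster]
    length = len(node_sums[0])
    averaged: Vector = []
    # Level 2: each (simulated) server reduces only its own shard across nodes.
    for start, end in shard_bounds(length, num_servers):
        shard = [
            sum(ns[i] for ns in node_sums) / total_gpus
            for i in range(start, end)
        ]
        averaged.extend(shard)
    return averaged


# Two nodes with two GPUs each, a 4-element gradient, three servers.
cluster = [
    [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0]],
    [[0.0, 0.0, 4.0, 4.0], [4.0, 4.0, 0.0, 0.0]],
]
result = hierarchical_average(cluster, num_servers=3)
print(result)  # [2.0, 2.0, 2.0, 2.0]
```

In this toy setup only one vector per node crosses the simulated network instead of one per GPU, which is the point of reducing inside the node first, and each server handles a similar share of the buffer regardless of how the individual parameter tensors are sized.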


Tasks

Task                         Papers   Share
BIG-bench Machine Learning   1        100.00%
