Distributed Methods

Blink Communication

Introduced by Wang et al. in Blink: Fast and Generic Collectives for Distributed ML

Blink is a communication library for inter-GPU parameter exchange that achieves near-optimal link utilization. To handle topology heterogeneity from hardware generations or partial allocations from cluster schedulers, Blink dynamically generates optimal communication primitives for a given topology. Blink probes the set of links available for a given job at runtime and builds a topology with appropriate link capacities. Given the topology, Blink achieves the optimal communication rate by packing spanning trees, that can utilize more links (Lovasz, 1976; Edmonds, 1973) when compared to rings. The authors use a multiplicative-weight update based approximation algorithm to quickly compute the maximal packing and extend the algorithm to further minimize the number of trees generated. Blink’s collectives extend across multiple machines effectively utilizing all available network interfaces.

Source: Blink: Fast and Generic Collectives for Distributed ML

Papers


Paper Code Results Date Stars

Tasks


Task Papers Share
Image Classification 1 100.00%

Components


Component Type
🤖 No Components Found You can add them if they exist; e.g. Mask R-CNN uses RoIAlign

Categories