On the Optimal Recovery Threshold of Coded Matrix Multiplication
We provide novel coded computation strategies for distributed matrix-matrix products that outperform the recent "Polynomial code" constructions in recovery threshold, i.e., the required number of successful workers. When $m$-th fraction of each matrix can be stored in each worker node, Polynomial codes require $m^2$ successful workers, while our MatDot codes only require $2m-1$ successful workers, albeit at a higher communication cost from each worker to the fusion node. We also provide a systematic construction of MatDot codes. Further, we propose "PolyDot" coding that interpolates between Polynomial codes and MatDot codes to trade off communication cost and recovery threshold. Finally, we demonstrate a coding technique for multiplying $n$ matrices ($n \geq 3$) by applying MatDot and PolyDot coding ideas.
PDF Abstract