Communication Lower Bound in Convolution Accelerators

8 Nov 2019 · Xiaoming Chen, Yinhe Han, Yu Wang

In current convolutional neural network (CNN) accelerators, communication (i.e., memory access) dominates the energy consumption. This work provides a comprehensive analysis and methodologies to minimize communication in CNN accelerators. For off-chip communication, we derive the theoretical lower bound for any convolutional layer and propose a dataflow that reaches this bound, a fundamental problem left unsolved by prior studies. On-chip communication is minimized through an elaborate workload and storage mapping scheme. In addition, we design a communication-optimal CNN accelerator architecture. Evaluations based on a 65 nm technology demonstrate that the proposed architecture nearly reaches the theoretical minimum communication in a three-level memory hierarchy and that it is computation dominant. The gap between the energy efficiency of our accelerator and the theoretical best value is only 37-87%.
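As a rough illustration of how such a bound behaves (this is not the paper's derivation, which is convolution-specific and tighter), the sketch below counts the multiply-accumulates (MACs) of a convolutional layer and evaluates a generic Hong-Kung-style lower bound of the form #MACs / sqrt(S) on off-chip traffic, where S is the on-chip buffer capacity in words. The layer shape and buffer sizes are hypothetical examples.

```python
# Illustrative sketch only: a generic red-blue-pebble-style communication
# bound, not the exact lower bound derived in the paper.
import math

def conv_macs(n, c, k, h, w, r, s):
    """Total MACs for a conv layer: batch n, input channels c,
    output channels k, output map h x w, kernel r x s."""
    return n * c * k * h * w * r * s

def hong_kung_style_bound(macs, buffer_words):
    """Generic Omega(#MACs / sqrt(S)) estimate of off-chip traffic
    for on-chip buffer capacity S (in words)."""
    return macs / math.sqrt(buffer_words)

# Hypothetical ResNet-like layer.
macs = conv_macs(n=1, c=256, k=256, h=14, w=14, r=3, s=3)
for buf in (16 * 1024, 64 * 1024, 256 * 1024):  # buffer sizes in words
    bound = hong_kung_style_bound(macs, buf)
    print(f"buffer = {buf // 1024:3d} Kwords -> traffic >= ~{bound / 1e6:.2f} Mwords")
```

The point of the sketch is the scaling: quadrupling the on-chip buffer only halves the unavoidable off-chip traffic, which is why a dataflow that actually attains the lower bound for a given buffer size matters.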


Categories

Distributed, Parallel, and Cluster Computing · Hardware Architecture
