On the cost of homogeneous network building blocks and parameter sharing

1 Jan 2021 · Thomas Pfeil

Deep neural networks usually have to be compressed and accelerated for use in low-power devices, e.g., mobile devices. To this end, common requirements are low latency, high accuracy, and a small memory footprint. A good trade-off between latency and accuracy has been shown for networks comprising multiple outputs at different depths. In this study, we further optimize such networks for efficient inference by introducing a homogeneous network structure to compute these outputs: a single building block is executed iteratively, such that its output is refined over the iterations. On the CamVid and Cityscapes datasets for semantic segmentation, we show and compare empirical results for two scenarios: in the first scenario, the parameters may be updated at every iteration, whereas in the second scenario, the parameters are kept constant and shared between iterations. We compensate for the homogeneity of the network architecture and for the weight sharing by increasing the number of multiply-accumulate operations by a factor of 3 in each case. However, the potentially more efficient implementation of the introduced networks on novel, massively parallel hardware accelerators may outweigh the increased number of operations.
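As a rough illustration of the idea described in the abstract, the following is a minimal sketch of an iteratively executed building block with optional weight sharing, written against a PyTorch-style API. The block composition, the residual update, the per-iteration prediction head, and the class count are illustrative assumptions, not details taken from the paper.

```python
import torch.nn as nn


class IterativeBlockNet(nn.Module):
    """Sketch: one building block applied for several iterations.

    If share_weights is True, a single parameter set is reused in every
    iteration (second scenario in the abstract); otherwise each iteration
    gets its own copy of the block (first scenario).
    """

    def __init__(self, channels: int, num_iterations: int, share_weights: bool,
                 num_classes: int = 12):  # class count is an assumption
        super().__init__()

        def make_block() -> nn.Module:
            # Illustrative block; the paper's actual block may differ.
            return nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )

        if share_weights:
            # The same module object is reused, so its parameters are shared.
            self.blocks = nn.ModuleList([make_block()] * num_iterations)
        else:
            # Independent parameters for every iteration.
            self.blocks = nn.ModuleList(make_block() for _ in range(num_iterations))

        # One head produces a prediction after every iteration, giving the
        # multiple outputs at different "depths" mentioned in the abstract.
        self.head = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, x):
        outputs = []
        for block in self.blocks:
            x = x + block(x)              # refine the representation iteratively
            outputs.append(self.head(x))  # intermediate prediction per iteration
        return outputs
```

Under this sketch, early-exit inference would simply stop the loop after fewer iterations and use the last intermediate prediction, trading accuracy for latency.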
