Towards Transparent Neural Network Acceleration

19 Oct 2018 · Nicolas Weber, Mathias Niepert, Felipe Huici

Deep learning has found numerous applications thanks to its versatility and accuracy on pattern recognition problems such as visual object detection. Learning and inference in deep neural networks, however, are memory- and compute-intensive, so improving efficiency is one of the major challenges for frameworks such as PyTorch, TensorFlow, and Caffe. While the efficiency problem can be partially addressed with specialized hardware and its corresponding proprietary libraries, we believe that neural network acceleration should be transparent to the user and should support all hardware platforms and deep learning libraries. To this end, we introduce a transparent middleware layer for neural network acceleration. The system is built around a compiler for deep learning, allowing one to combine device-specific libraries and custom optimizations while supporting numerous hardware devices. In contrast to other projects, we explicitly target the optimization of both prediction and training of neural networks. We present the current development status and some preliminary but encouraging results: on a standard x86 server, our system achieves an 11.8x speed-up for inference and an 8.0x speed-up for batched prediction (batch size 128) on CPUs; on GPUs, we achieve 1.7x and 2.3x speed-ups, respectively.
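The abstract does not spell out the middleware's API, so the sketch below is purely illustrative: the function name `accelerate` and its internals are assumptions, and PyTorch is used only as a representative framework. What it demonstrates is the property the paper argues for: acceleration should be a drop-in step that preserves the framework's native model interface, so existing inference and training code runs unchanged.

```python
# Minimal sketch of "transparent" acceleration from the user's perspective.
# `accelerate` is a hypothetical entry point, not the paper's actual API.
import torch
import torch.nn as nn

def accelerate(model: nn.Module) -> nn.Module:
    """Hypothetical middleware hook: compile `model` for the available
    hardware and libraries, returning a drop-in replacement that still
    behaves like an ordinary nn.Module."""
    # A real implementation would lower the network to device-specific
    # kernels and custom optimizations; here we return the model as-is
    # just to keep the sketch runnable.
    return model

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
fast_model = accelerate(model)

# Inference: same call signature as the original model.
x = torch.randn(32, 128)
y = fast_model(x)

# Training code is also unchanged, matching the paper's stated goal of
# optimizing both prediction and training.
loss = y.sum()
loss.backward()
```

The design point this illustrates is that, unlike approaches tied to one vendor's proprietary library, a middleware sitting behind the framework's existing interface can swap in device-specific back ends without requiring any changes to user code.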
