Deep Compressive Offloading: Speeding Up Neural Network Inference by Trading Edge Computation for Network Latency

With recent advances, neural networks have become a crucial building block in intelligent IoT systems and sensing applications. However, their excessive computational demand remains a serious impediment to deployment on low-end IoT devices. With the emergence of edge computing, offloading has become a promising technique to circumvent end-device limitations. However, transferring data between local and edge devices takes up a large proportion of time in existing offloading frameworks, creating a bottleneck for low-latency intelligent services. In this work, we propose a general framework, called deep compressive offloading. By integrating compressive sensing theory and deep learning, our framework can encode data for offloading into tiny sizes with negligible overhead on local devices and decode the data on the edge server, while offering theoretical guarantees on perfect reconstruction and lossless inference. By trading edge computing resources for data transmission time, our design can significantly reduce offloading latency with almost no accuracy loss. We build a deep compressive offloading system to serve state-of-the-art computer vision and speech recognition services. In comprehensive evaluations, our system consistently reduces end-to-end latency by 2× to 4× with 1% accuracy loss, compared to state-of-the-art neural network offloading systems. Under limited network bandwidth or intensive background traffic, our system can further speed up neural network inference by up to 35×.
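
The abstract does not spell out the encoder/decoder split, but the core idea can be illustrated with a minimal PyTorch sketch: the local device applies a cheap, fixed random projection (a compressive-sensing measurement) to an intermediate activation, sends only the small measurement vector over the network, and the edge server runs a learned decoder to reconstruct the activation before finishing inference. The split point, dimensions, compression ratio, and decoder architecture below are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of compressive offloading, assuming a PyTorch setup.
# All sizes and the decoder architecture are illustrative assumptions.
import torch
import torch.nn as nn

FEATURE_DIM = 4096           # flattened size of the offloaded activation (assumed)
COMPRESSION_RATIO = 8        # transmit 1/8 of the original values (assumed)
MEASUREMENT_DIM = FEATURE_DIM // COMPRESSION_RATIO

# Local device: a fixed random projection acts as the compressive-sensing
# measurement matrix. Applying it is a single matrix multiply, so the
# on-device overhead is negligible.
torch.manual_seed(0)         # both sides derive the same matrix from a shared seed
phi = torch.randn(MEASUREMENT_DIM, FEATURE_DIM) / MEASUREMENT_DIM ** 0.5

def encode_on_device(activation: torch.Tensor) -> torch.Tensor:
    """Compress a flattened activation before sending it to the edge server."""
    return activation @ phi.T            # y = Φx, shape: (batch, MEASUREMENT_DIM)

# Edge server: a learned decoder maps measurements back to the activation
# space; the remaining network layers then run on the reconstruction.
decoder = nn.Sequential(
    nn.Linear(MEASUREMENT_DIM, FEATURE_DIM),
    nn.ReLU(),
    nn.Linear(FEATURE_DIM, FEATURE_DIM),
)

def decode_on_edge(measurements: torch.Tensor) -> torch.Tensor:
    """Reconstruct the activation from the compressed measurements."""
    return decoder(measurements)

if __name__ == "__main__":
    x = torch.randn(1, FEATURE_DIM)      # stand-in for a real intermediate activation
    y = encode_on_device(x)              # only y (8x smaller here) crosses the network
    x_hat = decode_on_edge(y)            # edge server recovers the activation
    print(y.shape, x_hat.shape)          # torch.Size([1, 512]) torch.Size([1, 4096])
```

In a real deployment the decoder would be trained offline so that the downstream layers lose essentially no accuracy on the reconstructed activations, which is how the trade of edge computation for transmission time is realized.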
