WrapNet: Neural Net Inference with Ultra-Low-Precision Arithmetic

Low-precision neural networks represent both weights and activations with few bits, drastically reducing the multiplication complexity. Nonetheless, these products are accumulated using high-precision (typically 32-bit) additions, an operation that dominates the arithmetic complexity of inference when using extreme quantization (e.g., binary weights). To further optimize inference, we propose WrapNet that adapts neural networks to use low-precision (8-bit) additions in the accumulators, achieving classification accuracy comparable to their 32-bit counterparts. We achieve resilience to low-precision accumulation by inserting a cyclic activation layer, as well as an overflow penalty regularizer. We demonstrate the efficacy of our approach on both software and hardware platforms.

PDF Abstract
No code implementations yet. Submit your code now


  Add Datasets introduced or used in this paper

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.


No methods listed for this paper. Add relevant methods here