Discovering Multi-Hardware Mobile Models via Architecture Search

Hardware-aware neural architecture design has predominantly focused on optimizing model performance on a single hardware target and on reducing model development complexity, while another important factor, model deployment complexity, has been largely ignored. In this paper, we argue that, for applications that may be deployed on multiple hardware platforms, maintaining a different single-hardware model for each platform makes it hard to guarantee consistent outputs across hardware and duplicates the engineering work of debugging and fixing. To minimize this deployment cost, we propose an alternative solution, multi-hardware models, where a single architecture is developed for multiple hardware platforms. With thoughtful search space design and by incorporating the proposed multi-hardware metrics into neural architecture search, we discover multi-hardware models that give state-of-the-art (SoTA) performance across multiple hardware platforms in both average and worst-case scenarios. On individual hardware, the single multi-hardware model yields results similar to or better than the SoTA performance on accelerators such as GPU, DSP and EdgeTPU, which was previously achieved by different models, while performing similarly to the MobileNetV3 Large Minimalistic model on mobile CPU.
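
The abstract does not define the multi-hardware metrics, so the following is only a minimal sketch of the general idea of scoring one architecture by its average and worst-case cost across several hardware targets. The function names, the use of raw latency in milliseconds without per-hardware normalization, and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean

def multi_hardware_metrics(latencies_ms):
    """Aggregate per-hardware latencies into average and worst-case values.

    Assumption: raw latencies are directly comparable across hardware.
    A real search would likely need some form of normalization.
    """
    values = list(latencies_ms.values())
    return {"average": mean(values), "worst": max(values)}

def select_candidate(candidates, metric="worst"):
    """Return the name of the candidate with the lowest aggregated latency."""
    return min(
        candidates,
        key=lambda name: multi_hardware_metrics(candidates[name])[metric],
    )

# Made-up latencies (ms) for two hypothetical candidate architectures.
candidates = {
    "arch_a": {"cpu": 20.0, "gpu": 4.0, "dsp": 8.0},
    "arch_b": {"cpu": 14.0, "gpu": 9.0, "dsp": 10.0},
}

best_on_average = select_candidate(candidates, metric="average")
best_in_worst_case = select_candidate(candidates, metric="worst")
```

With these illustrative numbers the two aggregations disagree: one candidate has the lower mean latency, while the other has the lower maximum. This is why an average-case and a worst-case metric can steer a search toward different architectures.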
