HyT-NAS: Hybrid Transformers Neural Architecture Search for Edge Devices

8 Mar 2023 · Lotfi Abdelkrim Mecharbat, Hadjer Benmeziane, Hamza Ouarnoughi, Smail Niar

Vision Transformers have enabled recent attention-based Deep Learning (DL) architectures to achieve remarkable results in Computer Vision (CV) tasks. However, because of the extensive computational resources they require, these architectures are rarely deployed on resource-constrained platforms. Current research investigates handcrafted hybrid models that combine convolution-based and attention-based components for CV tasks such as image classification and object detection. In this paper, we propose HyT-NAS, an efficient Hardware-aware Neural Architecture Search (HW-NAS) that includes hybrid architectures and targets vision tasks on tiny devices. HyT-NAS improves on state-of-the-art HW-NAS by enriching the search space and enhancing both the search strategy and the performance predictors. Our experiments show that HyT-NAS achieves a similar hypervolume with approximately 5x fewer training evaluations. Our resulting architecture outperforms MLPerf MobileNetV1 by 6.3% in accuracy with 3.5x fewer parameters on Visual Wake Words.
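
The abstract compares search strategies by hypervolume, a standard indicator of Pareto-front quality in multi-objective optimization. The following is a minimal sketch of that indicator for two objectives, not the authors' implementation. It assumes both objectives are minimized (for example, classification error and a hardware cost such as latency); the function names, the choice of objectives, and the reference point are illustrative assumptions.

```python
from typing import List, Tuple

# Assumption: each candidate is (error, cost) and both are minimized.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by increasing first objective."""
    front: List[Point] = []
    best_second = float("inf")
    # After sorting by the first objective, a point is non-dominated only if
    # it strictly improves the best second objective seen so far.
    for p in sorted(set(points)):
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by the reference point `ref`."""
    # Points that do not strictly improve on the reference contribute nothing.
    inside = [p for p in points if p[0] < ref[0] and p[1] < ref[1]]
    area = 0.0
    prev_second = ref[1]
    # Sweep the front and add one horizontal strip per non-dominated point.
    for x, y in pareto_front(inside):
        area += (ref[0] - x) * (prev_second - y)
        prev_second = y
    return area


if __name__ == "__main__":
    # Hypothetical candidates, not results from the paper.
    candidates = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (2.5, 2.5)]
    print(hypervolume_2d(candidates, ref=(4.0, 4.0)))
```

A larger hypervolume means the front dominates more of the objective space, so two search strategies that reach a similar hypervolume have found fronts of comparable quality, and the remaining difference is how many trained evaluations each one needed.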


| Task                 | Dataset           | Model               | Metric Name | Metric Value | Global Rank |
|----------------------|-------------------|---------------------|-------------|--------------|-------------|
| Image Classification | Visual Wake Words | ProxylessNAS        | Accuracy    | 86.55        | # 2         |
| Image Classification | Visual Wake Words | MobileNetV2 (x0.35) | Accuracy    | 86.34        | # 3         |
| Image Classification | Visual Wake Words | MobileNetV1         | Accuracy    | 83.7         | # 4         |
| Image Classification | Visual Wake Words | HyT-NAS-BA          | Accuracy    | 92.25        | # 1         |

Methods