Circumventing Outliers of AutoAugment with Knowledge Distillation

AutoAugment has been a powerful algorithm that improves the accuracy of many vision tasks, yet it is sensitive to the operator space as well as hyper-parameters, and an improper setting may degenerate network optimization. This paper delves deep into the working mechanism, and reveals that AutoAugment may remove part of discriminative information from the training image and so insisting on the ground-truth label is no longer the best option. To relieve the inaccuracy of supervision, we make use of knowledge distillation that refers to the output of a teacher model to guide network training. Experiments are performed in standard image classification benchmarks, and demonstrate the effectiveness of our approach in suppressing noise of data augmentation and stabilizing training. Upon the cooperation of knowledge distillation and AutoAugment, we claim the new state-of-the-art on ImageNet classification with a top-1 accuracy of 85.8%.

PDF Abstract ECCV 2020 PDF ECCV 2020 Abstract

Results from the Paper

Task Dataset Model Metric Name Metric Value Global Rank Result Benchmark
Image Classification ImageNet KDforAA (EfficientNet-B8) Top 1 Accuracy 85.8% # 91
Number of params 88M # 85
Hardware Burden None # 1
Operations per network pass None # 1
Image Classification ImageNet KDforAA (EfficientNet-B7) Top 1 Accuracy 85.5% # 103
Number of params 66M # 126