21 Feb 2023 • Seungwoo Son, Namhoon Lee, Jaeho Lee
We present MaskedKD, a simple yet effective strategy that significantly reduces the cost of distilling vision transformers (ViTs) without sacrificing the student model's prediction accuracy.