no code implementations • 18 May 2023 • Mengyang Yuan, Bo Lang, Fengnan Quan
The learning simplifier utilizes the attention mechanism to further simplify the knowledge of the teacher model and is jointly trained with the student model using the distillation loss, which means that the process of simplification is correlated with the training objective of the student model and ensures that the simplified new teacher knowledge representation is more suitable for the specific student model.