Why Knowledge Distillation Amplifies Gender Bias and How to Mitigate from the Perspective of DistilBERT

Knowledge distillation is widely used to transfer the language understanding of a large model to a smaller model.However, after knowledge distillation, it was found that the smaller model is more biased by gender compared to the source large model.This paper studies what causes gender bias to increase after the knowledge distillation process.Moreover, we suggest applying a variant of the mixup on knowledge distillation, which is used to increase generalizability during the distillation process, not for augmentation.By doing so, we can significantly reduce the gender bias amplification after knowledge distillation.We also conduct an experiment on the GLUE benchmark to demonstrate that even if the mixup is applied, it does not have a significant adverse effect on the model’s performance.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here