Search Results for author: JinHwa Kim

Found 2 papers, 0 papers with code

Why Knowledge Distillation Amplifies Gender Bias and How to Mitigate from the Perspective of DistilBERT

no code implementations • NAACL (GeBNLP) 2022 • Jaimeen Ahn, Hwaran Lee, JinHwa Kim, Alice Oh

Knowledge distillation is widely used to transfer the language understanding of a large model to a smaller model. However, after knowledge distillation, the smaller model has been found to be more gender-biased than the source large model. This paper studies what causes gender bias to increase during the knowledge distillation process. We then suggest applying a variant of mixup to knowledge distillation, used to increase generalizability during the distillation process rather than for data augmentation, which significantly reduces the gender bias amplification after knowledge distillation (a rough sketch of this idea follows the entry below). Experiments on the GLUE benchmark further demonstrate that applying mixup has no significant adverse effect on the model's performance.

Knowledge Distillation
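The abstract above describes applying a mixup variant inside the distillation step itself. Below is a minimal, hedged sketch of that general idea, assuming a PyTorch/HuggingFace-style interface where models accept `inputs_embeds` and expose `.logits`; the function name, `alpha`, and `temperature` are illustrative assumptions, not the authors' actual implementation, which may mix different representations or losses.

```python
import torch
import torch.nn.functional as F

def mixup_distillation_loss(student, teacher, input_embeds, alpha=0.4, temperature=2.0):
    """Hypothetical sketch: mix pairs of input embeddings, then distill the
    student against the teacher's soft predictions on the mixed inputs."""
    # Sample a mixing coefficient from a Beta distribution (standard mixup).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()

    # Mix each example with a randomly permuted partner in the batch.
    perm = torch.randperm(input_embeds.size(0))
    mixed = lam * input_embeds + (1.0 - lam) * input_embeds[perm]

    # Teacher provides soft targets on the same mixed inputs (no gradients).
    with torch.no_grad():
        teacher_logits = teacher(inputs_embeds=mixed).logits
    student_logits = student(inputs_embeds=mixed).logits

    # Temperature-scaled KL divergence, the usual soft-label distillation loss.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return loss
```

The intent of mixing inside the distillation loop (rather than as plain data augmentation) is that the student is trained on smoothed, interpolated targets from the teacher, which is the generalization effect the abstract refers to.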

Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield

no code implementations • 31 Oct 2023 • JinHwa Kim, Ali Derakhshan, Ian G. Harris

Large Language Models' safety remains a critical concern due to their vulnerability to adversarial attacks, which can prompt these systems to produce harmful responses.
