Search Results for author: Minchong Li

Found 1 paper, 1 paper with code

BiLD: Bi-directional Logits Difference Loss for Large Language Model Distillation

1 code implementation · 19 Jun 2024 · Minchong Li, Feng Zhou, Xiaohui Song

The BiLD loss filters out long-tail noise by using only the top-$k$ teacher and student logits, and exploits the internal ranking information of the logits by constructing logit differences.
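To make the idea concrete, here is a minimal sketch for a single token position, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the summary above only says that top-$k$ logits are kept and that logit differences are constructed. The pairwise-difference construction, the softmax with a `temperature`, the KL divergence with the teacher as the target, and the sum of a teacher-led and a student-led term are all assumptions made for this example, as are the function names.

```python
import math


def top_k_indices(logits, k):
    """Indices of the k largest logits, ordered from largest to smallest."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def pairwise_differences(values):
    """All differences values[i] - values[j] for i < j."""
    n = len(values)
    return [values[i] - values[j] for i in range(n) for j in range(i + 1, n)]


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def directional_loss(teacher_logits, student_logits, k, lead, temperature=1.0):
    """Logit-difference loss where `lead` ("teacher" or "student") picks the top-k slots.

    Assumption: both models are compared on the same k vocabulary positions,
    and the teacher's difference distribution is the KL target.
    """
    lead_logits = teacher_logits if lead == "teacher" else student_logits
    idx = top_k_indices(lead_logits, k)
    teacher_diff = pairwise_differences([teacher_logits[i] for i in idx])
    student_diff = pairwise_differences([student_logits[i] for i in idx])
    p = softmax(teacher_diff, temperature)
    q = softmax(student_diff, temperature)
    return kl_divergence(p, q)


def bild_loss_sketch(teacher_logits, student_logits, k=3, temperature=1.0):
    """Hypothetical bi-directional loss: teacher-led term plus student-led term."""
    if k < 2:
        raise ValueError("k must be at least 2 to form logit differences")
    return (
        directional_loss(teacher_logits, student_logits, k, "teacher", temperature)
        + directional_loss(teacher_logits, student_logits, k, "student", temperature)
    )


teacher = [4.0, 2.0, 1.0, 0.0]
student = [1.0, 2.0, 4.0, 0.0]
loss_same = bild_loss_sketch(teacher, teacher, k=3)
loss_diff = bild_loss_sketch(teacher, student, k=3)
```

In this sketch, logits outside the selected top-$k$ never enter the loss, which is how the long tail is discarded, and the student is penalized only when the gaps between its selected logits disagree with the teacher's. A batched version would typically be written with a tensor library rather than Python lists.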

Knowledge Distillation · Language Modelling · +1
