no code implementations • 19 Sep 2024 • Jun Rao, Xuebo Liu, Zepeng Lin, Liang Ding, Jing Li, DaCheng Tao, Min Zhang
Knowledge distillation (KD) is a model-compression technique in which a smaller student model is trained to mimic a larger teacher model, yielding a compact model that retains much of the teacher's performance.
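For context, below is a minimal sketch of the classic soft-target distillation objective (a generic illustration, not this paper's specific method); the temperature `T` and mixing weight `alpha` are assumed, illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard soft-target distillation loss: a weighted sum of the
    KL divergence to the teacher's temperature-softened distribution
    and cross-entropy on the ground-truth labels (hyperparameters
    T and alpha are illustrative)."""
    # Soften both distributions with temperature T before comparing them.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard-label term
    # Ordinary supervised loss on the true labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```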
Knowledge Distillation