no code implementations • 22 Nov 2023 • Nathan Brown, Ashton Williamson, Tahj Anderson, Logan Lawrence
In this work, we provide an evaluation of model compression via knowledge distillation on efficient attention transformers.
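For readers unfamiliar with the technique being evaluated, the following is a minimal sketch of a standard knowledge-distillation objective (a generic illustration, not the authors' specific recipe; the temperature, weighting, and tensor shapes below are placeholder assumptions), in which a compact student is trained to match temperature-softened teacher logits alongside the usual hard-label loss.

```python
# Generic knowledge-distillation loss sketch (illustrative, not the paper's exact setup).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage with dummy tensors (batch of 4, 10 classes):
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```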
Knowledge Distillation • Model Compression