WS 2019 • Luke Melas-Kyriazi, George Han, Celine Liang
Recent research points to knowledge distillation as a potential solution, showing that when training data for a given task is abundant, it is possible to distill a large (teacher) LM into a small task-specific (student) network with minimal loss of performance.
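To make the idea concrete, here is a minimal sketch of the standard Hinton-style distillation objective that work in this vein typically builds on: a weighted sum of a temperature-softened KL term against the teacher's outputs and ordinary cross-entropy on the gold labels. This is a generic illustration, not this paper's exact objective; the function name and the `temperature` and `alpha` hyperparameters are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Generic knowledge-distillation loss (sketch, not the paper's exact setup).

    Combines KL divergence between temperature-softened teacher and student
    distributions with cross-entropy on the hard labels.
    """
    # Soften both distributions with the temperature; both as log-probs.
    soft_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)

    # Scale the KL term by T^2 so its gradient magnitude stays comparable
    # to the cross-entropy term as the temperature changes.
    kd = F.kl_div(soft_student, soft_teacher,
                  reduction="batchmean", log_target=True) * temperature ** 2

    # Standard supervised loss on the task labels.
    ce = F.cross_entropy(student_logits, labels)

    return alpha * kd + (1.0 - alpha) * ce
```

In practice the teacher runs in inference mode (`torch.no_grad()`) and only the student's parameters are updated; `alpha` trades off imitation of the teacher against fitting the labeled data.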