Large scale distributed neural network training through online distillation

Techniques such as ensembling and distillation promise model quality improvements when paired with almost any base model. However, due to increased test-time cost (for ensembles) and increased complexity of the training pipeline (for distillation), these techniques are challenging to use in industrial settings...
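For context on the distillation objective the abstract refers to, here is a minimal sketch of the standard soft-target distillation loss (KL divergence between temperature-softened teacher and student distributions). This is a generic illustration of knowledge distillation, not the paper's specific online/codistillation training setup; the function names and the temperature value are illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Numerically stable softmax with temperature scaling.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions:
    # the classic soft-target objective used in knowledge distillation.
    p = softmax(teacher_logits, temperature)  # teacher "soft labels"
    q = softmax(student_logits, temperature)  # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean())
```

When student and teacher agree exactly the loss is zero, and it grows as their softened distributions diverge, which is what makes it usable as a training signal alongside the usual hard-label loss.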
