Knowledge Distillation
1671 papers with code • 6 benchmarks • 5 datasets
Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized.
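The classic recipe trains the small student to match the large teacher's temperature-softened output distribution as well as the true labels. A minimal sketch in PyTorch, assuming hypothetical `student`/`teacher` classifiers that return logits:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with temperature-scaled KL to the teacher."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),   # student log-probs at temperature T
        F.softmax(teacher_logits / T, dim=-1),       # teacher "soft targets" at temperature T
        reduction="batchmean",
    ) * (T * T)                                      # T^2 rescaling, as in Hinton et al.
    hard = F.cross_entropy(student_logits, labels)   # usual supervised term
    return alpha * soft + (1.0 - alpha) * hard

# Typical use: the teacher is frozen and only the student is updated.
# with torch.no_grad():
#     teacher_logits = teacher(x)
# loss = distillation_loss(student(x), teacher_logits, y)
```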
Most implemented papers
Focal Loss for Dense Object Detection
Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
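A hedged sketch of the binary focal loss described here, where the (1 − p_t)^γ factor down-weights easy examples; names and defaults are illustrative rather than the paper's reference implementation:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: scale BCE by (1 - p_t)^gamma so easy negatives contribute little."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)             # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```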
Distilling the Knowledge in a Neural Network
A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions.
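As a rough illustration of that ensembling step (the paper then distills the ensemble into a single model trained against these averaged, softened outputs, much like the loss sketched above):

```python
import torch
import torch.nn.functional as F

def ensemble_soft_targets(models, x, T=2.0):
    """Average the temperature-softened predictions of an ensemble to form teacher targets."""
    with torch.no_grad():
        probs = [F.softmax(m(x) / T, dim=-1) for m in models]
    return torch.stack(probs).mean(dim=0)
```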
Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training.
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models on the edge and/or under constrained computational training or inference budgets remains challenging.
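The released checkpoints are straightforward to try with the Hugging Face `transformers` library; a small usage sketch (not part of the paper itself), using the `distilbert-base-uncased` checkpoint:

```python
from transformers import AutoModel, AutoTokenizer

# Per the paper, DistilBERT is ~40% smaller and ~60% faster than BERT-base while
# retaining ~97% of its language-understanding performance.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

inputs = tokenizer("Knowledge distillation makes models smaller.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```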
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with the ground-truth target instead of the simplified output from the teacher, and 2) introducing more variation information of speech (e.g., pitch, energy and more accurate duration) as conditional inputs.
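A very rough sketch of one such variance predictor (FastSpeech 2 uses separate predictors for duration, pitch and energy, trained against ground-truth values and added back to the hidden sequence); layer sizes and module names here are illustrative, not the paper's code:

```python
import torch
import torch.nn as nn

class VariancePredictor(nn.Module):
    """Small per-frame regressor over the encoder hidden sequence."""
    def __init__(self, d_model=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(d_model, 1, kernel_size=1),
        )

    def forward(self, h):                               # h: (batch, time, d_model)
        return self.net(h.transpose(1, 2)).squeeze(1)   # (batch, time), e.g. pitch per frame
```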
Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks
Over the last decade, Convolutional Neural Network (CNN) models have been highly successful in solving complex vision problems.
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
The training is based on the idea that a translated sentence should be mapped to the same location in the vector space as the original sentence.
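The objective can be sketched as two mean-squared-error terms against the frozen monolingual teacher, assuming hypothetical `teacher`/`student` encoders that map a batch of sentences to fixed-size vectors:

```python
import torch
import torch.nn.functional as F

def multilingual_kd_loss(teacher, student, src_sentences, tgt_sentences):
    """Pull both the source sentence and its translation onto the teacher's embedding."""
    with torch.no_grad():
        t_src = teacher(src_sentences)        # teacher only ever sees the source language
    s_src = student(src_sentences)
    s_tgt = student(tgt_sentences)            # translations of the same sentences
    return F.mse_loss(s_src, t_src) + F.mse_loss(s_tgt, t_src)
```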
TinyBERT: Distilling BERT for Natural Language Understanding
To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of the Transformer-based models.
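A simplified sketch of the per-layer losses (attention-matrix MSE plus hidden-state MSE through a learned projection when student and teacher widths differ); the function and argument names are illustrative, not TinyBERT's code:

```python
import torch
import torch.nn.functional as F

def layer_distill_loss(student_attn, teacher_attn, student_hidden, teacher_hidden, proj):
    """Losses for one mapped (student layer, teacher layer) pair."""
    attn_loss = F.mse_loss(student_attn, teacher_attn)
    hidden_loss = F.mse_loss(proj(student_hidden), teacher_hidden)  # proj: d_student -> d_teacher
    return attn_loss + hidden_loss
```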
AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time
Accurate whole-body multi-person pose estimation and tracking is an important yet challenging topic in computer vision.
Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
Attention plays a critical role in human visual experience.
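The method matches spatial attention maps between teacher and student feature maps; a hedged sketch of activation-based attention transfer (the normalization and weighting choices here are one common variant, not necessarily the paper's exact settings):

```python
import torch
import torch.nn.functional as F

def attention_map(feat, p=2):
    """Channel-wise sum of |activations|^p, flattened and L2-normalized per sample."""
    a = feat.abs().pow(p).sum(dim=1)          # (batch, H, W)
    return F.normalize(a.flatten(1), dim=1)   # (batch, H*W)

def attention_transfer_loss(student_feats, teacher_feats):
    """Sum over chosen layer pairs of the squared distance between attention maps."""
    return sum(
        (attention_map(s) - attention_map(t)).pow(2).sum(dim=1).mean()
        for s, t in zip(student_feats, teacher_feats)
    )
```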