Knowledge Distillation

1276 papers with code • 5 benchmarks • 4 datasets

Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized.

Benchmarks

Add a Result

These leaderboards are used to track progress in Knowledge Distillation

Dataset	Best Model	Compare
ImageNet	KD++(T: regnety-16GF S:ViT-B)	See all
CIFAR-100	shufflenet-v2(T:resnet-32x4, S:shufflenet-v2)	See all
MS COCO	ADLIK-Faster (T: Faster R-CNN vit-base S: Faster R-CNN deit-small)	See all
COCO 2017 val	ReviewKD++(T: faster rcnn(resnet101), S:faster rcnn(resnet50))	See all
PASCAL VOC	LSHFM (T: ResNet101 S: ResNet50)	See all

Libraries

Use these libraries to find Knowledge Distillation models and implementations

yoshitomo-matsubara/torchdistill

22 papers

1,272

faceonlive/ai-research

4 papers

156

Datasets

Subtasks

Most implemented papers

Most implemented Social Latest No code

Focal Loss for Dense Object Detection

facebookresearch/detectron • • ICCV 2017

Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.

231

Paper
Code

Distilling the Knowledge in a Neural Network

labmlai/annotated_deep_learning_paper_implementations • • 9 Mar 2015

A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions.

Paper
Code

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

google-research/bert • • ICLR 2020

Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training.

Paper
Code

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

huggingface/transformers • • NeurIPS 2019

As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging.

Paper
Code

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

coqui-ai/TTS • • ICLR 2021

In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with ground-truth target instead of the simplified output from teacher, and 2) introducing more variation information of speech (e. g., pitch, energy and more accurate duration) as conditional inputs.

Paper
Code

Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks

adityac94/Grad_CAM_plus_plus • • 30 Oct 2017

Over the last decade, Convolutional Neural Network (CNN) models have been highly successful in solving complex vision problems.

Paper
Code

Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation

UKPLab/sentence-transformers • • EMNLP 2020

The training is based on the idea that a translated sentence should be mapped to the same location in the vector space as the original sentence.

Paper
Code

TinyBERT: Distilling BERT for Natural Language Understanding

huawei-noah/Pretrained-Language-Model • • Findings of the Association for Computational Linguistics 2020

To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of the Transformer-based models.

Paper
Code

Distilling Knowledge via Knowledge Review

Jia-Research-Lab/ReviewKD • • CVPR 2021

Knowledge distillation transfers knowledge from the teacher network to the student one, with the goal of greatly improving the performance of the student network.

Paper
Code

AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time

MVIG-SJTU/AlphaPose • • 7 Nov 2022

Accurate whole-body multi-person pose estimation and tracking is an important yet challenging topic in computer vision.

Paper
Code

Knowledge Distillation

Benchmarks Add a Result

Libraries

Datasets

Subtasks

Most implemented papers

Content

Benchmarks

Add a Result