TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Knowledge Distillation	CIFAR-100	resnet110 (T:resnet110 S:resnet20)	Top-1 Accuracy (%)	71.88	# 22
Knowledge Distillation	CIFAR-100	vgg8 (T:vgg13 S:vgg8)	Top-1 Accuracy (%)	74.72	# 16
Knowledge Distillation	CIFAR-100	resnet8x4 (T: resnet32x4 S: resnet8x4)	Top-1 Accuracy (%)	76.15	# 10

Wasserstein Contrastive Representation Distillation

CVPR 2021 · Liqun Chen, Dong Wang, Zhe Gan, Jingjing Liu, Ricardo Henao, Lawrence Carin

The primary goal of knowledge distillation (KD) is to encapsulate the information of a model learned from a teacher network into a student network, with the latter being more compact than the former. Existing work, e.g., using Kullback-Leibler divergence for distillation, may fail to capture important structural knowledge in the teacher network and often lacks the ability for feature generalization, particularly in situations when teacher and student are built to address different classification tasks. We propose Wasserstein Contrastive Representation Distillation (WCoRD), which leverages both primal and dual forms of Wasserstein distance for KD. The dual form is used for global knowledge transfer, yielding a contrastive learning objective that maximizes the lower bound of mutual information between the teacher and the student networks. The primal form is used for local contrastive knowledge transfer within a mini-batch, effectively matching the distributions of features between the teacher and the student networks. Experiments demonstrate that the proposed WCoRD method outperforms state-of-the-art approaches on privileged information distillation, model compression and cross-modal transfer.
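The local, primal-form matching of teacher and student feature distributions within a mini-batch can be illustrated with entropy-regularized optimal transport. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the primal Wasserstein distance is computed, so Sinkhorn iterations are swapped in here as a common approximation. The cosine cost, the uniform marginals, the values of `eps` and `n_iters`, and the function names `cosine_cost` and `sinkhorn_distance` are all hypothetical choices.

```python
import math


def cosine_cost(xs, ys):
    """Pairwise cost 1 - cosine similarity between two lists of feature vectors."""
    def norm(v):
        # Fall back to 1.0 for an all-zero vector to avoid division by zero.
        return math.sqrt(sum(a * a for a in v)) or 1.0

    return [
        [1.0 - sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y)) for y in ys]
        for x in xs
    ]


def sinkhorn_distance(cost, eps=0.1, n_iters=200):
    """Approximate the optimal transport cost between two uniform empirical
    distributions using Sinkhorn iterations (entropy-regularized OT).

    Assumption: uniform weights over the mini-batch on both sides.
    Returns the transport cost and the transport plan.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # weights on teacher features
    b = [1.0 / m] * m  # weights on student features
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    dist = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return dist, plan


# Toy mini-batch: two teacher features and two student features (made-up values).
teacher = [[1.0, 0.0], [0.0, 1.0]]
student_matched = [[1.0, 0.0], [0.0, 1.0]]
student_flipped = [[-1.0, 0.0], [0.0, -1.0]]

d_matched, _ = sinkhorn_distance(cosine_cost(teacher, student_matched))
d_flipped, plan_flipped = sinkhorn_distance(cosine_cost(teacher, student_flipped))
print(d_matched, d_flipped)
```

In a distillation setting, a quantity like `dist` would be added to the student's training loss so that minimizing it pulls the student's batch features toward the teacher's. In this toy example the matched student yields a distance close to zero, while the sign-flipped student yields a much larger one.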

Tasks

Contrastive Learning

Knowledge Distillation

Model Compression

Transfer Learning

Datasets

ImageNet

CIFAR-100

STL-10

Results from the Paper

Ranked #10 on Knowledge Distillation on CIFAR-100

Methods

Contrastive Learning • Knowledge Distillation
