Noisy Student

rwightman / pytorch-image-models

Last updated on Feb 14, 2021

Parameters 5 Million

FLOPs 489 Million

File Size 20.40 MB

Training Data JFT-300M, ImageNet

Training Resources Cloud TPU v3 Pod

Training Time

Training Techniques	Noisy Student, FixRes, RMSProp, Weight Decay, Label Smoothing, AutoAugment, RandAugment
Architecture	1x1 Convolution, Average Pooling, Convolution, Dense Connections, Dropout, Inverted Residual Block, Batch Normalization, Squeeze-and-Excitation Block, Swish
ID	tf_efficientnet_b0_ns
LR	0.128
Epochs	700
Dropout	0.5
Crop Pct	0.875
Momentum	0.9
Batch Size	2048
Image Size	224
Weight Decay	0.00001
Interpolation	bicubic
RMSProp Decay	0.9
Label Smoothing	0.1
BatchNorm Momentum	0.99
Stochastic Depth Survival	0.8

Parameters 8 Million

FLOPs 884 Million

File Size 30.06 MB

Training Data JFT-300M, ImageNet

Training Resources Cloud TPU v3 Pod

Training Time

Training Techniques	Noisy Student, FixRes, RMSProp, Weight Decay, Label Smoothing, AutoAugment, RandAugment
Architecture	1x1 Convolution, Average Pooling, Convolution, Dense Connections, Dropout, Inverted Residual Block, Batch Normalization, Squeeze-and-Excitation Block, Swish
ID	tf_efficientnet_b1_ns
LR	0.128
Epochs	700
Dropout	0.5
Crop Pct	0.882
Momentum	0.9
Batch Size	2048
Image Size	240
Weight Decay	0.00001
Interpolation	bicubic
RMSProp Decay	0.9
Label Smoothing	0.1
BatchNorm Momentum	0.99
Stochastic Depth Survival	0.8

Parameters 9 Million

FLOPs 1 Billion

File Size 35.10 MB

Training Data JFT-300M, ImageNet

Training Resources Cloud TPU v3 Pod

Training Time

Training Techniques	Noisy Student, FixRes, RMSProp, Weight Decay, Label Smoothing, AutoAugment, RandAugment
Architecture	1x1 Convolution, Average Pooling, Convolution, Dense Connections, Dropout, Inverted Residual Block, Batch Normalization, Squeeze-and-Excitation Block, Swish
ID	tf_efficientnet_b2_ns
LR	0.128
Epochs	700
Dropout	0.5
Crop Pct	0.89
Momentum	0.9
Batch Size	2048
Image Size	260
Weight Decay	0.00001
Interpolation	bicubic
RMSProp Decay	0.9
Label Smoothing	0.1
BatchNorm Momentum	0.99
Stochastic Depth Survival	0.8

Parameters 12 Million

FLOPs 2 Billion

File Size 47.10 MB

Training Data JFT-300M, ImageNet

Training Resources Cloud TPU v3 Pod

Training Time

Training Techniques	Noisy Student, FixRes, RMSProp, Weight Decay, Label Smoothing, AutoAugment, RandAugment
Architecture	1x1 Convolution, Average Pooling, Convolution, Dense Connections, Dropout, Inverted Residual Block, Batch Normalization, Squeeze-and-Excitation Block, Swish
ID	tf_efficientnet_b3_ns
LR	0.128
Epochs	700
Dropout	0.5
Crop Pct	0.904
Momentum	0.9
Batch Size	2048
Image Size	300
Weight Decay	0.00001
Interpolation	bicubic
RMSProp Decay	0.9
Label Smoothing	0.1
BatchNorm Momentum	0.99
Stochastic Depth Survival	0.8

Parameters 19 Million

FLOPs 6 Billion

File Size 74.38 MB

Training Data JFT-300M, ImageNet

Training Resources Cloud TPU v3 Pod

Training Time

Training Techniques	Noisy Student, FixRes, RMSProp, Weight Decay, Label Smoothing, AutoAugment, RandAugment
Architecture	1x1 Convolution, Average Pooling, Convolution, Dense Connections, Dropout, Inverted Residual Block, Batch Normalization, Squeeze-and-Excitation Block, Swish
ID	tf_efficientnet_b4_ns
LR	0.128
Epochs	700
Dropout	0.5
Crop Pct	0.922
Momentum	0.9
Batch Size	2048
Image Size	380
Weight Decay	0.00001
Interpolation	bicubic
RMSProp Decay	0.9
Label Smoothing	0.1
BatchNorm Momentum	0.99
Stochastic Depth Survival	0.8

Parameters 30 Million

FLOPs 13 Billion

File Size 116.73 MB

Training Data JFT-300M, ImageNet

Training Resources Cloud TPU v3 Pod

Training Time

Training Techniques	Noisy Student, FixRes, RMSProp, Weight Decay, Label Smoothing, AutoAugment, RandAugment
Architecture	1x1 Convolution, Average Pooling, Convolution, Dense Connections, Dropout, Inverted Residual Block, Batch Normalization, Squeeze-and-Excitation Block, Swish
ID	tf_efficientnet_b5_ns
LR	0.128
Epochs	350
Dropout	0.5
Crop Pct	0.934
Momentum	0.9
Batch Size	2048
Image Size	456
Weight Decay	0.00001
Interpolation	bicubic
RMSProp Decay	0.9
Label Smoothing	0.1
BatchNorm Momentum	0.99
Stochastic Depth Survival	0.8

Parameters 43 Million

FLOPs 24 Billion

File Size 165.21 MB

Training Data JFT-300M, ImageNet

Training Resources Cloud TPU v3 Pod

Training Time

Training Techniques	Noisy Student, FixRes, RMSProp, Weight Decay, Label Smoothing, AutoAugment, RandAugment
Architecture	1x1 Convolution, Average Pooling, Convolution, Dense Connections, Dropout, Inverted Residual Block, Batch Normalization, Squeeze-and-Excitation Block, Swish
ID	tf_efficientnet_b6_ns
LR	0.128
Epochs	350
Dropout	0.5
Crop Pct	0.942
Momentum	0.9
Batch Size	2048
Image Size	528
Weight Decay	0.00001
Interpolation	bicubic
RMSProp Decay	0.9
Label Smoothing	0.1
BatchNorm Momentum	0.99
Stochastic Depth Survival	0.8

Parameters 66 Million

FLOPs 48 Billion

File Size 254.49 MB

Training Data JFT-300M, ImageNet

Training Resources Cloud TPU v3 Pod

Training Time

Training Techniques	Noisy Student, FixRes, RMSProp, Weight Decay, Label Smoothing, AutoAugment, RandAugment
Architecture	1x1 Convolution, Average Pooling, Convolution, Dense Connections, Dropout, Inverted Residual Block, Batch Normalization, Squeeze-and-Excitation Block, Swish
ID	tf_efficientnet_b7_ns
LR	0.128
Epochs	350
Dropout	0.5
Crop Pct	0.949
Momentum	0.9
Batch Size	2048
Image Size	600
Weight Decay	0.00001
Interpolation	bicubic
RMSProp Decay	0.9
Label Smoothing	0.1
BatchNorm Momentum	0.99
Stochastic Depth Survival	0.8

Parameters 480 Million

FLOPs 612 Billion

File Size 1.84 GB

Training Data JFT-300M, ImageNet

Training Resources Cloud TPU v3 Pod

Training Time 6 days

Training Techniques	Noisy Student, FixRes, RMSProp, Weight Decay, Label Smoothing, AutoAugment, RandAugment
Architecture	1x1 Convolution, Average Pooling, Convolution, Dense Connections, Dropout, Inverted Residual Block, Batch Normalization, Squeeze-and-Excitation Block, Swish
ID	tf_efficientnet_l2_ns
LR	0.128
Epochs	350
Dropout	0.5
Crop Pct	0.96
Momentum	0.9
Batch Size	2048
Image Size	800
Weight Decay	0.00001
Interpolation	bicubic
RMSProp Decay	0.9
Label Smoothing	0.1
BatchNorm Momentum	0.99
Stochastic Depth Survival	0.8

README.md

Summary

Noisy Student Training is a semi-supervised learning approach. It extends the idea of self-training and distillation with the use of equal-or-larger student models and noise added to the student during learning. It has three main steps:

1. Train a teacher model on labeled images.
2. Use the teacher to generate pseudo labels on unlabeled images.
3. Train a student model on the combination of labeled images and pseudo-labeled images.

The algorithm is iterated a few times by treating the student as a teacher to relabel the unlabeled data and training a new student.

Noisy Student Training seeks to improve on self-training and distillation in two ways. First, it makes the student larger than, or at least equal to, the teacher so the student can better learn from a larger dataset. Second, it adds noise to the student so the noised student is forced to learn harder from the pseudo labels. To noise the student, it uses input noise such as RandAugment data augmentation, and model noise such as dropout and stochastic depth during training.
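The teacher/student loop described above can be sketched in a few lines. This is a toy illustration, not the authors' implementation: the "model" is a hypothetical nearest-centroid classifier rather than an EfficientNet, Gaussian input noise stands in for RandAugment, dropout, and stochastic depth, and the student has the same capacity as the teacher (the equal-size case). The function names `fit_centroids`, `predict`, and `noisy_student` are made up for this example.

```python
import random


def fit_centroids(xs, ys, num_classes):
    """Toy stand-in for training a network: one mean vector per class."""
    dim = len(xs[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for x, y in zip(xs, ys):
        counts[y] += 1
        for i, v in enumerate(x):
            sums[y][i] += v
    # Assumes every class appears at least once, which holds when the
    # labeled set covers all classes.
    return [[s / counts[c] for s in sums[c]] for c in range(num_classes)]


def predict(centroids, xs):
    """Label each point with the index of its nearest centroid."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [
        min(range(len(centroids)), key=lambda c: sq_dist(x, centroids[c]))
        for x in xs
    ]


def noisy_student(x_lab, y_lab, x_unlab, num_classes,
                  iterations=3, noise_std=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: train the teacher on labeled data only.
    teacher = fit_centroids(x_lab, y_lab, num_classes)
    for _ in range(iterations):
        # Step 2: the teacher, without noise, pseudo-labels the unlabeled data.
        pseudo = predict(teacher, x_unlab)
        # Step 3: train a student on labeled plus pseudo-labeled data,
        # with noise applied to the student's inputs.
        xs = list(x_lab) + list(x_unlab)
        ys = list(y_lab) + list(pseudo)
        noised = [[v + rng.gauss(0.0, noise_std) for v in x] for x in xs]
        student = fit_centroids(noised, ys, num_classes)
        # The student becomes the teacher for the next round.
        teacher = student
    return teacher
```

Calling `noisy_student(x_lab, y_lab, x_unlab, num_classes=2)` with lists of feature vectors returns the final student's centroids, which can be passed to `predict`. In the actual method each student would be an equal-or-larger network trained with the hyperparameters listed in the model summaries.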

How do I load this model?

To load a pretrained model:

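import timm

# Build the model and download its pretrained weights.
m = timm.create_model('tf_efficientnet_b0_ns', pretrained=True)
# Switch to inference mode so dropout and BatchNorm behave deterministically.
m.eval()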

Replace the model name with the variant you want to use, e.g. tf_efficientnet_b0_ns. You can find the IDs in the model summaries at the top of this page.
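When preparing inputs for a loaded model, the Image Size and Crop Pct fields in the model summaries are usually read together. The sketch below assumes the common resize-then-center-crop evaluation convention, where the image is first resized so that the final crop covers a `crop_pct` fraction of it. The helper name `eval_resize_size` is hypothetical, and the exact rounding may differ between timm versions, so check the preprocessing of the version you use.

```python
import math


def eval_resize_size(image_size, crop_pct):
    """Resize target before taking a center crop of `image_size` pixels."""
    return int(math.floor(image_size / crop_pct))


# Values taken from the model summaries on this page.
for name, size, pct in [
    ("tf_efficientnet_b0_ns", 224, 0.875),
    ("tf_efficientnet_b1_ns", 240, 0.882),
]:
    print(name, "resize to", eval_resize_size(size, pct), "then crop", size)
```

Under this assumption, tf_efficientnet_b0_ns resizes to 256 pixels and then center-crops to 224.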

How do I train this model?

You can follow the timm recipe scripts for training a new model afresh.

Citation

@misc{xie2020selftraining,
      title={Self-training with Noisy Student improves ImageNet classification}, 
      author={Qizhe Xie and Minh-Thang Luong and Eduard Hovy and Quoc V. Le},
      year={2020},
      eprint={1911.04252},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Results

Image Classification on ImageNet

MODEL	TOP 1 ACCURACY	TOP 5 ACCURACY
tf_efficientnet_l2_ns	88.35%	98.66%
tf_efficientnet_b7_ns	86.83%	98.08%
tf_efficientnet_b6_ns	86.45%	97.88%
tf_efficientnet_b5_ns	86.08%	97.75%
tf_efficientnet_b4_ns	85.15%	97.47%
tf_efficientnet_b3_ns	84.04%	96.91%
tf_efficientnet_b2_ns	82.39%	96.24%
tf_efficientnet_b1_ns	81.39%	95.74%
tf_efficientnet_b0_ns	78.66%	94.37%