MoPro: Webly Supervised Learning with Momentum Prototypes

ICLR 2021 · Junnan Li, Caiming Xiong, Steven C. H. Hoi

We propose a webly-supervised representation learning method that does not suffer from the annotation unscalability of supervised learning, nor the computation unscalability of self-supervised learning. Most existing works on webly-supervised representation learning adopt a vanilla supervised learning method without accounting for the prevalent noise in the training data, whereas most prior methods in learning with label noise are less effective for real-world large-scale noisy data. We propose momentum prototypes (MoPro), a simple contrastive learning method that achieves online label noise correction, out-of-distribution sample removal, and representation learning. MoPro achieves state-of-the-art performance on WebVision, a weakly-labeled noisy dataset. MoPro also shows superior performance when the pretrained model is transferred to downstream image classification and detection tasks. It outperforms the ImageNet supervised pretrained model by +10.5 on 1-shot classification on VOC, and outperforms the best self-supervised pretrained model by +17.3 when finetuned on 1% of ImageNet labeled samples. Furthermore, MoPro is more robust to distribution shifts. Code and pretrained models are available at
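
The abstract names three mechanisms (momentum prototypes, online label noise correction, and out-of-distribution sample removal) without giving their update rules. The NumPy sketch below is one plausible reading, not the authors' implementation: it assumes a prototype is a moving average of L2-normalized embeddings for its class, and that a sample is relabeled, kept, or dropped according to a soft label that mixes classifier probabilities with prototype similarities. All function names and the values of `momentum`, `temperature`, `alpha`, and `threshold` are hypothetical choices for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def update_prototypes(prototypes, embeddings, labels, momentum=0.999):
    """Assumed rule: moving average of each class prototype, then renormalize.

    prototypes: (K, D) unit vectors, one per class
    embeddings: (N, D) unit vectors for the current batch
    labels:     (N,) integer class index used for each sample
    """
    prototypes = prototypes.copy()
    for z, k in zip(embeddings, labels):
        prototypes[k] = l2_normalize(momentum * prototypes[k] + (1.0 - momentum) * z)
    return prototypes


def correct_labels(embeddings, class_probs, labels, prototypes,
                   temperature=0.1, alpha=0.5, threshold=0.8):
    """Assumed rule: relabel confident samples, drop likely out-of-distribution ones.

    Returns (new_labels, keep), where keep[i] is False for a sample treated
    as out-of-distribution.
    """
    num_classes = prototypes.shape[0]
    # Similarity of each embedding to every prototype, turned into a distribution.
    proto_probs = softmax(embeddings @ prototypes.T / temperature, axis=1)
    # Soft label: mix of classifier prediction and prototype similarity.
    soft = alpha * class_probs + (1.0 - alpha) * proto_probs

    new_labels = labels.copy()
    keep = np.ones(len(labels), dtype=bool)
    for i in range(len(labels)):
        if soft[i].max() > threshold:
            new_labels[i] = soft[i].argmax()          # confident: correct the label
        elif soft[i, labels[i]] > 1.0 / num_classes:
            pass                                      # plausible: keep the web label
        else:
            keep[i] = False                           # neither: treat as out-of-distribution
    return new_labels, keep
```

In a training loop of this shape, `correct_labels` would run on each batch first, and `update_prototypes` would then be called only on the samples with `keep` set, using the corrected labels.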

Task                  Dataset         Model              Metric Name              Metric Value  Global Rank
Image Classification  OmniBenchmark   MoPro-V2           Average Top-1 Accuracy   36.1          #13
Image Classification  WebVision-1000  MoPro (ResNet-50)  Top-1 Accuracy           73.9%         #13
Image Classification  WebVision-1000  MoPro (ResNet-50)  Top-5 Accuracy           90.0%         #10
Image Classification  WebVision-1000  MoPro (ResNet-50)  ImageNet Top-1 Accuracy  67.8%         #4
Image Classification  WebVision-1000  MoPro (ResNet-50)  ImageNet Top-5 Accuracy  87.0%         #4