Debiased Learning from Naturally Imbalanced Pseudo-Labels

CVPR 2022 · Xudong Wang, Zhirong Wu, Long Lian, Stella X. Yu

Pseudo-labels are confident predictions made on unlabeled target data by a classifier trained on labeled source data. They are widely used for adapting a model to unlabeled data, e.g., in a semi-supervised learning setting. Our key insight is that pseudo-labels are naturally imbalanced due to intrinsic data similarity, even when a model is trained on balanced source data and evaluated on balanced target data. If we address this previously unknown imbalanced classification problem arising from pseudo-labels instead of ground-truth training labels, we could remove model biases towards false majorities created by pseudo-labels. We propose a novel and effective debiased learning method with pseudo-labels, based on counterfactual reasoning and adaptive margins: The former removes the classifier response bias, whereas the latter adjusts the margin of each class according to the imbalance of pseudo-labels. Validated by extensive experimentation, our simple debiased learning delivers significant accuracy gains over the state-of-the-art on ImageNet-1K: 26% for semi-supervised learning with 0.2% annotations and 9% for zero-shot learning. Our code is available at: https://github.com/frank-xwang/debiased-pseudo-labeling.
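
The abstract names the two components (removing the classifier response bias, and per-class margins set by the pseudo-label imbalance) without giving formulas. One plausible reading is sketched below in plain Python: keep a running mean of the model's predicted class distribution on unlabeled data, subtract a scaled log of it from the logits before thresholding, and add the same term inside the cross-entropy as a per-class margin. The function names, the `p_hat` running mean, and the `lam`, `momentum`, and `threshold` parameters with their default values are illustrative assumptions, not taken from the released code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def update_mean_prediction(p_hat, batch_probs, momentum=0.999):
    """Exponential moving average of the mean predicted class distribution.

    p_hat is an assumed running estimate of how often each class is
    predicted on unlabeled data; it stands in for the pseudo-label imbalance.
    """
    n = len(batch_probs)
    batch_mean = [sum(p[k] for p in batch_probs) / n for k in range(len(p_hat))]
    return [momentum * old + (1.0 - momentum) * new
            for old, new in zip(p_hat, batch_mean)]


def debiased_pseudo_label(logits, p_hat, lam=0.5, threshold=0.95):
    """Subtract lam * log(p_hat) from the logits, then threshold as usual.

    Returns (label or None, confidence). Over-predicted classes are pushed
    down and rarely predicted classes are pushed up before the argmax.
    """
    adjusted = [z - lam * math.log(p) for z, p in zip(logits, p_hat)]
    probs = softmax(adjusted)
    conf = max(probs)
    label = probs.index(conf)
    return (label if conf >= threshold else None), conf


def adaptive_margin_loss(logits, label, p_hat, lam=0.5):
    """Cross-entropy with a per-class shift of lam * log(p_hat).

    Classes that rarely receive pseudo-labels must win by a larger margin,
    which is one way to realize a margin that tracks the imbalance.
    """
    shifted = [z + lam * math.log(p) for z, p in zip(logits, p_hat)]
    return -math.log(softmax(shifted)[label])
```

With a skewed `p_hat` such as `[0.7, 0.2, 0.1]`, a sample whose raw logits narrowly favor the over-predicted class 0 can flip to class 1 after the adjustment, which is the intended effect of the debiasing step in this sketch. With a uniform `p_hat`, both functions reduce to ordinary thresholding and ordinary cross-entropy.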


Results from the Paper


 Ranked #1 on Few-Shot Image Classification on ImageNet - 0-Shot (using extra training data)

Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Semi-Supervised Image Classification | CIFAR-10, 250 Labels | DebiasPL (w/ FixMatch) | Percentage error | 4.6 | # 2
Semi-Supervised Image Classification | CIFAR-10, 40 Labels | DebiasPL (w/ FixMatch) | Percentage error | 5.4 | # 6
Semi-Supervised Image Classification | ImageNet - 0.2% labeled data | DebiasPL (ResNet-50) | ImageNet Top-1 Accuracy | 69.6% | # 1
Few-Shot Image Classification | ImageNet - 0-Shot | DebiasPL (ResNet50) | Accuracy | 68.3% | # 1
Semi-Supervised Image Classification | ImageNet - 1% labeled data | DebiasPL (ResNet-50) | Top 1 Accuracy | 71.3% | # 11

Methods