Co-training $2^L$ Submodels for Visual Recognition

9 Dec 2022  ·  Hugo Touvron, Matthieu Cord, Maxime Oquab, Piotr Bojanowski, Jakob Verbeek, Hervé Jégou

We introduce submodel co-training, a regularization method related to co-training, self-distillation and stochastic depth. Given a neural network to be trained, for each sample we implicitly instantiate two altered networks, "submodels", with stochastic depth: we activate only a subset of the layers. Each network serves as a soft teacher to the other by providing a loss that complements the regular loss provided by the one-hot label. Our approach, dubbed cosub, uses a single set of weights and does not involve a pre-trained external model or temporal averaging. Experimentally, we show that submodel co-training is effective for training backbones for recognition tasks such as image classification and semantic segmentation. Our approach is compatible with multiple architectures, including RegNet, ViT, PiT, XCiT, Swin and ConvNeXt. Our training strategy improves their results in comparable settings. For instance, a ViT-B pretrained with cosub on ImageNet-21k obtains 87.4% top-1 accuracy at resolution 448 on ImageNet-val.
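To make the training recipe concrete, below is a minimal sketch of one cosub-style training step in PyTorch. It assumes a `model` whose stochastic-depth drop pattern is resampled on every forward pass, so two calls on the same batch yield two different submodels sharing one set of weights. The mixing coefficient `lam` and the use of a KL-divergence soft loss are illustrative choices, not necessarily the exact formulation used in the paper.

```python
# Hypothetical sketch of a cosub-style training step; names and loss details
# are illustrative assumptions, not the authors' reference implementation.
import torch.nn.functional as F

def cosub_step(model, images, targets, optimizer, lam=0.5):
    """One step: each submodel is supervised by the one-hot labels and,
    as a soft teacher, by the detached predictions of the other submodel."""
    logits_a = model(images)   # submodel A: one random subset of layers active
    logits_b = model(images)   # submodel B: a different random subset

    # Hard-label (one-hot) cross-entropy for both submodels.
    ce = F.cross_entropy(logits_a, targets) + F.cross_entropy(logits_b, targets)

    # Soft co-training loss: each submodel distills from the other's
    # stop-gradient predictions (KL divergence chosen here for illustration).
    log_p_a = F.log_softmax(logits_a, dim=-1)
    log_p_b = F.log_softmax(logits_b, dim=-1)
    soft = (
        F.kl_div(log_p_a, F.softmax(logits_b.detach(), dim=-1), reduction="batchmean")
        + F.kl_div(log_p_b, F.softmax(logits_a.detach(), dim=-1), reduction="batchmean")
    )

    loss = (1.0 - lam) * ce + lam * soft
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that both forward passes use the same weights; only the set of active layers differs, which is what distinguishes this scheme from co-training two separate models or distilling from a pre-trained teacher.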


Results from the Paper


| Task                 | Dataset  | Model                     | Metric         | Value | Global Rank |
|----------------------|----------|---------------------------|----------------|-------|-------------|
| Image Classification | ImageNet | ViT-S@224 (cosub)         | Top-1 Accuracy | 83.1% | #426        |
| Image Classification | ImageNet | RegNetY-16GF@224 (cosub)  | Top-1 Accuracy | 84.2% | #313        |
| Image Classification | ImageNet | ViT-M@224 (cosub)         | Top-1 Accuracy | 85.0% | #255        |
| Image Classification | ImageNet | PiT-B@224 (cosub)         | Top-1 Accuracy | 85.8% | #187        |
| Image Classification | ImageNet | ConvNeXt-B@224 (cosub)    | Top-1 Accuracy | 85.8% | #187        |
| Image Classification | ImageNet | Swin-B@224 (cosub)        | Top-1 Accuracy | 86.2% | #164        |
| Image Classification | ImageNet | ViT-B@224 (cosub)         | Top-1 Accuracy | 86.3% | #153        |
| Image Classification | ImageNet | Swin-L@224 (cosub)        | Top-1 Accuracy | 87.1% | #103        |
| Image Classification | ImageNet | ViT-L@224 (cosub)         | Top-1 Accuracy | 87.5% | #86         |
| Image Classification | ImageNet | ViT-H@224 (cosub)         | Top-1 Accuracy | 88.0% | #69         |
