Convolutional neural networks provide visual features that perform remarkably
well in many computer vision applications. However, training these networks
requires significant amounts of supervision...
This paper introduces a generic
framework to train deep networks, end-to-end, with no supervision. We propose
to fix a set of target representations, called Noise As Targets (NAT), and to
constrain the deep features to align to them. This domain agnostic approach
avoids the standard unsupervised learning issues of trivial solutions and
collapsing of features. Thanks to a stochastic batch reassignment strategy and
a separable square loss function, it scales to millions of images. The proposed
approach produces representations that perform on par with state-of-the-art
unsupervised methods on ImageNet and Pascal VOC.