Deep neural networks are learning models with a very high capacity and are
therefore prone to over-fitting. Many regularization techniques, such as
Dropout, DropConnect, and weight decay, attempt to solve the problem of
over-fitting by reducing the capacity of their respective models (Srivastava et
al., 2014; Wan et al., 2013; Krogh & Hertz, 1992).
In this paper we introduce a new form of regularization that guides the
learning problem in a way that reduces over-fitting without sacrificing the
capacity of the model. The mistakes that models make in early stages of
training carry information
about the learning problem. By adjusting the labels of the current epoch of
training through a weighted average of the real labels and an exponential
average of the past soft-targets, we achieved a regularization scheme as
powerful as Dropout without necessarily reducing the capacity of the model, and
reduced the complexity of the learning problem. SoftTarget regularization
proved to be an effective tool in various neural network architectures.
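
The label adjustment described above can be illustrated with a minimal sketch. It is not the exact procedure used in the paper: the function names, the mixing weights `beta` and `gamma`, and the choice to initialize the running average with the real labels are all assumptions made for illustration.

```python
# Illustrative sketch only. beta, gamma, the function names, and the
# initialization of the running average are assumptions, not values or
# definitions taken from the paper.

def update_soft_targets(running_avg, predictions, beta=0.7):
    """Exponential average of past soft-targets (model output distributions)."""
    return [
        [beta * avg + (1.0 - beta) * pred for avg, pred in zip(avg_row, pred_row)]
        for avg_row, pred_row in zip(running_avg, predictions)
    ]


def adjust_labels(true_labels, running_avg, gamma=0.5):
    """Weighted average of the real labels and the averaged soft-targets."""
    return [
        [gamma * avg + (1.0 - gamma) * label for label, avg in zip(label_row, avg_row)]
        for label_row, avg_row in zip(true_labels, running_avg)
    ]


# One training example with three classes; the real label is class 0.
true_labels = [[1.0, 0.0, 0.0]]
# Hypothetical model output for that example at the end of an epoch.
predictions = [[0.6, 0.3, 0.1]]

# Assumption: the running average starts from the real labels.
running_avg = update_soft_targets(true_labels, predictions)
# Labels the next epoch would be trained against.
adjusted = adjust_labels(true_labels, running_avg)
```

Because both steps are convex combinations of probability vectors, each adjusted label row still sums to one and can be used directly with a cross-entropy loss.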