On the implicit minimization of alternative loss functions when training deep networks

25 Sep 2019 · Alexandre Lemire Paquin, Brahim Chaib-Draa, Philippe Giguère

Understanding the implicit bias of optimization algorithms is important for improving the generalization of neural networks. One way to exploit such understanding is to make the bias explicit in the loss function. Conversely, one way to gain insight into the implicit bias is to study how different loss functions are implicitly minimized while the network is trained. In this work, we focus on the inductive bias that arises when minimizing the cross-entropy loss with different batch sizes and learning rates. We investigate how three loss functions are implicitly minimized during training: the hinge loss with different margins, the cross-entropy loss with different temperatures, and a newly introduced Gcdf loss with different standard deviations. The Gcdf loss establishes a connection between a sharpness measure for the 0-1 loss and margin-based loss functions. We find that a common behavior emerges for all the loss functions considered.
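
To make the first two monitored losses concrete, the following is a minimal sketch in Python of a hinge loss with a margin parameter and a cross-entropy loss with a temperature parameter, each evaluated on the logits of a single example. The function names, the choice of the Crammer-Singer multiclass form of the hinge loss, and the example logits are illustrative assumptions and are not taken from the paper. The Gcdf loss is left out because its definition is not given in this abstract.

```python
import math


def hinge_loss(logits, label, margin=1.0):
    """Multiclass hinge loss for one example.

    Assumption: Crammer-Singer form, which penalizes the gap between the
    true-class score and the best competing score when it is below `margin`.
    """
    true_score = logits[label]
    best_other = max(s for j, s in enumerate(logits) if j != label)
    return max(0.0, margin - (true_score - best_other))


def cross_entropy_with_temperature(logits, label, temperature=1.0):
    """Cross-entropy of softmax(logits / temperature) for one example."""
    scaled = [s / temperature for s in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_norm - scaled[label]


if __name__ == "__main__":
    # Hypothetical logits for a 3-class problem; class 0 is the true label.
    logits = [2.0, 0.5, -1.0]
    label = 0
    for margin in (0.5, 1.0, 2.0):
        print("hinge, margin", margin, "->", hinge_loss(logits, label, margin))
    for temperature in (0.5, 1.0, 2.0):
        print("cross-entropy, temperature", temperature, "->",
              cross_entropy_with_temperature(logits, label, temperature))
```

Evaluating these functions on a network's outputs during training, for several margins and temperatures, is one plausible way to track how the alternative losses evolve while only the standard cross-entropy is being optimized.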
