Regularization Cocktails
The regularization of prediction models is arguably the most crucial ingredient that allows Machine Learning solutions to generalize well on unseen data. Several types of regularization are popular in the Deep Learning community (e.g., weight decay, dropout, early stopping), but so far these have been selected on an ad-hoc basis, and there is no systematic study of how different regularizers should be combined into the best "cocktail". In this paper, we fill this gap by considering cocktails of 13 different regularization methods and framing the question of how to best combine them as a standard hyperparameter optimization problem. In a large-scale empirical study on 42 datasets, we conclude that regularization cocktails substantially outperform individual regularization methods, even when the hyperparameters of the latter are carefully tuned; that the optimal regularization cocktail depends on the dataset; and that regularization cocktails yield higher gains on small datasets.
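The framing as a hyperparameter optimization problem can be illustrated with a minimal sketch: each regularizer gets an on/off toggle plus conditional hyperparameters that are only sampled when the toggle is on, and an optimizer searches over the joint space. The method names and ranges below are hypothetical placeholders, not the paper's exact search space, and plain random sampling stands in for the actual HPO procedure.

```python
import random

# Hypothetical search space: each regularization method maps to its
# conditional hyperparameters and their ranges. These names and ranges
# are illustrative, not the paper's exact 13-method space.
SEARCH_SPACE = {
    "weight_decay":   {"lambda": (1e-6, 1e-2)},
    "dropout":        {"rate": (0.0, 0.8)},
    "early_stopping": {"patience": (5, 50)},
}

def sample_cocktail(rng):
    """Sample one 'regularization cocktail': a subset of methods,
    each with concrete hyperparameter values."""
    cocktail = {}
    for method, params in SEARCH_SPACE.items():
        if rng.random() < 0.5:  # on/off toggle for each regularizer
            cfg = {}
            for name, (lo, hi) in params.items():
                if isinstance(lo, int):
                    cfg[name] = rng.randint(lo, hi)  # integer-valued hyperparameter
                else:
                    cfg[name] = rng.uniform(lo, hi)  # continuous hyperparameter
            cocktail[method] = cfg
    return cocktail

rng = random.Random(0)
print(sample_cocktail(rng))
```

In a real pipeline, each sampled cocktail would be trained and validated, and the HPO algorithm (e.g., Bayesian optimization or multi-fidelity search) would propose the next configuration based on observed performance.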