Learning Better Structured Representations Using Low-rank Adaptive Label Smoothing

Training with soft targets instead of hard targets has been shown to improve the performance and calibration of deep neural networks. Label smoothing is a popular way of computing soft targets, where the one-hot encoding of a class is smoothed with a uniform distribution. Owing to its simplicity, it has found widespread use for training deep neural networks on a wide variety of tasks, ranging from image and text classification to machine translation and semantic parsing. Complementing recent empirical justifications for label smoothing, we obtain PAC-Bayesian generalization bounds for label smoothing and show that the generalization error depends on the choice of the noise (smoothing) distribution. We then propose low-rank adaptive label smoothing (LORAS): a simple yet novel method for training with learned soft targets that generalizes label smoothing and adapts to the latent structure of the label space in structured prediction tasks. Specifically, we evaluate our method on task-oriented semantic parsing tasks and show that, simply by training with appropriately smoothed soft targets, one can improve model accuracy by up to 2% and reduce calibration error by 55% compared to vanilla label smoothing. Used in conjunction with pre-trained sequence-to-sequence models, our method achieves state-of-the-art performance on three semantic parsing datasets. LORAS can be used with any model, improves performance and implicit model calibration without increasing the number of model parameters, and scales to problems with large label spaces containing tens of thousands of labels.
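To make the contrast concrete, the sketch below builds soft targets two ways: standard label smoothing with a uniform distribution, and a class-dependent smoothing distribution parameterized by a low-rank factorization, in the spirit of what the abstract describes. This is an illustrative sketch only, not the paper's implementation: the function names, the factor matrices `U` and `V`, the rank, and the mixing weight `eps` are assumptions; the exact LORAS parameterization and how the factors are learned are specified in the paper.

```python
# Illustrative sketch (assumed details): uniform label smoothing vs. a
# hypothetical low-rank, class-dependent smoothing distribution.
import torch
import torch.nn.functional as F

def uniform_label_smoothing(targets: torch.Tensor, num_classes: int, eps: float = 0.1):
    """Standard label smoothing: mix the one-hot target with a uniform distribution."""
    one_hot = F.one_hot(targets, num_classes).float()
    return (1.0 - eps) * one_hot + eps / num_classes

def low_rank_smoothing(targets: torch.Tensor, U: torch.Tensor, V: torch.Tensor, eps: float = 0.1):
    """Hypothetical low-rank variant: the smoothing distribution for class y is
    softmax((U @ V.T)[y]), so labels close to y in the latent label space
    receive more probability mass. U and V are (num_classes, rank) factors
    that could be learned jointly with the model (assumed setup)."""
    num_classes = U.shape[0]
    one_hot = F.one_hot(targets, num_classes).float()
    smoothing = torch.softmax(U @ V.T, dim=-1)   # (num_classes, num_classes), row-stochastic
    return (1.0 - eps) * one_hot + eps * smoothing[targets]

# Usage: train with cross-entropy against the soft targets.
logits = torch.randn(4, 10)                      # batch of 4 examples, 10 classes
labels = torch.tensor([1, 3, 3, 7])
U, V = torch.randn(10, 2), torch.randn(10, 2)    # rank-2 factors (assumed shapes)
soft_targets = low_rank_smoothing(labels, U, V)
loss = -(soft_targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```

Because the smoothing matrix is stored as two rank-r factors rather than a dense num_classes x num_classes table, this kind of parameterization remains tractable even for label spaces with tens of thousands of labels, which is consistent with the scalability claim in the abstract.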
