Context-Aware Temperature for Language Modeling
Current practice in temperature scaling assumes either a fixed temperature or a manually crafted, dynamically changing schedule. However, our studies indicate that the optimal temperature trajectory for each class can vary with the context. To this end, we propose context-aware temperature, a generalized approach that provides an individually optimal temperature trajectory over the context for each vocabulary token, while allowing the temperature to be learned jointly with the remaining model parameters during training. Experimental results confirm that the proposed method significantly improves state-of-the-art language models, achieving a perplexity of 19.90 on Penn Treebank, 33.88 on WikiText-2, and 4.7 on WikiText-103.
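As a rough illustration of the idea, the sketch below computes a per-token temperature from the current context vector and applies it to the output logits before the softmax. This is an interpretive reading of the abstract, not the paper's actual parameterization: the names `W_temp`, `b_temp`, the linear form of the temperature predictor, and the use of softplus to keep temperatures positive are all assumptions.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def softplus(x):
    # Numerically stable softplus, used here to keep temperatures > 0.
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

rng = np.random.default_rng(0)
d_model, vocab = 8, 5

# Hypothetical parameters; in the proposed method the temperature
# parameters would be learned jointly with the rest of the model.
W_out = rng.normal(scale=0.1, size=(d_model, vocab))   # output projection
W_temp = rng.normal(scale=0.1, size=(d_model, vocab))  # temperature predictor
b_temp = np.zeros(vocab)

h = rng.normal(size=d_model)   # context vector at the current time step
logits = h @ W_out

# Per-vocabulary temperature as a function of the context, so the
# temperature trajectory changes as the context evolves.
tau = softplus(h @ W_temp + b_temp)
probs = softmax(logits / tau)

print(probs)
```

Because `tau` is recomputed from `h` at every step, each vocabulary entry follows its own temperature trajectory over the context rather than a single fixed or hand-crafted schedule.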