no code implementations • 15 Nov 2023 • Davis Yoshida, Kartik Goyal, Kevin Gimpel
It has been widely observed that exact or approximate MAP (mode-seeking) decoding from natural language generation (NLG) models consistently leads to degenerate outputs (Stahlberg and Byrne, 2019; Holtzman et al., 2019).
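To make the setting concrete, here is a minimal sketch of approximate MAP decoding via beam search with the Hugging Face transformers API; the model name, prompt, and beam width are illustrative and not taken from the paper. In practice, this kind of mode-seeking decoding is what tends to produce the short or repetitive outputs the abstract refers to.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The meaning of life is", return_tensors="pt")
with torch.no_grad():
    # Beam search with sampling disabled approximates the modal
    # (highest-probability) continuation under the model.
    output_ids = model.generate(
        **inputs,
        num_beams=10,
        do_sample=False,
        max_new_tokens=40,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```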
1 code implementation • 12 Jun 2023 • Davis Yoshida
This note shares some simple calculations and experiments related to absmax-based blockwise quantization, as used in Dettmers et al. (2023).
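For reference, here is a minimal NumPy sketch of absmax-based blockwise quantization: each block of values is scaled by its maximum absolute value and rounded to int8. The block size, rounding, and zero-block handling are illustrative assumptions, not necessarily identical to the implementation in Dettmers et al. (2023).

```python
import numpy as np

def absmax_quantize_blockwise(x, block_size=64):
    """Quantize a 1-D float array to int8, one absmax scale per block."""
    n = x.size
    pad = (-n) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    # One scale per block: the largest absolute value in that block.
    scales = np.abs(blocks).max(axis=1, keepdims=True)
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scales * 127), -127, 127).astype(np.int8)
    return q, scales, n

def absmax_dequantize_blockwise(q, scales, n):
    """Invert the quantization (up to rounding error)."""
    return (q.astype(np.float32) / 127 * scales).reshape(-1)[:n]

x = np.random.randn(1000).astype(np.float32)
q, scales, n = absmax_quantize_blockwise(x)
err = np.abs(x - absmax_dequantize_blockwise(q, scales, n)).max()
print(f"max reconstruction error: {err:.4f}")
```

Smaller blocks give tighter per-block scales (lower error) at the cost of storing more scale factors, which is the trade-off the note's calculations concern.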
no code implementations • Findings (EMNLP) 2021 • Davis Yoshida, Kevin Gimpel
We present Hidden-State Optimization (HSO), a gradient-based method for improving the performance of transformer language models at inference time.
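The following is a heavily simplified sketch of the general idea of gradient-based hidden-state refinement at inference time: cached hidden states are treated as free variables and nudged to better predict observed tokens. The choice of loss, optimizer, step size, and readout module here are assumptions for illustration, not HSO's exact procedure.

```python
import torch

def hso_step(hidden, readout, targets, lr=0.01, steps=3):
    """Refine cached hidden states by gradient descent at inference time.

    hidden:  (seq_len, d_model) cached hidden states, treated as free variables
    readout: module mapping hidden states to next-token logits
    targets: (seq_len,) tokens those positions should predict
    """
    hidden = hidden.detach().clone().requires_grad_(True)
    opt = torch.optim.SGD([hidden], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = readout(hidden)
        loss = torch.nn.functional.cross_entropy(logits, targets)
        loss.backward()
        opt.step()
    return hidden.detach()

# Toy demo with a linear readout; in the real method the refined states would
# be the transformer's cached states, reused for subsequent predictions.
readout = torch.nn.Linear(16, 100)
hidden = torch.randn(8, 16)
targets = torch.randint(0, 100, (8,))
refined = hso_step(hidden, readout, targets)
```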
no code implementations • 1 Jan 2021 • Davis Yoshida, Allyson Ettinger, Kevin Gimpel
Fine-tuning a pretrained transformer for a downstream task has become a standard method in NLP in the last few years.
no code implementations • 16 Aug 2020 • Davis Yoshida, Allyson Ettinger, Kevin Gimpel
Fine-tuning a pretrained transformer for a downstream task has become a standard method in NLP in the last few years.
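For context, a minimal sketch of the standard fine-tuning recipe the abstract refers to, using the Hugging Face transformers API; the model name, placeholder data, and hyperparameters are illustrative, not from the paper.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Placeholder data; a real run would iterate over a downstream-task dataset.
texts = ["great movie", "terrible movie"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few gradient steps; real fine-tuning runs for epochs
    optimizer.zero_grad()
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
```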