Search Results for author: Davis Yoshida

Found 6 papers, 1 paper with code

MAP's not dead yet: Uncovering true language model modes by conditioning away degeneracy

no code implementations • 15 Nov 2023 • Davis Yoshida, Kartik Goyal, Kevin Gimpel

It has been widely observed that exact or approximate MAP (mode-seeking) decoding from natural language generation (NLG) models consistently leads to degenerate outputs (Stahlberg and Byrne, 2019; Holtzman et al., 2019).

Instruction Following • Language Modelling • +2
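
The degeneracy this abstract refers to can be reproduced with a toy distribution: for many sequence models the single most probable (MAP) output is empty or very short, as Stahlberg and Byrne (2019) report for NMT. The sketch below uses a made-up autoregressive model (the P_EOS and P_TOK values are illustrative assumptions, not anything from the paper) and finds its exact mode by brute force.

```python
import itertools
import math

# Illustrative toy autoregressive LM (not any model from the paper): at each
# step, P(EOS) = 0.10 and the remaining mass is split over 4 content tokens.
VOCAB = ["a", "b", "c", "d"]
P_EOS, P_TOK = 0.10, 0.225

def seq_logprob(seq):
    # log P(seq followed by EOS) under the toy model
    return len(seq) * math.log(P_TOK) + math.log(P_EOS)

# Exact MAP decoding by brute force over all sequences of length <= 4.
candidates = [tuple()]
for length in range(1, 5):
    candidates += list(itertools.product(VOCAB, repeat=length))
mode = max(candidates, key=seq_logprob)

print("mode:", mode)                                    # () -- the empty sequence
print("mode log-prob:", seq_logprob(mode))              # log 0.10
print("expected sampled length:", (1 - P_EOS) / P_EOS)  # 9 tokens
```

The mode here is the empty string even though sampled outputs average nine tokens; that mismatch between the mode and typical samples is the kind of degeneracy the paper's title refers to conditioning away.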

NF4 Isn't Information Theoretically Optimal (and that's Good)

1 code implementation • 12 Jun 2023 • Davis Yoshida

This note shares some simple calculations and experiments related to absmax-based blockwise quantization, as used in Dettmers et al., 2023.

Quantization
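
Absmax-based blockwise quantization, the scheme the note examines, splits a tensor into fixed-size blocks, rescales each block by its largest absolute value, and rounds the normalized values to a small codebook. The sketch below is a minimal NumPy version of that pattern; the block size of 64 and the uniform 4-bit codebook are illustrative stand-ins (NF4 itself uses a normal-quantile codebook), not code from the note or from Dettmers et al., 2023.

```python
import numpy as np

def absmax_blockwise_quantize(x, block_size=64, codebook=None):
    """Minimal sketch of absmax-based blockwise quantization."""
    if codebook is None:
        # Uniform 16-level (4-bit) codebook on [-1, 1]; NF4 instead places
        # its levels at quantiles of a standard normal.
        codebook = np.linspace(-1.0, 1.0, 16)
    x = x.reshape(-1, block_size)                  # assumes len(x) % block_size == 0
    scales = np.abs(x).max(axis=1, keepdims=True)  # per-block absmax
    normalized = x / scales                        # values now lie in [-1, 1]
    # Round each normalized value to the nearest codebook entry.
    idx = np.abs(normalized[..., None] - codebook).argmin(axis=-1)
    return idx.astype(np.uint8), scales, codebook

def dequantize(idx, scales, codebook):
    return codebook[idx] * scales

x = np.random.randn(4 * 64).astype(np.float32)
idx, scales, cb = absmax_blockwise_quantize(x)
x_hat = dequantize(idx, scales, cb).reshape(-1)
print("mean abs error:", np.abs(x - x_hat).mean())
```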

Reconsidering the Past: Optimizing Hidden States in Language Models

no code implementations • Findings (EMNLP) 2021 • Davis Yoshida, Kevin Gimpel

We present Hidden-State Optimization (HSO), a gradient-based method for improving the performance of transformer language models at inference time.

Language Modelling
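
The core idea named in the snippet is taking gradient steps at inference time on a model's hidden states rather than on its weights. The sketch below illustrates only that general idea, under stated assumptions: it optimizes the initial state of a small, randomly initialized GRU language model to lower the loss on an already-seen prefix, whereas the paper works with a pretrained transformer's cached states; every module choice and hyperparameter here is an assumption, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Toy GRU language model with random weights; the inference-time gradient
# step on the hidden state is the only point of this sketch.
torch.manual_seed(0)
vocab, dim = 50, 32
emb = torch.nn.Embedding(vocab, dim)
rnn = torch.nn.GRU(dim, dim, batch_first=True)
head = torch.nn.Linear(dim, vocab)
for p in (*emb.parameters(), *rnn.parameters(), *head.parameters()):
    p.requires_grad_(False)                        # model weights stay frozen

prefix = torch.randint(0, vocab, (1, 16))          # tokens already observed
h0 = torch.zeros(1, 1, dim, requires_grad=True)    # hidden state to optimize
opt = torch.optim.Adam([h0], lr=1e-2)

# A few gradient steps that make the hidden state better explain the prefix.
for _ in range(10):
    out, _ = rnn(emb(prefix), h0)
    loss = F.cross_entropy(head(out[:, :-1]).flatten(0, 1), prefix[:, 1:].flatten())
    opt.zero_grad(); loss.backward(); opt.step()

# Predict the next token using the adjusted state.
with torch.no_grad():
    out, _ = rnn(emb(prefix), h0)
    print("next token id:", head(out[:, -1]).argmax(-1).item())
```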

Adding Recurrence to Pretrained Transformers

no code implementations • 1 Jan 2021 • Davis Yoshida, Allyson Ettinger, Kevin Gimpel

Fine-tuning a pretrained transformer for a downstream task has become a standard method in NLP in the last few years.

Language Modelling

Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size

no code implementations • 16 Aug 2020 • Davis Yoshida, Allyson Ettinger, Kevin Gimpel

Fine-tuning a pretrained transformer for a downstream task has become a standard method in NLP in the last few years.

Language Modelling
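
The two entries above concern adding recurrence on top of a pretrained transformer so it can use context beyond its usual window at lower cost. The sketch below shows only the generic chunk-recurrence pattern, assuming a randomly initialized TransformerEncoder as a stand-in for the pretrained model and a GRU cell as the added recurrence; the chunk size, single-vector memory, and module choices are all assumptions, not the papers' architecture.

```python
import torch

# Process a long sequence in fixed-size chunks, compress each chunk into a
# memory vector with a small recurrent cell, and prepend that memory to the
# next chunk so information crosses chunk boundaries.
torch.manual_seed(0)
dim, chunk_len, n_chunks = 64, 32, 4
layer = torch.nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
transformer = torch.nn.TransformerEncoder(layer, num_layers=2)  # stand-in model
memory_rnn = torch.nn.GRUCell(dim, dim)                         # added recurrence

x = torch.randn(1, n_chunks * chunk_len, dim)   # embedded long input
memory = torch.zeros(1, dim)
outputs = []
for i in range(n_chunks):
    chunk = x[:, i * chunk_len:(i + 1) * chunk_len]
    # Prepend the memory as an extra "token" the transformer can attend to.
    h = transformer(torch.cat([memory.unsqueeze(1), chunk], dim=1))
    outputs.append(h[:, 1:])                              # drop the memory slot
    memory = memory_rnn(h[:, 1:].mean(dim=1), memory)     # update the summary
out = torch.cat(outputs, dim=1)
print(out.shape)  # torch.Size([1, 128, 64])
```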
