no code implementations • 16 Dec 2023 • Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, Ananya Harsh Jha, Oyvind Tafjord, Dustin Schwenk, Evan Pete Walsh, Yanai Elazar, Kyle Lo, Dirk Groeneveld, Iz Beltagy, Hannaneh Hajishirzi, Noah A. Smith, Kyle Richardson, Jesse Dodge
Evaluations of language models (LMs) commonly report perplexity on monolithic data held out from training.
no code implementations • 19 Jul 2023 • Hao Peng, Qingqing Cao, Jesse Dodge, Matthew E. Peters, Jared Fernandez, Tom Sherborne, Kyle Lo, Sam Skjonsberg, Emma Strubell, Darrell Plessas, Iz Beltagy, Evan Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi
In response, we introduce Pentathlon, a benchmark for holistic and realistic evaluation of model efficiency.
no code implementations • 24 May 2023 • Ananya Harsh Jha, Tom Sherborne, Evan Pete Walsh, Dirk Groeneveld, Emma Strubell, Iz Beltagy
Large language models (LLMs) enable unparalleled few- and zero-shot reasoning capabilities but at a high computational footprint.