Search Results for author: Andrew Poulton

Found 4 papers, 2 papers with code

Quantifying Variance in Evaluation Benchmarks

no code implementations • 14 Jun 2024 • Lovish Madaan, Aaditya K. Singh, Rylan Schaeffer, Andrew Poulton, Sanmi Koyejo, Pontus Stenetorp, Sharan Narang, Dieuwke Hupkes

Evaluation benchmarks are the cornerstone of measuring capabilities of large language models (LLMs), as well as driving progress in said capabilities.
