Search Results for author: Paul Mooney

Found 3 papers, 2 papers with code

Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation

no code implementations · 1 May 2025 · D. Sculley, Will Cukierski, Phil Culliton, Sohier Dane, Maggie Demkin, Ryan Holbrook, Addison Howard, Paul Mooney, Walter Reade, Megan Risdal, Nate Keating

In this position paper, we observe that empirical evaluation in Generative AI is at a crisis point: traditional ML evaluation and benchmarking strategies are insufficient for evaluating modern GenAI models and systems.

Benchmarking · Position
