Show Your Work: Improved Reporting of Experimental Results

6 Sep 2019Jesse DodgeSuchin GururanganDallas CardRoy SchwartzNoah A. Smith

Research in natural language processing proceeds, in part, by demonstrating that new models achieve superior performance (e.g., accuracy) on held-out test data, compared to previous results. In this paper, we demonstrate that test-set performance scores alone are insufficient for drawing accurate conclusions about which model performs best... (read more)

PDF Abstract

Evaluation results from the paper


  Submit results from this paper to get state-of-the-art GitHub badges and help community compare results to other papers.