Show Your Work: Improved Reporting of Experimental Results

IJCNLP 2019. Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith

Research in natural language processing proceeds, in part, by demonstrating that new models achieve superior performance (e.g., accuracy) on held-out test data, compared to previous results. In this paper, we demonstrate that test-set performance scores alone are insufficient for drawing accurate conclusions about which model performs best...
