1 code implementation • 6 Jul 2018 • Arun Tejasvi Chaganty, Stephen Mussman, Percy Liang
For evaluating generation systems, automatic metrics such as BLEU cost nothing to run but have been shown to correlate poorly with human judgment, leading to systematic bias against certain model improvements.