no code implementations • 19 Sep 2023 • Jordan Voas, Yili Wang, QiXing Huang, Raymond Mooney
Our findings indicate that none of the metrics currently used for this task shows even a moderate correlation with human judgments at the sample level.