A Surprisingly Robust Trick for Winograd Schema Challenge

15 May 2019 · Vid Kocijan, Ana-Maria Cretu, Oana-Maria Camburu, Yordan Yordanov, Thomas Lukasiewicz

The Winograd Schema Challenge (WSC) dataset WSC273 and its inference counterpart WNLI are popular benchmarks for natural language understanding and commonsense reasoning. In this paper, we show that the performance of three language models on WSC273 strongly improves when fine-tuned on a similar pronoun disambiguation problem dataset (denoted WSCR)...

PDF Abstract
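
The approach behind these numbers fine-tunes BERT as a masked language model on WSCR-style pronoun disambiguation examples and, at test time, prefers the candidate whose word pieces receive the highest probability in the masked pronoun slot. Below is a minimal sketch of that candidate-scoring step using the Hugging Face transformers API; it is not the authors' code, a plain pre-trained bert-base-uncased stands in for the fine-tuned model, and the example sentence and helper name are illustrative.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def candidate_score(sentence: str, candidate: str) -> float:
    """Average log-probability of the candidate's tokens in the masked pronoun slot."""
    cand_ids = tokenizer.encode(candidate, add_special_tokens=False)
    # Expand the single placeholder into one [MASK] per candidate word piece.
    masked = sentence.replace("[MASK]", " ".join([tokenizer.mask_token] * len(cand_ids)), 1)
    inputs = tokenizer(masked, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**inputs).logits[0]  # (seq_len, vocab_size)
    log_probs = torch.log_softmax(logits[mask_pos], dim=-1)
    token_scores = log_probs[torch.arange(len(cand_ids)), torch.tensor(cand_ids)]
    return token_scores.mean().item()

sentence = "The trophy didn't fit in the suitcase because [MASK] was too big."
for cand in ("the trophy", "the suitcase"):
    print(f"{cand}: {candidate_score(sentence, cand):.3f}")
```

In the paper, the same masked-LM scoring is applied during fine-tuning on WSCR before the model is evaluated on WSC273 and WNLI, which is where the accuracies reported below come from.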
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Natural Language Understanding | WSC (Winograd Schema Challenge) | BERT_WSCR (Kocijan et al., 2019) | Accuracy | 70.3 | #4 |
| Natural Language Understanding | WNLI | BERT_Wiki-WSCR (Kocijan et al., 2019) | Accuracy | 71.9 | #2 |
| Natural Language Understanding | WNLI | BERT_WSCR (Kocijan et al., 2019) | Accuracy | 70.5 | #3 |

Results from Other Papers


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Source Paper |
| --- | --- | --- | --- | --- | --- | --- |
| Natural Language Understanding | WSC (Winograd Schema Challenge) | BERT_Wiki-WSCR (Kocijan et al., 2019) | Accuracy | 72.2 | #2 | |

Methods used in the Paper