The Sensitivity of Language Models and Humans to Winograd Schema Perturbations

Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of common sense reasoning ability. We show, however, with a new diagnostic dataset, that these models are sensitive to linguistic perturbations of the Winograd examples that minimally affect human understanding.
