Search Results for author: Jing Han Sun

Found 1 paper, 0 papers with code

EvoGrad: A Dynamic Take on the Winograd Schema Challenge with Human Adversaries

no code implementations • 20 Feb 2024 • Jing Han Sun, Ali Emami

Our results emphasize the challenge posed by EvoGrad: Even the best performing LLM, GPT-3.5, achieves an accuracy of 65.0% with an average error depth of 7.2, a stark contrast to human performance of 92.

Common Sense Reasoning • coreference-resolution
