Search Results for author: William Huang

Found 6 papers, 3 papers with code

Adversarially Constructed Evaluation Sets Are More Challenging, but May Not Be Fair

no code implementations · NAACL (DADC) 2022 · Jason Phang, Angelica Chen, William Huang, Samuel R. Bowman

We find that AFLite indeed selects more challenging examples, lowering the performance of evaluated models further as stronger adversary models are used.

Types of Out-of-Distribution Texts and How to Detect Them

1 code implementation · EMNLP 2021 · Udit Arora, William Huang, He He

Despite agreement on the importance of detecting out-of-distribution (OOD) examples, there is little consensus on the formal definition of OOD examples and how to best detect them.

Density Estimation · Language Modelling · +2

Comparing Test Sets with Item Response Theory

no code implementations · ACL 2021 · Clara Vania, Phu Mon Htut, William Huang, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Samuel R. Bowman

Recent years have seen numerous NLP datasets introduced to evaluate the performance of fine-tuned models on natural language understanding tasks.

Natural Language Understanding

Does Putting a Linguist in the Loop Improve NLU Data Collection?

no code implementations · Findings (EMNLP) 2021 · Alicia Parrish, William Huang, Omar Agha, Soo-Hwan Lee, Nikita Nangia, Alex Warstadt, Karmanya Aggarwal, Emily Allaway, Tal Linzen, Samuel R. Bowman

We take natural language inference as a test case and ask whether it is beneficial to put a linguist "in the loop" during data collection to dynamically identify and address gaps in the data by introducing novel constraints on the task.

Natural Language Inference

Counterfactually-Augmented SNLI Training Data Does Not Yield Better Generalization Than Unaugmented Data

1 code implementation · EMNLP (insights) 2020 · William Huang, Haokun Liu, Samuel R. Bowman

A growing body of work shows that models exploit annotation artifacts to achieve state-of-the-art performance on standard crowdsourced benchmarks (datasets collected from crowdworkers to create an evaluation task) while still failing on out-of-domain examples for the same task.

counterfactual · Natural Language Inference · +2

Precise Task Formalization Matters in Winograd Schema Evaluations

1 code implementation · EMNLP 2020 · Haokun Liu, William Huang, Dhara A. Mungra, Samuel R. Bowman

Performance on the Winograd Schema Challenge (WSC), a respected English commonsense reasoning benchmark, recently rocketed from chance accuracy to 89% on the SuperGLUE leaderboard, with relatively little corroborating evidence of a correspondingly large improvement in reasoning ability.

Language Modelling · Multiple-choice
