1 code implementation • 17 Nov 2024 • William Huang, Yifeng Jiang, Tom Van Wouwe, C. Karen Liu
Diffusion models have demonstrated significant promise in various generative tasks; however, they often struggle to satisfy challenging constraints.
1 code implementation • 25 Apr 2024 • William Huang, Sam Ghahremani, Siyou Pei, Yang Zhang
We present a data synthesis pipeline to address the disparity in data collection for wheelchair users and subsequently improve pose estimation performance for this population.
no code implementations • NAACL (DADC) 2022 • Jason Phang, Angelica Chen, William Huang, Samuel R. Bowman
We find that AFLite indeed selects more challenging examples, lowering the performance of evaluated models more as stronger adversary models are used.
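AFLite works by repeatedly training cheap probes on random splits of the data and filtering out examples that the probes classify correctly too often. The sketch below is a minimal illustration of that predictability-scoring loop, assuming precomputed feature vectors; a nearest-centroid probe stands in for the logistic-regression ensemble used in the original AFLite procedure.

```python
import numpy as np

def aflite_scores(X, y, n_rounds=64, train_frac=0.8, seed=0):
    """Per-example predictability: the fraction of held-out rounds in which
    a simple probe classifies the example correctly. High-scoring ("easy")
    examples are the ones AFLite would filter out. A nearest-centroid probe
    is used here as a stand-in for AFLite's linear-classifier ensemble."""
    rng = np.random.default_rng(seed)
    n = len(y)
    correct = np.zeros(n)
    counted = np.zeros(n)
    for _ in range(n_rounds):
        idx = rng.permutation(n)
        cut = int(train_frac * n)
        tr, te = idx[:cut], idx[cut:]
        # Fit the probe on the training split: one centroid per class.
        labels = np.unique(y[tr])
        C = np.stack([X[tr][y[tr] == c].mean(axis=0) for c in labels])
        # Predict held-out examples by nearest centroid.
        d2 = ((X[te][:, None, :] - C[None]) ** 2).sum(-1)
        pred = labels[np.argmin(d2, axis=1)]
        correct[te] += (pred == y[te])
        counted[te] += 1
    return np.divide(correct, counted, out=np.zeros(n), where=counted > 0)
```

On synthetic data with well-separated clusters, cleanly labeled examples score near 1 while a mislabeled example scores near 0, matching the intuition that AFLite keeps the examples simple models cannot predict.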
1 code implementation • EMNLP 2021 • Udit Arora, William Huang, He He
Despite agreement on the importance of detecting out-of-distribution (OOD) examples, there is little consensus on the formal definition of OOD examples and how to best detect them.
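One widely used baseline detector in this space (not necessarily the method this paper advocates) is maximum softmax probability: a classifier's low confidence on its top class is taken as evidence the input is out-of-distribution. A minimal sketch:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp_ood_score(logits):
    """Maximum-softmax-probability OOD score: 1 minus the top class
    probability, so higher scores suggest the input is more likely OOD.
    A common baseline detector, shown here only for illustration."""
    return 1.0 - softmax(np.asarray(logits)).max(axis=-1)
```

Confidently classified inputs (peaked logits) receive low scores, while inputs the model is uncertain about (flat logits) receive high scores; examples above a chosen threshold are flagged as OOD.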
no code implementations • ACL 2021 • Clara Vania, Phu Mon Htut, William Huang, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Samuel R. Bowman
Recent years have seen numerous NLP datasets introduced to evaluate the performance of fine-tuned models on natural language understanding tasks.
no code implementations • Findings (EMNLP) 2021 • Alicia Parrish, William Huang, Omar Agha, Soo-Hwan Lee, Nikita Nangia, Alex Warstadt, Karmanya Aggarwal, Emily Allaway, Tal Linzen, Samuel R. Bowman
We take natural language inference as a test case and ask whether it is beneficial to put a linguist "in the loop" during data collection to dynamically identify and address gaps in the data by introducing novel constraints on the task.
1 code implementation • EMNLP (insights) 2020 • William Huang, Haokun Liu, Samuel R. Bowman
A growing body of work shows that models exploit annotation artifacts to achieve state-of-the-art performance on standard crowdsourced benchmarks (datasets collected from crowdworkers to create an evaluation task) while still failing on out-of-domain examples for the same task.
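A standard way to surface annotation artifacts in NLI is a hypothesis-only probe: if the label can be predicted from the hypothesis alone, annotators have left a shortcut in the data. The sketch below is a toy stand-in for such a probe, assuming a hypothetical `(hypothesis, label)` input format; it flags words whose presence skews heavily toward one label rather than training an actual classifier.

```python
from collections import Counter

def artifact_words(examples, min_count=2, ratio_threshold=0.9):
    """Flag words that co-occur with a single label in at least
    `ratio_threshold` of the hypotheses containing them. A toy frequency
    probe; real artifact studies train a hypothesis-only classifier.
    `examples` is a list of (hypothesis, label) pairs (hypothetical format)."""
    per_label = {}
    totals = Counter()
    for hypothesis, label in examples:
        words = set(hypothesis.lower().split())
        per_label.setdefault(label, Counter()).update(words)
        totals.update(words)
    flagged = {}
    for word, n in totals.items():
        if n < min_count:
            continue  # too rare to trust the ratio
        top_label, top_n = max(
            ((label, c[word]) for label, c in per_label.items()),
            key=lambda t: t[1],
        )
        if top_n / n >= ratio_threshold:
            flagged[word] = (top_label, top_n / n)
    return flagged
```

On a small toy set where "not" appears only in contradiction hypotheses, the probe flags exactly that word, mirroring the negation artifact reported in crowdsourced NLI corpora.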
1 code implementation • EMNLP 2020 • Haokun Liu, William Huang, Dhara A. Mungra, Samuel R. Bowman
Performance on the Winograd Schema Challenge (WSC), a respected English commonsense reasoning benchmark, recently rocketed from chance accuracy to 89% on the SuperGLUE leaderboard, with relatively little corroborating evidence of a correspondingly large improvement in reasoning ability.