1 code implementation • ACL 2021 • Christine Herlihy, Rachel Rudinger
Crowdworker-constructed natural language inference (NLI) datasets have been found to contain statistical artifacts associated with the annotation process that allow hypothesis-only classifiers to achieve better-than-random performance (Poliak et al., 2018; Gururangan et al., 2018; Tsuchiya, 2018).
1 code implementation • 14 Jun 2021 • Christine Herlihy, Aviva Prins, Aravind Srinivasan, John P. Dickerson
Restless and collapsing bandits are often used to model budget-constrained resource allocation in settings where arms have action-dependent transition probabilities, such as the allocation of health interventions among patients.
1 code implementation • 9 Dec 2022 • Christine Herlihy, John P. Dickerson
Restless multi-armed bandits are often used to model budget-constrained resource allocation tasks where receipt of the resource is associated with an increased probability of a favorable state transition.
1 code implementation • 16 Oct 2023 • Manley Roberts, Himanshu Thakur, Christine Herlihy, Colin White, Samuel Dooley
Recent claims about the impressive abilities of large language models (LLMs) are often supported by evaluating publicly available benchmarks.
no code implementations • 26 Jan 2024 • Christine Herlihy, Kimberly Truong, Alexandra Chouldechova, Miroslav Dudik
Disaggregated evaluation is a central task in AI fairness assessment, with the goal of measuring an AI system's performance across different subgroups defined by combinations of demographic or other sensitive attributes.