no code implementations • 26 Jan 2024 • Christine Herlihy, Kimberly Truong, Alexandra Chouldechova, Miroslav Dudik
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups defined by combinations of demographic or other sensitive attributes.
1 code implementation • 16 Oct 2023 • Manley Roberts, Himanshu Thakur, Christine Herlihy, Colin White, Samuel Dooley
Recent claims about the impressive abilities of large language models (LLMs) are often supported by evaluations on publicly available benchmarks.
1 code implementation • 9 Dec 2022 • Christine Herlihy, John P. Dickerson
Restless multi-armed bandits are often used to model budget-constrained resource allocation tasks where receipt of the resource is associated with an increased probability of a favorable state transition.
1 code implementation • 14 Jun 2021 • Christine Herlihy, Aviva Prins, Aravind Srinivasan, John P. Dickerson
Restless and collapsing bandits are often used to model budget-constrained resource allocation in settings where arms have action-dependent transition probabilities, such as the allocation of health interventions among patients.
1 code implementation • ACL 2021 • Christine Herlihy, Rachel Rudinger
Crowdworker-constructed natural language inference (NLI) datasets have been found to contain statistical artifacts associated with the annotation process that allow hypothesis-only classifiers to achieve better-than-random performance (Poliak et al., 2018; Gururangan et al., 2018; Tsuchiya, 2018).