no code implementations • 20 Feb 2024 • Kristian Lum, Jacy Reese Anthis, Chirag Nagpal, Alexander D'Amour
In this work, we study the correspondence between such decontextualized "trick tests" and evaluations that are more grounded in Realistic Use and Tangible Effects (i.e., RUTEd evaluations).
1 code implementation • NeurIPS 2023 • Jacy Reese Anthis, Victor Veitch
This is an intuitive standard, as reflected in the U.S. legal system, but its use is limited because counterfactuals cannot be directly observed in real-world data.