no code implementations • 5 Jun 2022 • W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro Allievi
One promising method for alignment is to learn the reward function from human-generated preferences between pairs of trajectory segments.
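A minimal sketch of what learning a reward function from pairwise segment preferences can look like, under assumptions that are not stated in the listing: rewards are linear in hypothetical per-step features, and a preference follows a logistic (Bradley-Terry style) model over the difference in summed segment reward. This is one common formulation, not necessarily the preference model the paper adopts; the names `segment_return`, `preference_prob`, and `fit_reward` are illustrative.

```python
import numpy as np

def segment_return(theta, features):
    """Sum of linear rewards over a segment; features has shape (T, d)."""
    return float(np.sum(features @ theta))

def preference_prob(theta, seg_a, seg_b):
    """Assumed model: P(seg_a preferred) = sigmoid(return(a) - return(b))."""
    diff = segment_return(theta, seg_a) - segment_return(theta, seg_b)
    return 1.0 / (1.0 + np.exp(-diff))

def fit_reward(pairs, labels, dim, lr=0.1, steps=500):
    """Gradient ascent on the mean log-likelihood of the observed preferences.

    pairs  : list of (seg_a, seg_b), each a (T, d) feature array
    labels : 1 if seg_a was preferred, 0 if seg_b was preferred
    """
    theta = np.zeros(dim)
    for _ in range(steps):
        grad = np.zeros(dim)
        for (seg_a, seg_b), y in zip(pairs, labels):
            p = preference_prob(theta, seg_a, seg_b)
            # d/dtheta of the log-likelihood for one labeled pair
            grad += (y - p) * (seg_a.sum(axis=0) - seg_b.sum(axis=0))
        theta += lr * grad / len(pairs)
    return theta
```

With preference labels generated from a known weight vector, the fitted `theta` should point in roughly the same direction; its scale is not identified by preferences alone.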
1 code implementation • NAACL (TrustNLP) 2022 • Yiming Zheng, Serena Booth, Julie Shah, Yilun Zhou
We call for more rigorous and comprehensive evaluations of these models to ensure desired properties of interpretability are indeed achieved.
no code implementations • 6 Oct 2021 • Aspen Hopkins, Serena Booth
Practitioners from diverse occupations and backgrounds are increasingly using machine learning (ML) methods.
1 code implementation • 27 Apr 2021 • Yilun Zhou, Serena Booth, Marco Tulio Ribeiro, Julie Shah
Feature attribution methods are exceedingly popular in interpretable machine learning.
1 code implementation • 19 Feb 2020 • Serena Booth, Yilun Zhou, Ankit Shah, Julie Shah
To address these challenges, we introduce a flexible model inspection framework: Bayes-TrEx.
no code implementations • 9 Jan 2020 • Serena Booth, Ankit Shah, Yilun Zhou, Julie Shah
In this paper, we consider the problem of exploring the prediction level sets of a classifier using probabilistic programming.