Models of human preference for learning reward functions

no code implementations · 5 Jun 2022 · W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro Allievi

One promising method for aligning reward functions with human interests is to learn the reward function from human-generated preferences between pairs of trajectory segments.

Decision Making
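The preference-learning setup in the entry above can be illustrated with the widely used partial-return model, in which the probability that a person prefers one segment is a logistic (Bradley-Terry) function of the difference in summed rewards. This is a minimal sketch under stated assumptions. It is not necessarily the preference model this paper advocates. The tabular reward, the helper names (`preference_prob`, `fit_rewards`), and the toy "good"/"bad" steps are hypothetical.

```python
import math


def stable_sigmoid(z):
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def segment_return(rewards, segment):
    """Partial return: sum of per-step rewards over a segment's step keys."""
    return sum(rewards[step] for step in segment)


def preference_prob(rewards, seg_a, seg_b):
    """P(seg_a preferred to seg_b) under a logistic (Bradley-Terry) model
    of the difference in partial returns (assumed model, see lead-in)."""
    diff = segment_return(rewards, seg_a) - segment_return(rewards, seg_b)
    return stable_sigmoid(diff)


def fit_rewards(preferences, keys, lr=0.1, epochs=200):
    """Fit a hypothetical tabular reward by gradient ascent on the
    preference log-likelihood.

    preferences: list of (seg_a, seg_b, label) with label 1.0 if seg_a was
    preferred, 0.0 if seg_b was preferred, 0.5 for no preference.
    """
    rewards = {k: 0.0 for k in keys}
    for _ in range(epochs):
        grads = {k: 0.0 for k in keys}
        for seg_a, seg_b, label in preferences:
            # d(log-likelihood) / d(return difference) = label - P(a > b)
            err = label - preference_prob(rewards, seg_a, seg_b)
            for step in seg_a:
                grads[step] += err
            for step in seg_b:
                grads[step] -= err
        for k in keys:
            rewards[k] += lr * grads[k]
    return rewards


# Usage: five identical labels saying "good, good" beats "bad, bad".
prefs = [(["good", "good"], ["bad", "bad"], 1.0)] * 5
learned = fit_rewards(prefs, keys=["good", "bad"])
print(learned["good"] > learned["bad"])  # True
```

Because only reward differences enter the likelihood, the learned values are identifiable only up to such differences. Richer models (for example, ones based on regret rather than partial return) would change `preference_prob` while leaving the fitting loop largely the same.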

The Irrationality of Neural Rationale Models

1 code implementation · NAACL (TrustNLP) 2022 · Yiming Zheng, Serena Booth, Julie Shah, Yilun Zhou

We call for more rigorous and comprehensive evaluations of rationale models to ensure that the desired interpretability properties are indeed achieved.

Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example

1 code implementation · 19 Feb 2020 · Serena Booth, Yilun Zhou, Ankit Shah, Julie Shah

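To address the scarcity of test-set examples that trigger model behaviors such as high-confidence failures or ambiguous classifications, we introduce a flexible model inspection framework: Bayes-TrEx.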

Domain Adaptation

Sampling Prediction-Matching Examples in Neural Networks: A Probabilistic Programming Approach

no code implementations · 9 Jan 2020 · Serena Booth, Ankit Shah, Yilun Zhou, Julie Shah

In this paper, we consider the problem of exploring the prediction level sets of a classifier using probabilistic programming.

General Classification, Probabilistic Programming
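Exploring a classifier's prediction level set, as in the last two entries, can be framed as sampling inputs from a prior while conditioning on the classifier's output being near a target confidence. The sketch below is a minimal illustration under explicit assumptions: a standard-normal prior over 2-D inputs, a hypothetical toy logistic classifier (`toy_classifier`), a Gaussian relaxation N(f(x); target, sigma^2) of the hard constraint f(x) = target, and a hand-written random-walk Metropolis sampler standing in for the probabilistic-programming machinery the papers describe. All names and parameter values are illustrative.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def toy_classifier(x):
    """Hypothetical 2-D classifier: P(class 1 | x) from a fixed linear score."""
    return sigmoid(2.0 * x[0] - 1.0 * x[1])


def log_target(x, target, sigma):
    """Unnormalized log density: standard-normal prior on x plus a Gaussian
    relaxation of the level-set constraint toy_classifier(x) == target."""
    log_prior = -0.5 * (x[0] ** 2 + x[1] ** 2)
    dev = toy_classifier(x) - target
    return log_prior - dev ** 2 / (2.0 * sigma ** 2)


def sample_level_set(target, sigma=0.05, n=5000, step=0.3, seed=0):
    """Random-walk Metropolis over inputs in R^2; returns all visited states."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    cur = log_target(x, target, sigma)
    samples = []
    for _ in range(n):
        prop = [xi + rng.gauss(0.0, step) for xi in x]
        new = log_target(prop, target, sigma)
        # Accept with probability min(1, exp(new - cur)); 1 - random() is in (0, 1].
        if math.log(1.0 - rng.random()) < new - cur:
            x, cur = prop, new
        samples.append(list(x))
    return samples


# Usage: look for "ambiguous" inputs, where the classifier outputs about 0.5.
samples = sample_level_set(target=0.5)
mean_dev = sum(abs(toy_classifier(s) - 0.5) for s in samples) / len(samples)
print(round(mean_dev, 3))
```

Shrinking `sigma` tightens the relaxation toward the exact level set at the cost of slower mixing; changing `target` (for example, to 0.99) would instead target high-confidence inputs.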

