1 code implementation • 16 Oct 2024 • Claudia Shi, Nicolas Beltran-Velez, Achille Nazaret, Carolina Zheng, Adrià Garriga-Alonso, Andrew Jesson, Maggie Makar, David M. Blei
In this paper, we formalize a set of criteria that a circuit is hypothesized to meet and develop a suite of hypothesis tests to evaluate how well circuits satisfy them.
no code implementations • NeurIPS 2023 • Amir Feder, Yoav Wald, Claudia Shi, Suchi Saria, David Blei
The reliance of text classifiers on spurious correlations can lead to poor generalization at deployment, raising concerns about their use in safety-critical domains such as healthcare.
no code implementations • 27 Jul 2023 • Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Biyik, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals.
1 code implementation • NeurIPS 2023 • Nino Scherrer, Claudia Shi, Amir Feder, David M. Blei
We apply this method to study what moral beliefs are encoded in different LLMs, especially in ambiguous cases where the right choice is not obvious.
1 code implementation • 31 May 2023 • Carolina Zheng, Claudia Shi, Keyon Vafa, Amir Feder, David M. Blei
In this paper, we show that the performance of controlled generation may be poor if the distributions of text in response to user prompts differ from the distribution the predictor was trained on.
1 code implementation • 24 Nov 2020 • Claudia Shi, Victor Veitch, David Blei
To address this challenge, practitioners collect and adjust for the covariates, hoping that they adequately correct for confounding.
5 code implementations • NeurIPS 2019 • Claudia Shi, David M. Blei, Victor Veitch
We propose two adaptations based on insights from the statistical literature on the estimation of treatment effects.
Ranked #2 on Causal Inference on IHDP