1 code implementation • 23 Sep 2024 • Benjamin Feuer, Micah Goldblum, Teresa Datta, Sanjana Nambiar, Raz Besaleli, Samuel Dooley, Max Cembalest, John P. Dickerson
In this work, we attempt to answer the following question: do LLM-judge preferences translate to progress on other, more concrete metrics for alignment, and if not, why not?
no code implementations • 23 Mar 2023 • Avi Schwarzschild, Max Cembalest, Karthik Rao, Keegan Hines, John Dickerson
We observe on three datasets that we can train a model with this loss term to improve explanation consensus on unseen data, and see improved consensus between explainers other than those used in the loss term.
no code implementations • 14 Dec 2022 • Teresa Datta, Daniel Nissani, Max Cembalest, Akash Khanna, Haley Massa, John P. Dickerson
Motivated by mitigating potentially harmful impacts of technologies, the AI community has formulated and accepted mathematical definitions for certain pillars of accountability: e.g., privacy, fairness, and model transparency.