Search Results for author: Max Cembalest

Found 3 papers, 1 paper with code

Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking

1 code implementation · 23 Sep 2024 · Benjamin Feuer, Micah Goldblum, Teresa Datta, Sanjana Nambiar, Raz Besaleli, Samuel Dooley, Max Cembalest, John P. Dickerson

In this work, we attempt to answer the following question -- do LLM-judge preferences translate to progress on other, more concrete metrics for alignment, and if not, why not?

Tasks: Benchmarking, Diversity, +2

Reckoning with the Disagreement Problem: Explanation Consensus as a Training Objective

no code implementations · 23 Mar 2023 · Avi Schwarzschild, Max Cembalest, Karthik Rao, Keegan Hines, John Dickerson

We observe on three datasets that we can train a model with this loss term to improve explanation consensus on unseen data, and we see improved consensus among explainers other than those used in the loss term.

Tensions Between the Proxies of Human Values in AI

no code implementations · 14 Dec 2022 · Teresa Datta, Daniel Nissani, Max Cembalest, Akash Khanna, Haley Massa, John P. Dickerson

Motivated by mitigating the potentially harmful impacts of technologies, the AI community has formulated and accepted mathematical definitions for certain pillars of accountability, e.g., privacy, fairness, and model transparency.

Tasks: Fairness
