Search Results for author: David Rein

Found 6 papers, 6 papers with code

HCAST: Human-Calibrated Autonomy Software Tasks

1 code implementation • 21 Mar 2025 • David Rein, Joel Becker, Amy Deng, Seraphina Nix, Chris Canal, Daniel O'Connel, Pip Arnott, Ryan Bloom, Thomas Broadley, Katharyn Garcia, Brian Goodrich, Max Hasin, Sami Jawhar, Megan Kinniment, Thomas Kwa, Aron Lajko, Nate Rush, Lucas Jun Koba Sato, Sydney von Arx, Ben West, Lawrence Chan, Elizabeth Barnes

To understand and predict the societal impacts of highly autonomous AI systems, we need benchmarks with grounding, i.e., metrics that directly connect AI performance to real-world effects we care about.

Training Language Models to Win Debates with Self-Play Improves Judge Accuracy

1 code implementation • 25 Sep 2024 • Samuel Arnesen, David Rein, Julian Michael

We test the robustness of debate as a method of scalable oversight by training models to debate with data generated via self-play.

Language Modeling, Language Modelling

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

2 code implementations • 20 Nov 2023 • David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman

We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry.

Multiple-choice

Debate Helps Supervise Unreliable Experts

1 code implementation • 15 Nov 2023 • Julian Michael, Salsabila Mahdi, David Rein, Jackson Petty, Julien Dirani, Vishakh Padmakumar, Samuel R. Bowman

Comparing debate to a baseline we call consultancy, where a single expert argues for only one answer, which is correct half of the time, we find that debate performs significantly better, with 84% judge accuracy compared to consultancy's 74%.

Reading Comprehension

Classification with Strategically Withheld Data

1 code implementation • 18 Dec 2020 • Anilesh K. Krishnaswamy, Haoming Li, David Rein, Hanrui Zhang, Vincent Conitzer

To this end, we present IC-LR, a modification of Logistic Regression that removes the incentive to strategically drop features.

Classification, General Classification
