Search Results for author: Jérémy Scheurer

Found 12 papers, 6 papers with code

Towards evaluations-based safety cases for AI scheming

no code implementations • 29 Oct 2024 • Mikita Balesni, Marius Hobbhahn, David Lindner, Alexander Meinke, Tomek Korbak, Joshua Clymer, Buck Shlegeris, Jérémy Scheurer, Charlotte Stix, Rusheb Shah, Nicholas Goldowsky-Dill, Dan Braun, Bilal Chughtai, Owain Evans, Daniel Kokotajlo, Lucius Bushnaq

We sketch how developers of frontier AI systems could construct a structured rationale -- a 'safety case' -- that an AI system is unlikely to cause catastrophic outcomes through scheming.

Analyzing Probabilistic Methods for Evaluating Agent Capabilities

no code implementations • 24 Sep 2024 • Axel Højmark, Govind Pimpale, Arjun Panickssery, Marius Hobbhahn, Jérémy Scheurer

To enhance the accuracy of capability estimates of AI agents on difficult tasks, we suggest future work should leverage the rich literature on Monte Carlo Estimators.

AI Agent

TracrBench: Generating Interpretability Testbeds with Large Language Models

1 code implementation • 7 Sep 2024 • Hannes Thurnherr, Jérémy Scheurer

However, manually creating a large number of models needed for verifying interpretability methods is labour-intensive and time-consuming.

Large Language Models can Strategically Deceive their Users when Put Under Pressure

1 code implementation • 9 Nov 2023 • Jérémy Scheurer, Mikita Balesni, Marius Hobbhahn

We demonstrate a situation in which Large Language Models, trained to be helpful, harmless, and honest, can display misaligned behavior and strategically deceive their users about this behavior without being instructed to do so.

Management

Improving Code Generation by Training with Natural Language Feedback

1 code implementation • 28 Mar 2023 • Angelica Chen, Jérémy Scheurer, Tomasz Korbak, Jon Ander Campos, Jun Shern Chan, Samuel R. Bowman, Kyunghyun Cho, Ethan Perez

The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development.

Code Generation, Imitation Learning, +1

Few-shot Adaptation Works with UnpredicTable Data

1 code implementation • 1 Aug 2022 • Jun Shern Chan, Michael Pieler, Jonathan Jao, Jérémy Scheurer, Ethan Perez

Finetuning on the resulting dataset leads to improved FSL performance on Natural Language Processing (NLP) tasks, but not proportionally to dataset scale.

Domain Adaptation, Few-Shot Learning

Instance-wise algorithm configuration with graph neural networks

1 code implementation • 10 Feb 2022 • Romeo Valentin, Claudio Ferrari, Jérémy Scheurer, Andisheh Amrollahi, Chris Wendler, Max B. Paulus

We pose this task as a supervised learning problem: First, we compile a large dataset of the solver performance for various configurations and all provided MILP instances.

Combinatorial Optimization, Graph Neural Network
