Search Results for author: Samuel Marks

Found 6 papers, 5 papers with code

Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

1 code implementation · 20 Jun 2024 · Johannes Treutlein, Dami Choi, Jan Betley, Cem Anil, Samuel Marks, Roger Baker Grosse, Owain Evans

As a step towards answering this question, we study inductive out-of-context reasoning (OOCR), a type of generalization in which LLMs infer latent information from evidence distributed across training documents and apply it to downstream tasks without in-context learning.

In-Context Learning

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

1 code implementation · 14 Jun 2024 · Carson Denison, Monte MacDiarmid, Fazl Barez, David Duvenaud, Shauna Kravec, Samuel Marks, Nicholas Schiefer, Ryan Soklaski, Alex Tamkin, Jared Kaplan, Buck Shlegeris, Samuel R. Bowman, Ethan Perez, Evan Hubinger

We construct a curriculum of increasingly sophisticated gameable environments and find that training on early-curriculum environments leads to more specification gaming on remaining environments.

Language Modelling · Large Language Model

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

1 code implementation · 10 Oct 2023 · Samuel Marks, Max Tegmark

In this work, we curate high-quality datasets of true/false statements and use them to study in detail the structure of LLM representations of truth, drawing on three lines of evidence.
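The linear structure this abstract describes can be illustrated with a toy mass-mean probe: take the difference between the mean activations of true and false statements, and classify by projecting onto that direction. The sketch below uses synthetic vectors as a stand-in for LLM activations; the dimensionality, noise scale, and cluster construction are all illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for LLM residual-stream activations: true and false
# statements are simulated as clusters offset along a single "truth" direction.
d = 64
truth_dir = rng.normal(size=d)
truth_dir /= np.linalg.norm(truth_dir)

n = 200
labels = np.array([1] * n + [0] * n)                 # 1 = true, 0 = false
noise = rng.normal(scale=0.5, size=(2 * n, d))
acts = noise + np.outer(2 * labels - 1, truth_dir)   # +dir if true, -dir if false

# Mass-mean probe: the difference of class means estimates the truth direction.
probe = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)

# Classify each statement by the sign of its projection onto the probe.
preds = (acts @ probe > 0).astype(int)
accuracy = (preds == labels).mean()
print(f"mass-mean probe accuracy: {accuracy:.2f}")
```

If the representations really are linearly separated along one direction, this single difference-of-means vector recovers it without any iterative training, which is part of what makes the linear-structure claim testable.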

Language Modelling · Large Language Model
