1 code implementation • 24 Feb 2025 • Jan Betley, Daniel Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans
In our experiment, a model is finetuned to output insecure code without disclosing this to the user.
1 code implementation • 19 Jan 2025 • Jan Betley, Xuchan Bao, Martín Soto, Anna Sztyber-Betley, James Chua, Owain Evans
Note that while we finetune models to exhibit behaviors like writing insecure code, we do not finetune them to articulate their own behaviors; models do this without any special training or examples.
1 code implementation • 5 Jul 2024 • Rudolf Laine, Bilal Chughtai, Jan Betley, Kaivalya Hariharan, Jeremy Scheurer, Mikita Balesni, Marius Hobbhahn, Alexander Meinke, Owain Evans
Chat models, which are finetuned to serve as AI assistants, outperform their corresponding base models on the Situational Awareness Dataset (SAD) but not on general knowledge tasks.
1 code implementation • 20 Jun 2024 • Johannes Treutlein, Dami Choi, Jan Betley, Samuel Marks, Cem Anil, Roger Grosse, Owain Evans
As a step towards answering this question, we study inductive out-of-context reasoning (OOCR), a type of generalization in which LLMs infer latent information from evidence distributed across training documents and apply it to downstream tasks without in-context learning.
1 code implementation • 10 Oct 2023 • Anna Sztyber-Betley, Filip Kołodziej, Jan Betley, Piotr Duszak
Contract bridge is a game characterized by incomplete information, posing an exciting challenge for artificial intelligence methods.