Search Results for author: Akbir Khan

Found 14 papers, 6 papers with code

Factorio Learning Environment

1 code implementation • 6 Mar 2025 • Jack Hopkins, Mart Bakler, Akbir Khan

Large Language Models (LLMs) are rapidly saturating existing benchmarks, necessitating new open-ended evaluations.

Program Synthesis • Spatial Reasoning

Alignment faking in large language models

1 code implementation • 18 Dec 2024 • Ryan Greenblatt, Carson Denison, Benjamin Wright, Fabien Roger, Monte MacDiarmid, Sam Marks, Johannes Treutlein, Tim Belonax, Jack Chen, David Duvenaud, Akbir Khan, Julian Michael, Sören Mindermann, Ethan Perez, Linda Petrini, Jonathan Uesato, Jared Kaplan, Buck Shlegeris, Samuel R. Bowman, Evan Hubinger

Explaining this gap, in almost all cases where the model complies with a harmful query from a free user, we observe explicit alignment-faking reasoning, with the model stating it is strategically answering harmful queries in training to preserve its preferred harmlessness behavior out of training.

Large Language Model

Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats

no code implementations • 26 Nov 2024 • Jiaxin Wen, Vivek Hebbar, Caleb Larson, Aryan Bhatt, Ansh Radhakrishnan, Mrinank Sharma, Henry Sleight, Shi Feng, He He, Ethan Perez, Buck Shlegeris, Akbir Khan

As large language models (LLMs) become increasingly capable, it is prudent to assess whether safety measures remain effective even if LLMs intentionally try to bypass them.

Code Generation

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

no code implementations • 20 Nov 2024 • Davide Paglieri, Bartłomiej Cupiał, Samuel Coward, Ulyana Piterbarg, Maciej Wolczyk, Akbir Khan, Eduardo Pignatelli, Łukasz Kuciński, Lerrel Pinto, Rob Fergus, Jakob Nicolaus Foerster, Jack Parker-Holder, Tim Rocktäschel

Large Language Models (LLMs) and Vision Language Models (VLMs) possess extensive knowledge and exhibit promising reasoning abilities; however, they still struggle to perform well in complex, dynamic environments.

Benchmarking • NetHack • +1

Language Models Learn to Mislead Humans via RLHF

1 code implementation • 19 Sep 2024 • Jiaxin Wen, Ruiqi Zhong, Akbir Khan, Ethan Perez, Jacob Steinhardt, Minlie Huang, Samuel R. Bowman, He He, Shi Feng

On a question-answering task (QuALITY) and programming task (APPS), RLHF makes LMs better at convincing our subjects but not at completing the task correctly.

Question Answering

Scaling Opponent Shaping to High Dimensional Games

no code implementations • 19 Dec 2023 • Akbir Khan, Timon Willi, Newton Kwan, Andrea Tacchetti, Chris Lu, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster

In multi-agent settings with mixed incentives, methods developed for zero-sum games have been shown to lead to detrimental outcomes.

Meta-Learning

Leading the Pack: N-player Opponent Shaping

no code implementations • 19 Dec 2023 • Alexandra Souly, Timon Willi, Akbir Khan, Robert Kirk, Chris Lu, Edward Grefenstette, Tim Rocktäschel

We evaluate on over 4 different environments, varying the number of players from 3 to 5, and demonstrate that model-based OS methods converge to an equilibrium with better global welfare than naive learning.

MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning

no code implementations • 6 Mar 2023 • Mikayel Samvelyan, Akbir Khan, Michael Dennis, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Roberta Raileanu, Tim Rocktäschel

Open-ended learning methods that automatically generate a curriculum of increasingly challenging tasks serve as a promising avenue toward generally capable reinforcement learning agents.

Continuous Control • +4

The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs

1 code implementation • NeurIPS 2023 • Laura Ruis, Akbir Khan, Stella Biderman, Sara Hooker, Tim Rocktäschel, Edward Grefenstette

We present our findings as the starting point for further research into evaluating how LLMs interpret language in context and to drive the development of more pragmatic and useful models of human discourse.

Considering Race a Problem of Transfer Learning

no code implementations • 12 Dec 2018 • Akbir Khan, Marwa Mahmoud

We take the novel approach of considering race as a boundary for transfer learning in both the task (facial classification) and the domain (synthesis over distinct datasets).

Classification • General Classification • +2
