Search Results for author: Mrinank Sharma

Found 9 papers, 5 papers with code

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

1 code implementation • 10 Jan 2024 • Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez

We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it).

Towards Understanding Sycophancy in Language Models

1 code implementation • 20 Oct 2023 • Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez

Overall, our results indicate that sycophancy is a general behavior of state-of-the-art AI assistants, likely driven in part by human preference judgments favoring sycophantic responses.

Text Generation

Understanding and Controlling a Maze-Solving Policy Network

no code implementations • 12 Oct 2023 • Ulisse Mini, Peli Grietzer, Mrinank Sharma, Austin Meek, Monte MacDiarmid, Alexander Matt Turner

To understand the goals and goal representations of AI systems, we carefully study a pretrained reinforcement learning policy that solves mazes by navigating to a range of target squares.

Incorporating Unlabelled Data into Bayesian Neural Networks

no code implementations • 4 Apr 2023 • Mrinank Sharma, Tom Rainforth, Yee Whye Teh, Vincent Fortuin

Conventional Bayesian Neural Networks (BNNs) cannot leverage unlabelled data to improve their predictions.

Active Learning • Self-Supervised Learning +1

Do Bayesian Neural Networks Need To Be Fully Stochastic?

2 code implementations • 11 Nov 2022 • Mrinank Sharma, Sebastian Farquhar, Eric Nalisnick, Tom Rainforth

We investigate the benefit of treating all the parameters in a Bayesian neural network stochastically and find compelling theoretical and empirical evidence that this standard construction may be unnecessary.

Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt

1 code implementation • 14 Jun 2022 • Sören Mindermann, Jan Brauner, Muhammed Razzak, Mrinank Sharma, Andreas Kirsch, Winnie Xu, Benedikt Höltgen, Aidan N. Gomez, Adrien Morisot, Sebastian Farquhar, Yarin Gal

But most computation and time is wasted on redundant and noisy points that are already learnt or not learnable.

Prioritized training on points that are learnable, worth learning, and not yet learned (workshop version)

no code implementations • 6 Jul 2021 • Sören Mindermann, Muhammed Razzak, Winnie Xu, Andreas Kirsch, Mrinank Sharma, Adrien Morisot, Aidan N. Gomez, Sebastian Farquhar, Jan Brauner, Yarin Gal

We introduce Goldilocks Selection, a technique for faster model training which selects a sequence of training points that are "just right".

Active Learning

How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19?

no code implementations • NeurIPS 2020 • Mrinank Sharma, Sören Mindermann, Jan Markus Brauner, Gavin Leech, Anna B. Stephenson, Tomáš Gavenčiak, Jan Kulveit, Yee Whye Teh, Leonid Chindelevitch, Yarin Gal

To what extent are effectiveness estimates of nonpharmaceutical interventions (NPIs) against COVID-19 influenced by the assumptions our models make?

Differentially Private Federated Variational Inference

1 code implementation • 24 Nov 2019 • Mrinank Sharma, Michael Hutchinson, Siddharth Swaroop, Antti Honkela, Richard E. Turner

This setting is known as federated learning, in which privacy is a key concern.

Bayesian Inference • Federated Learning +1
