no code implementations • 21 Feb 2025 • Lisa Schut, Yarin Gal, Sebastian Farquhar
Large language models (LLMs) have multilingual capabilities and can solve tasks across various languages.
no code implementations • 22 Jan 2025 • Sebastian Farquhar, Vikrant Varma, David Lindner, David Elson, Caleb Biddulph, Ian Goodfellow, Rohin Shah
Future advanced AI systems may learn sophisticated strategies through reinforcement learning (RL) that humans cannot understand well enough to safely evaluate.
no code implementations • 22 Apr 2024 • Laura Weidinger, Joslyn Barnhart, Jenny Brennan, Christina Butterfield, Susie Young, Will Hawkins, Lisa Anne Hendricks, Ramona Comanescu, Oscar Chang, Mikel Rodriguez, Jennifer Beroshi, Dawn Bloxwich, Lev Proleev, Jilin Chen, Sebastian Farquhar, Lewis Ho, Iason Gabriel, Allan Dafoe, William Isaac
In the development of Google DeepMind's advanced AI models, we innovated on and applied a broad set of approaches to safety evaluation.
1 code implementation • 20 Mar 2024 • Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum, Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin, Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah, Allan Dafoe, Toby Shevlane
To understand the risks posed by a new AI system, we must understand what it can and cannot do.
no code implementations • 15 Dec 2023 • Sebastian Farquhar, Vikrant Varma, Zachary Kenton, Johannes Gasteiger, Vladimir Mikulik, Rohin Shah
We show that existing unsupervised methods on large language model (LLM) activations do not discover knowledge; instead, they seem to discover whatever feature of the activations is most prominent.
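For context, the best-known unsupervised method in this family is contrast-consistent search (CCS; Burns et al.), whose objective is sketched below. The probe outputs `p_pos` and `p_neg` (for a statement and its negation) are illustrative names, not a fixed API:

```python
import torch

def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    """Contrast-consistent search (CCS) objective: probe outputs on a
    statement and its negation should be consistent (p_pos ~= 1 - p_neg)
    and confident (not both near 0.5)."""
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()
```

The point of the paper is that a probe minimising a loss like this can latch onto any sufficiently prominent feature of the activations, not knowledge specifically.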
no code implementations • 24 May 2023 • Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess Whittlestone, Jade Leung, Daniel Kokotajlo, Nahema Marchal, Markus Anderljung, Noam Kolt, Lewis Ho, Divya Siddarth, Shahar Avin, Will Hawkins, Been Kim, Iason Gabriel, Vijay Bolina, Jack Clark, Yoshua Bengio, Paul Christiano, Allan Dafoe
Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities.
1 code implementation • 17 Apr 2023 • Freddie Bickford Smith, Andreas Kirsch, Sebastian Farquhar, Yarin Gal, Adam Foster, Tom Rainforth
Information-theoretic approaches to active learning have traditionally focused on maximising the information gathered about the model parameters, most commonly by optimising the BALD score.
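For reference, a minimal sketch of the BALD score under Monte Carlo dropout, assuming a classifier whose stochastic forward passes can be repeated; all names here are illustrative:

```python
import torch

def bald_scores(model, x, n_samples: int = 20) -> torch.Tensor:
    """BALD: mutual information between predictions and model parameters,
    approximated with MC dropout.

    I[y; theta | x] = H[E_theta p(y|x,theta)] - E_theta H[p(y|x,theta)]
    """
    model.train()  # keep dropout active so each pass samples a different subnetwork
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(-1) for _ in range(n_samples)])  # (S, B, C)
    mean_probs = probs.mean(0)  # predictive distribution
    entropy_of_mean = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(-1)
    mean_of_entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean(0)
    return entropy_of_mean - mean_of_entropy  # high = models disagree about this point
```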
3 code implementations • 19 Feb 2023 • Lorenz Kuhn, Yarin Gal, Sebastian Farquhar
We introduce a method to measure uncertainty in large language models.
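A minimal sketch of the idea, commonly called semantic entropy: sample several answers, merge those that mutually entail one another, and take the entropy over the resulting meaning clusters. The `entails` callable (e.g. backed by an NLI model) is an assumed ingredient, not a fixed API:

```python
import math

def semantic_entropy(samples: list[str], entails) -> float:
    """Cluster sampled answers by bidirectional entailment, then compute
    the entropy over the semantic clusters (estimated by sample frequency)."""
    clusters: list[list[str]] = []
    for s in samples:
        for c in clusters:
            rep = c[0]
            if entails(s, rep) and entails(rep, s):  # same meaning both ways
                c.append(s)
                break
        else:
            clusters.append([s])  # s starts a new meaning cluster
    n = len(samples)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)
```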
1 code implementation • NeurIPS 2023 • David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, Thomas McGrath, Vladimir Mikulik
Additionally, the known structure of Tracr-compiled models can serve as ground truth for evaluating interpretability methods.
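As an illustration, a sketch in the spirit of the example in the Tracr repository, compiling a RASP program that computes the running fraction of `"x"` tokens; the API names are recalled from the public repo and should be checked against the current release:

```python
from tracr.rasp import rasp
from tracr.compiler import compiling

# RASP program: fraction of tokens so far that equal "x".
def make_frac_prevs(bools: rasp.SOp) -> rasp.SOp:
    bools = rasp.numerical(bools)
    prevs = rasp.Select(rasp.indices, rasp.indices, rasp.Comparison.LEQ)
    return rasp.numerical(rasp.Aggregate(prevs, bools, default=0))

# Compile to a transformer whose weights implement the program exactly,
# giving a known-structure model for interpretability evaluations.
model = compiling.compile_rasp_to_model(
    make_frac_prevs(rasp.tokens == "x"),
    vocab={"x", "y"},
    max_seq_len=5,
    compiler_bos="BOS",
)
out = model.apply(["BOS", "x", "y", "x"])
```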
no code implementations • 15 Dec 2022 • Lorenz Kuhn, Yarin Gal, Sebastian Farquhar
Users often ask dialogue systems ambiguous questions that require clarification.
2 code implementations • 11 Nov 2022 • Mrinank Sharma, Sebastian Farquhar, Eric Nalisnick, Tom Rainforth
We investigate the benefit of treating all the parameters in a Bayesian neural network stochastically and find compelling theoretical and empirical evidence that this standard construction may be unnecessary.
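One simple instance of partial stochasticity, for illustration: keep the body of the network deterministic and make only the final layer a mean-field Gaussian. This is a generic construction, not the paper's specific architectures:

```python
import torch
import torch.nn as nn

class PartiallyStochasticNet(nn.Module):
    """Deterministic feature extractor plus a single stochastic
    (mean-field Gaussian) output layer."""
    def __init__(self, d_in: int, d_hidden: int, d_out: int):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())  # deterministic
        self.w_mu = nn.Parameter(torch.zeros(d_hidden, d_out))
        self.w_logsig = nn.Parameter(torch.full((d_hidden, d_out), -3.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reparameterised sample of the stochastic layer's weights.
        w = self.w_mu + self.w_logsig.exp() * torch.randn_like(self.w_mu)
        return self.features(x) @ w
```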
no code implementations • 11 Nov 2022 • Sebastian Farquhar
To assess a model's ability to incorporate different parts of the Bayesian framework, we can identify desirable characteristic behaviours of Bayesian reasoning and pick decision problems that make heavy use of those behaviours.
no code implementations • 17 Aug 2022 • Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, Matt MacDermott, Tom Everitt
Causal models of agents have been used to analyse the safety aspects of machine learning systems.
2 code implementations • 14 Jun 2022 • Sören Mindermann, Jan Brauner, Muhammed Razzak, Mrinank Sharma, Andreas Kirsch, Winnie Xu, Benedikt Höltgen, Aidan N. Gomez, Adrien Morisot, Sebastian Farquhar, Yarin Gal
But most computation and time are wasted on redundant and noisy points that are already learnt or not learnable.
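A minimal sketch of the selection rule this points at: score candidate points by their current loss minus an "irreducible" loss from a model trained on holdout data, so that redundant points (low current loss) and noisy points (high holdout loss) are both skipped. `ho_model` is an assumed auxiliary model:

```python
import torch
import torch.nn.functional as F

def select_batch(model, ho_model, xs, ys, k: int):
    """Reducible-holdout-loss selection: prefer points that the current
    model gets wrong but that a holdout-trained model finds learnable."""
    with torch.no_grad():
        train_loss = F.cross_entropy(model(xs), ys, reduction="none")
        irreducible = F.cross_entropy(ho_model(xs), ys, reduction="none")
    rho = train_loss - irreducible  # reducible holdout loss
    idx = rho.topk(k).indices
    return xs[idx], ys[idx]
```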
no code implementations • 21 Apr 2022 • Sebastian Farquhar, Ryan Carey, Tom Everitt
We then train agents, using Causal Influence Diagram analysis, to maximize the causal effect of actions on the expected return that is not mediated by the delicate parts of the state.
1 code implementation • ICLR 2022 • Milad Alizadeh, Shyam A. Tailor, Luisa M Zintgraf, Joost van Amersfoort, Sebastian Farquhar, Nicholas Donald Lane, Yarin Gal
Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network while consuming fewer computational resources for training and inference.
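A representative prune-at-initialization criterion, for context, is SNIP-style saliency |g * w| computed from a single minibatch at init (named plainly; the paper studies this family of methods rather than prescribing this one):

```python
import torch
import torch.nn.functional as F

def snip_masks(model, x, y, sparsity: float = 0.9):
    """SNIP-style saliency at initialization: score each weight by
    |gradient * weight| on one minibatch, keep the top (1 - sparsity)."""
    params = [p for p in model.parameters() if p.dim() > 1]  # weight tensors only
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, params)
    saliency = torch.cat([(g * p).abs().flatten() for g, p in zip(grads, params)])
    k = max(1, int((1 - sparsity) * saliency.numel()))
    threshold = saliency.topk(k).values.min()
    return [((g * p).abs() >= threshold).float() for g, p in zip(grads, params)]
```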
1 code implementation • 14 Feb 2022 • Jannik Kossen, Sebastian Farquhar, Yarin Gal, Tom Rainforth
We propose Active Surrogate Estimators (ASEs), a new method for label-efficient model evaluation.
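The rough shape of the idea, under our own simplifying assumptions (the actual estimator is more careful about how points are acquired and weighted): fit a surrogate that predicts the model's per-point loss, average it over the whole pool, and debias using the few points whose labels were actually bought. `surrogate_loss` is hypothetical:

```python
import numpy as np

def surrogate_risk_estimate(surrogate_loss, labeled_idx, true_losses, pool_x):
    """Estimate test risk from surrogate-predicted losses over the pool,
    corrected by the surrogate's observed error on labeled points."""
    pred = np.array([surrogate_loss(x) for x in pool_x])
    estimate = pred.mean()  # surrogate term over the full (mostly unlabeled) pool
    if labeled_idx:
        correction = np.mean([true_losses[i] - pred[i] for i in labeled_idx])
        estimate += correction  # debias with the points we actually labeled
    return float(estimate)
```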
no code implementations • 6 Jul 2021 • Sören Mindermann, Muhammed Razzak, Winnie Xu, Andreas Kirsch, Mrinank Sharma, Adrien Morisot, Aidan N. Gomez, Sebastian Farquhar, Jan Brauner, Yarin Gal
We introduce Goldilocks Selection, a technique for faster model training which selects a sequence of training points that are "just right".
2 code implementations • 22 Jun 2021 • Andreas Kirsch, Sebastian Farquhar, Parmida Atighehchian, Andrew Jesson, Frederic Branchaud-Charron, Yarin Gal
We examine a simple stochastic strategy for adapting well-known single-point acquisition functions to allow batch active learning.
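A minimal sketch: rather than taking the top-k acquisition scores deterministically, perturb the scores with Gumbel noise and then take the top-k, which is equivalent to sampling a batch without replacement from a softmax over the scores:

```python
import torch

def stochastic_batch(scores: torch.Tensor, k: int, temperature: float = 1.0):
    """Gumbel top-k: equivalent to sampling k items without replacement
    with probabilities proportional to softmax(scores / temperature)."""
    gumbel = -torch.log(-torch.log(torch.rand_like(scores)))
    return (scores / temperature + gumbel).topk(k).indices
```

Deterministic top-k tends to pick k near-duplicates of the single best point; injecting noise spreads the batch across the score distribution at essentially no extra cost.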
3 code implementations • 7 Jun 2021 • Zachary Nado, Neil Band, Mark Collier, Josip Djolonga, Michael W. Dusenberry, Sebastian Farquhar, Qixuan Feng, Angelos Filos, Marton Havasi, Rodolphe Jenatton, Ghassen Jerfel, Jeremiah Liu, Zelda Mariet, Jeremy Nixon, Shreyas Padhy, Jie Ren, Tim G. J. Rudner, Faris Sbahi, Yeming Wen, Florian Wenzel, Kevin Murphy, D. Sculley, Balaji Lakshminarayanan, Jasper Snoek, Yarin Gal, Dustin Tran
In this paper we introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks.
1 code implementation • 9 Mar 2021 • Jannik Kossen, Sebastian Farquhar, Yarin Gal, Tom Rainforth
While approaches like active learning reduce the number of labels needed for model training, the existing literature largely ignores the cost of labeling test data, typically making the unrealistic assumption that large test sets are available for model evaluation.
no code implementations • ICLR 2021 • Sebastian Farquhar, Yarin Gal, Tom Rainforth
Active learning is a powerful tool when labelling data is expensive, but it introduces a bias because the training data no longer follows the population distribution.
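The basic corrective, sketched here under the simplifying assumption that points are drawn with replacement from a pool of size N with known proposal probabilities (the paper derives unbiased, lower-variance estimators for the realistic without-replacement setting):

```python
import numpy as np

def importance_weighted_risk(losses, q_probs, pool_size: int) -> float:
    """If point i was sampled with probability q_probs[i], weighting its
    loss by 1 / (pool_size * q_probs[i]) makes the average an unbiased
    estimate of the uniform population risk."""
    losses, q_probs = np.asarray(losses), np.asarray(q_probs)
    return float(np.mean(losses / (pool_size * q_probs)))
```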
no code implementations • 1 Jul 2020 • Joost van Amersfoort, Milad Alizadeh, Sebastian Farquhar, Nicholas Lane, Yarin Gal
We introduce a method to speed up training by 2x and inference by 3x in deep neural networks using structured pruning applied before training.
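For contrast with weight-level criteria, a structured criterion scores whole output channels so that entire filters can be removed before training, which is what actually yields wall-clock speedups. This sketch uses a simple aggregated saliency as an assumed scoring rule:

```python
import torch

def channel_mask(weight: torch.Tensor, grad: torch.Tensor, keep_frac: float = 0.5):
    """Score each output channel of a conv weight (out, in, kh, kw) by the
    aggregated |grad * weight| of its filter; dropped filters shrink the
    layer rather than merely sparsifying it."""
    saliency = (grad * weight).abs().sum(dim=(1, 2, 3))  # one score per filter
    k = max(1, int(keep_frac * saliency.numel()))
    mask = torch.zeros_like(saliency, dtype=torch.bool)
    mask[saliency.topk(k).indices] = True
    return mask
```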
no code implementations • NeurIPS 2020 • Sebastian Farquhar, Lewis Smith, Yarin Gal
We challenge the longstanding assumption that the mean-field approximation for variational inference in Bayesian neural networks is severely restrictive, and show this is not the case in deep networks.
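For concreteness, the mean-field approximation in question, as a minimal variational linear layer with a fully factorised Gaussian posterior (illustrative, not the paper's experimental code):

```python
import torch
import torch.nn as nn

class MeanFieldLinear(nn.Module):
    """Mean-field Gaussian variational layer: q(w) = N(mu, diag(sigma^2)),
    sampled with the reparameterisation trick."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(d_in, d_out) * 0.05)
        self.logsig = nn.Parameter(torch.full((d_in, d_out), -3.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.mu + self.logsig.exp() * torch.randn_like(self.mu)
        return x @ w

    def kl(self) -> torch.Tensor:
        # Closed-form KL(q || N(0, I)) for diagonal Gaussians.
        sig2 = (2 * self.logsig).exp()
        return 0.5 * (sig2 + self.mu**2 - 1 - 2 * self.logsig).sum()
```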
1 code implementation • 22 Dec 2019 • Angelos Filos, Sebastian Farquhar, Aidan N. Gomez, Tim G. J. Rudner, Zachary Kenton, Lewis Smith, Milad Alizadeh, Arnoud de Kroon, Yarin Gal
From our comparison we conclude that some current techniques which solve benchmarks such as UCI 'overfit' their uncertainty to the dataset: when evaluated on our benchmark, they underperform relative to simpler baselines.
4 code implementations • 1 Jul 2019 • Sebastian Farquhar, Michael Osborne, Yarin Gal
The Radial BNN is motivated by avoiding a sampling problem in 'mean-field' variational inference (MFVI) caused by the so-called 'soap-bubble' pathology of multivariate Gaussians.
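The fix, in sketch form: keep the mean-field direction but draw the radius from a one-dimensional Gaussian, so samples no longer concentrate on the thin high-dimensional shell where an MFVI Gaussian puts almost all of its mass:

```python
import torch

def radial_sample(mu: torch.Tensor, logsig: torch.Tensor) -> torch.Tensor:
    """Radial posterior sample: w = mu + sigma * (eps / ||eps||) * |r|,
    with eps ~ N(0, I) giving the direction and r ~ N(0, 1) the radius."""
    eps = torch.randn_like(mu)
    r = torch.randn(())              # scalar radius
    direction = eps / eps.norm()     # uniform direction on the unit sphere
    return mu + logsig.exp() * direction * r.abs()
```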
no code implementations • 18 Feb 2019 • Sebastian Farquhar, Yarin Gal
Catastrophic forgetting can be a significant problem for institutions that must delete historic data for privacy reasons.
2 code implementations • 18 Feb 2019 • Sebastian Farquhar, Yarin Gal
From a Bayesian perspective, continual learning seems straightforward: given the model posterior, one would simply use it as the prior for the next task.
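In symbols, this is the prior-focused recipe (as in variational continual learning, to name the standard technique): minimise the new task's expected negative log-likelihood plus KL(q_t || q_{t-1}). A sketch of the KL term for diagonal Gaussian posteriors, with illustrative parameter names:

```python
import torch
from torch.distributions import Normal, kl_divergence

def continual_kl(q_mu, q_logsig, prev_mu, prev_logsig) -> torch.Tensor:
    """KL(q_t || q_{t-1}) for factorised Gaussians: the ELBO's KL term,
    with the previous task's posterior standing in for the prior.
    Full objective: E_q[NLL on task t's data] + this term."""
    q_t = Normal(q_mu, q_logsig.exp())
    q_prev = Normal(prev_mu, prev_logsig.exp())
    return kl_divergence(q_t, q_prev).sum()
```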
no code implementations • 24 May 2018 • Sebastian Farquhar, Yarin Gal
Experiments used in current continual learning research do not faithfully assess fundamental challenges of learning continually.
no code implementations • 20 Feb 2018 • Miles Brundage, Shahar Avin, Jack Clark, Helen Toner, Peter Eckersley, Ben Garfinkel, Allan Dafoe, Paul Scharre, Thomas Zeitzoff, Bobby Filar, Hyrum Anderson, Heather Roff, Gregory C. Allen, Jacob Steinhardt, Carrick Flynn, Seán Ó hÉigeartaigh, SJ Beard, Haydn Belfield, Sebastian Farquhar, Clare Lyle, Rebecca Crootof, Owain Evans, Michael Page, Joanna Bryson, Roman Yampolskiy, Dario Amodei
This report surveys the landscape of potential security threats from malicious uses of AI, and proposes ways to better forecast, prevent, and mitigate these threats.