Search Results for author: Fazl Barez

Found 21 papers, 10 papers with code

Visualizing Neural Network Imagination

no code implementations • 10 May 2024 • Nevan Wichers, Victor Tao, Riccardo Volpato, Fazl Barez

Our goal is to visualize what environment states the networks are representing.


Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions

no code implementations • 23 Feb 2024 • Clement Neo, Shay B. Cohen, Fazl Barez

In this paper, we investigate the interplay between attention heads and specialized "next-token" neurons in the Multilayer Perceptron that predict specific tokens.

Increasing Trust in Language Models through the Reuse of Verified Circuits

2 code implementations • 4 Feb 2024 • Philip Quirke, Clement Neo, Fazl Barez

To exhibit the reusability of verified modules, we insert the trained integer addition model into an untrained model and train the combined model to perform both addition and subtraction.

Large Language Models Relearn Removed Concepts

1 code implementation • 3 Jan 2024 • Michelle Lo, Shay B. Cohen, Fazl Barez

This demonstrates that models exhibit polysemantic capacities and can blend old and new concepts in individual neurons.

Tasks: Model Editing

Measuring Value Alignment

no code implementations • 23 Dec 2023 • Fazl Barez, Philip Torr

As artificial intelligence (AI) systems become increasingly integrated into various domains, ensuring that they align with human values becomes critical.

Tasks: Autonomous Vehicles, Recommendation Systems

Interpreting Shared Circuits for Ordered Sequence Prediction in a Large Language Model

no code implementations • 7 Nov 2023 • Michael Lan, Fazl Barez

While transformer models exhibit strong capabilities on linguistic tasks, their complex architectures make them difficult to interpret.

Tasks: Language Modelling, Large Language Model

Understanding Addition in Transformers

4 code implementations • 19 Oct 2023 • Philip Quirke, Fazl Barez

Understanding the inner workings of machine learning models like Transformers is vital for their safe and ethical use.

Beyond Training Objectives: Interpreting Reward Model Divergence in Large Language Models

no code implementations • 12 Oct 2023 • Luke Marks, Amir Abdullah, Clement Neo, Rauno Arike, Philip Torr, Fazl Barez

Large language models (LLMs) fine-tuned by reinforcement learning from human feedback (RLHF) are becoming more widely deployed.

AI Systems of Concern

no code implementations • 9 Oct 2023 • Kayla Matteucci, Shahar Avin, Fazl Barez, Seán Ó hÉigeartaigh

Concerns around future dangers from advanced AI often centre on systems hypothesised to have intrinsic characteristics such as agent-like behaviour, strategic awareness, and long-range planning.

DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models

1 code implementation • 3 Oct 2023 • Albert Garde, Esben Kran, Fazl Barez

By granting access to state-of-the-art interpretability methods, DeepDecipher makes LLMs more transparent, trustworthy, and safe.

Neuron to Graph: Interpreting Language Model Neurons at Scale

1 code implementation • 31 May 2023 • Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Shay Cohen, Fazl Barez

Conventional methods require examining the examples that most strongly activate a neuron and manually identifying patterns to decipher the concepts the neuron responds to.

Tasks: Language Modelling

Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark

1 code implementation • 27 May 2023 • Jason Hoelscher-Obermaier, Julia Persson, Esben Kran, Ioannis Konstas, Fazl Barez

We use this improved benchmark to evaluate recent model editing techniques and find that they suffer from low specificity.

Tasks: Model Editing, Specificity

The Larger They Are, the Harder They Fail: Language Models do not Recognize Identifier Swaps in Python

1 code implementation • 24 May 2023 • Antonio Valerio Miceli-Barone, Fazl Barez, Ioannis Konstas, Shay B. Cohen

Large Language Models (LLMs) have successfully been applied to code generation tasks, raising the question of how well these models understand programming.

Tasks: Code Generation

System III: Learning with Domain Knowledge for Safety Constraints

no code implementations • 23 Apr 2023 • Fazl Barez, Hosein Hasanbeig, Alessandro Abate

We evaluate the satisfaction of these constraints via p-norms in state vector space.

Tasks: Safe Exploration

N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models

no code implementations • 22 Apr 2023 • Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Fazl Barez

Understanding the function of individual neurons within language models is essential for mechanistic interpretability research.

Fairness in AI and Its Long-Term Implications on Society

no code implementations • 16 Apr 2023 • Ondrej Bohdal, Timothy Hospedales, Philip H. S. Torr, Fazl Barez

Successful deployment of artificial intelligence (AI) in various settings has led to numerous positive outcomes for individuals and society.

Tasks: Decision Making, Fairness

Exploring the Advantages of Transformers for High-Frequency Trading

1 code implementation • 20 Feb 2023 • Fazl Barez, Paul Bilokon, Arthur Gervais, Nikita Lisitsyn

This paper explores novel deep learning Transformer architectures for high-frequency Bitcoin-USDT log-return forecasting and compares them with traditional Long Short-Term Memory (LSTM) models.

Tasks: Decoder, Position

PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration

1 code implementation • 16 Mar 2022 • Pengyi Li, Hongyao Tang, Tianpei Yang, Xiaotian Hao, Tong Sang, Yan Zheng, Jianye Hao, Matthew E. Taylor, Wenyuan Tao, Zhen Wang, Fazl Barez

However, we reveal that sub-optimal collaborative behaviors also emerge with strong correlations, and that simply maximizing the mutual information (MI) can, surprisingly, hinder learning toward better collaboration.

Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning
