Search Results for author: Nandi Schoots

Found 9 papers, 4 papers with code

Training Neural Networks for Modularity aids Interpretability

no code implementations • 24 Sep 2024 • Satvik Golechha, Dylan Cope, Nandi Schoots

An approach to improve network interpretability is via clusterability, i.e., splitting a model into disjoint clusters that can be studied independently.

Extending Activation Steering to Broad Skills and Multiple Behaviours

1 code implementation • 9 Mar 2024 • Teun van der Weij, Massimo Poesio, Nandi Schoots

In this paper, we investigate the efficacy of activation steering for broad skills and multiple behaviours.

Dissecting Language Models: Machine Unlearning via Selective Pruning

1 code implementation • 2 Mar 2024 • Nicholas Pochinkov, Nandi Schoots

This approach is a compute- and data-efficient method for identifying and removing neurons that enable specific behaviours.

Machine Unlearning
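
A minimal sketch of the selective-pruning idea, not the authors' exact procedure: score each neuron by how much more it activates on a "forget" dataset than on a "retain" dataset, then zero out the highest-scoring neurons. The dataset names, scoring ratio, and prune fraction below are illustrative assumptions.

import torch

def neurons_to_prune(forget_acts: torch.Tensor,
                     retain_acts: torch.Tensor,
                     prune_fraction: float = 0.01) -> torch.Tensor:
    # forget_acts / retain_acts: (num_samples, num_neurons) activations from one
    # layer, collected on the behaviour-to-remove and general datasets (assumed setup).
    eps = 1e-8
    forget_score = forget_acts.abs().mean(dim=0)
    retain_score = retain_acts.abs().mean(dim=0)
    ratio = forget_score / (retain_score + eps)  # high = used mostly for the forget behaviour
    k = max(1, int(prune_fraction * ratio.numel()))
    return torch.topk(ratio, k).indices

def prune(weight: torch.Tensor, neuron_idx: torch.Tensor) -> None:
    # Zero the output rows of the layer's weight matrix for the selected neurons.
    with torch.no_grad():
        weight[neuron_idx] = 0.0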

Improving Activation Steering in Language Models with Mean-Centring

no code implementations • 6 Dec 2023 • Ole Jorgensen, Dylan Cope, Nandi Schoots, Murray Shanahan

Recent work in activation steering has demonstrated the potential to better control the outputs of Large Language Models (LLMs), but it involves finding steering vectors.
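
As one hedged illustration of mean-centred steering (assumed setup, not necessarily the paper's exact recipe): take the mean activation on a dataset exhibiting the target behaviour, subtract the mean activation on a broad reference corpus, and add the resulting vector to the residual stream at inference. The names target_acts, reference_acts, and scale are hypothetical.

import torch

def mean_centred_steering_vector(target_acts: torch.Tensor,
                                 reference_acts: torch.Tensor) -> torch.Tensor:
    # target_acts / reference_acts: (num_tokens, d_model) activations from the same
    # layer on the target-behaviour dataset and on a broad reference corpus.
    return target_acts.mean(dim=0) - reference_acts.mean(dim=0)

def apply_steering(hidden_states: torch.Tensor,
                   steering_vector: torch.Tensor,
                   scale: float = 4.0) -> torch.Tensor:
    # Add the scaled vector to every position's residual-stream activation.
    return hidden_states + scale * steering_vector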

Comparing Optimization Targets for Contrast-Consistent Search

1 code implementation • 1 Nov 2023 • Hugo Fry, Seamus Fallows, Ian Fan, Jamie Wright, Nandi Schoots

We investigate the optimization target of Contrast-Consistent Search (CCS), which aims to recover the internal representations of truth of a large language model.

Language Modelling • Large Language Model
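
For context, the standard CCS objective that this work takes as its starting point trains a probe on contrast pairs (a statement and its negation) to be both consistent and confident. A sketch of that loss, with probe outputs and variable names assumed:

import torch

def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    # p_pos / p_neg: probe outputs in (0, 1) for a statement and its negation, shape (batch,).
    consistency = (p_pos - (1.0 - p_neg)) ** 2     # p_pos should equal 1 - p_neg
    confidence = torch.minimum(p_pos, p_neg) ** 2  # discourage the degenerate p_pos = p_neg = 0.5
    return (consistency + confidence).mean()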

Any Deep ReLU Network is Shallow

no code implementations • 20 Jun 2023 • Mattia Jacopo Villani, Nandi Schoots

We constructively prove that every deep ReLU network can be rewritten as a functionally identical three-layer network with weights valued in the extended reals.

Low-Entropy Latent Variables Hurt Out-of-Distribution Performance

no code implementations • 20 May 2023 • Nandi Schoots, Dylan Cope

We study the relationship between the entropy of intermediate representations and a model's robustness to distributional shift.

Contrastive Learning

A theory of representation learning gives a deep generalisation of kernel methods

no code implementations • 30 Aug 2021 • Adam X. Yang, Maxime Robeyns, Edward Milsom, Ben Anson, Nandi Schoots, Laurence Aitchison

In particular, we show that Deep Gaussian processes (DGPs) in the Bayesian representation learning limit have exactly multivariate Gaussian posteriors, and the posterior covariances can be obtained by optimizing an interpretable objective combining a log-likelihood to improve performance with a series of KL-divergences which keep the posteriors close to the prior.

Bayesian Inference • Gaussian Processes • +1
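
Schematically, the objective described above, written with notation assumed here rather than taken from the paper, combines a data log-likelihood with per-layer KL terms that keep each posterior close to its prior:

\mathcal{L} = \log p(Y \mid G_L) - \sum_{\ell=1}^{L} \mathrm{KL}\big(q(G_\ell)\,\|\,p(G_\ell)\big)

where G_\ell denotes the layer-\ell representation, q(G_\ell) its posterior, and p(G_\ell) its prior.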

Learning to Communicate with Strangers via Channel Randomisation Methods

1 code implementation • 19 Apr 2021 • Dylan Cope, Nandi Schoots

We introduce two methods for improving the performance of agents meeting for the first time to accomplish a communicative task.
