no code implementations • 26 Feb 2024 • Aengus Lynch, Phillip Guo, Aidan Ewart, Stephen Casper, Dylan Hadfield-Menell
Machine unlearning can be useful for removing harmful capabilities and memorized text from large language models (LLMs), but there are not yet standardized methods for rigorously evaluating it.
2 code implementations • 15 Sep 2023 • Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, Lee Sharkey
One hypothesised cause of polysemanticity is "superposition", where neural networks represent more features than they have neurons by assigning features to an overcomplete set of directions in activation space, rather than to individual neurons.
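The superposition idea described above can be sketched numerically: with more unit-norm feature directions than neurons, a sparsely active feature vector is stored as a linear combination of those directions, and projecting the activation back onto the directions recovers the active features only approximately, with interference from their non-orthogonality. This is an illustrative toy, not the paper's method; the feature count, neuron count, and random directions are all arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_neurons = 8, 3  # more features than neurons

# Overcomplete set of unit-norm feature directions in activation space
directions = rng.normal(size=(n_features, n_neurons))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# A sparse feature vector: only a few features are active at once
features = np.zeros(n_features)
features[[1, 5]] = [0.9, 0.4]

# The activation stores the features in superposition:
# a linear combination of their directions
activation = features @ directions  # shape: (n_neurons,)

# Projecting back onto the directions recovers the active features
# approximately, plus interference terms from non-orthogonal directions
recovered = directions @ activation  # shape: (n_features,)
```

Because the directions are not orthogonal, `recovered` contains small spurious values for inactive features; sparse autoencoders aim to undo exactly this kind of compressed encoding.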