no code implementations • 3 Jun 2024 • Diego Dorn, Alexandre Variengien, Charbel-Raphaël Segerie, Vincent Corruble
Input-output safeguards are used to detect anomalies in the traces produced by Large Language Models (LLMs) systems.
1 code implementation • 13 Dec 2023 • Alexandre Variengien, Eric Winsor
We find that LMs internally decompose retrieval tasks in a modular way: middle layers at the last token position process the request, while late layers retrieve the correct entity from the context.
2 code implementations • NeurIPS 2023 • Michael Hanna, Ollie Liu, Alexandre Variengien
Concretely, we use mechanistic interpretability techniques to explain the (limited) mathematical abilities of GPT-2 small.
6 code implementations • 1 Nov 2022 • Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
Research in mechanistic interpretability seeks to explain behaviors of machine learning models in terms of their internal components.
1 code implementation • 29 Jun 2021 • Alexandre Variengien, Stefano Nichele, Tom Glover, Sidney Pontes-Filho
The observations of the environment are transmitted in input cells, while the values of output cells are used as a readout of the system.
1 code implementation • 3 Dec 2020 • Alexandre Variengien, Xavier Hinaut
In this work, we trained ESNs and LSTMs on a Cross-Situationnal Learning (CSL) task.