no code implementations • 17 Apr 2025 • Charles O'Neill, Tirthankar Ghosal, Roberta Răileanu, Mike Walmsley, Thang Bui, Kevin Schawinski, Ioana Ciucă
We demonstrate that framing hypothesis generation as conditional language modelling, fine-tuning the model on Bit-Flip-Spark and the Chain-of-Reasoning while providing only the Bit at inference, improves the overall quality of the generated hypotheses.
no code implementations • 3 Mar 2025 • David Klindt, Charles O'Neill, Patrik Reizinger, Harald Maurer, Nina Miolane
By bridging insights from theoretical neuroscience, representation learning, and interpretability research, we propose an emerging perspective on understanding neural representations in both artificial and biological systems.
no code implementations • 6 Jan 2025 • Charles O'Neill
Our results build on and extend recent work on category-theoretic foundations for deep learning, offering deeper insights into the algebraic structure of attention mechanisms.
no code implementations • 20 Nov 2024 • Charles O'Neill, Alim Gumran, David Klindt
We demonstrate this generalises to SAEs applied to large language models, where more expressive encoders achieve greater interpretability.
no code implementations • 2 Aug 2024 • Kartheik G. Iyer, Mikaeel Yunus, Charles O'Neill, Christine Ye, Alina Hyk, Kiera McCormick, Ioana Ciuca, John F. Wu, Alberto Accomazzi, Simone Astarita, Rishabh Chakrabarty, Jesse Cranney, Anjalie Field, Tirthankar Ghosal, Michele Ginolfi, Marc Huertas-Company, Maja Jablonska, Sandor Kruk, Huiling Liu, Gabriel Marchidan, Rohit Mistry, J. P. Naiman, J. E. G. Peek, Mugdha Polimera, Sergio J. Rodriguez, Kevin Schawinski, Sanjib Sharma, Michael J. Smith, Yuan-Sen Ting, Mike Walmsley
The exponential growth of astronomical literature poses significant challenges for researchers navigating and synthesizing general insights or even domain-specific knowledge.
no code implementations • 1 Aug 2024 • Charles O'Neill, Christine Ye, Kartheik Iyer, John F. Wu
Sparse autoencoders (SAEs) have shown promise in extracting interpretable features from complex neural networks.
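The core idea can be sketched in a few lines: an SAE encodes model activations into an overcomplete, non-negative feature space and reconstructs them with a linear decoder, with an L1 penalty encouraging sparse feature activations. The dimensions, initialisation, and L1 coefficient below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 64-d activations, 256 dictionary features.
d_model, d_dict, batch = 64, 256, 32

# Randomly initialised weights (illustrative only; real SAEs are trained).
W_enc = rng.normal(0, 0.1, (d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = rng.normal(0, 0.1, (d_dict, d_model))

x = rng.normal(size=(batch, d_model))       # stand-in for model activations

# ReLU encoder gives sparse, non-negative feature activations.
f = np.maximum(x @ W_enc + b_enc, 0.0)
x_hat = f @ W_dec                           # linear decoder reconstruction

l1_coeff = 1e-3
recon_loss = np.mean((x - x_hat) ** 2)      # reconstruction term
sparsity_loss = l1_coeff * np.abs(f).mean() # L1 penalty encourages sparsity
loss = recon_loss + sparsity_loss
```

Minimising this combined loss over a large activation dataset is what yields the interpretable dictionary features the abstract refers to.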
1 code implementation • 30 May 2024 • John F. Wu, Alina Hyk, Kiera McCormick, Christine Ye, Simone Astarita, Elina Baral, Jo Ciuca, Jesse Cranney, Anjalie Field, Kartheik Iyer, Philipp Koehn, Jenn Kotler, Sandor Kruk, Michelle Ntampaka, Charles O'Neill, Joshua E. G. Peek, Sanjib Sharma, Mikaeel Yunus
It is imperative to understand how researchers interact with these models and how scientific sub-communities like astronomy might benefit from them.
no code implementations • 21 May 2024 • Charles O'Neill, Thang Bui
We propose training sparse autoencoders on carefully designed positive and negative examples, where the model can only correctly predict the next token for the positive examples.
1 code implementation • 14 Feb 2024 • Jack Miller, Patrick Gleeson, Charles O'Neill, Thang Bui, Noam Levi
Neural networks sometimes exhibit grokking, a phenomenon where perfect or near-perfect performance is achieved on a validation set well after the same performance has been obtained on the corresponding training set.
1 code implementation • 26 Oct 2023 • Jack Miller, Charles O'Neill, Thang Bui
In some settings neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set.
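One simple way to operationalise this definition is to measure the delay, in epochs, between training and validation accuracy first reaching a threshold. The function and the synthetic accuracy curves below are illustrative assumptions, not the papers' exact metrics.

```python
import numpy as np

def grokking_delay(train_acc, val_acc, threshold=0.99):
    """Epochs between training and validation accuracy first reaching
    `threshold` -- a simple operationalisation of grokking (the papers'
    exact measurements may differ)."""
    train_epoch = int(np.argmax(np.asarray(train_acc) >= threshold))
    val_epoch = int(np.argmax(np.asarray(val_acc) >= threshold))
    return val_epoch - train_epoch

# Synthetic curves: training is solved by epoch 10,
# while validation only "groks" at epoch 300.
epochs = 400
train_acc = np.minimum(1.0, np.arange(epochs) / 10)
val_acc = np.where(np.arange(epochs) < 300, 0.5, 1.0)

print(grokking_delay(train_acc, val_acc))  # 290
```

A large positive delay indicates grokking; a delay near zero means training and validation performance improved together.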
no code implementations • 26 Aug 2023 • Charles O'Neill, Jack Miller, Ioana Ciuca, Yuan-Sen Ting, Thang Bui
The performance of our approach is evaluated through classification accuracy on a dataset consisting of problematic prompts not detected by GPT-4, as well as a selection of contentious but unproblematic prompts.
no code implementations • 15 Aug 2023 • Charles O'Neill, Yuan-Sen Ting, Ioana Ciuca, Jack Miller, Thang Bui
Large Language Models (LLMs) hold immense potential to generate synthetic data of high quality and utility, with applications ranging from downstream model training to practical data use.
no code implementations • 15 Mar 2023 • Charles O'Neill
Rice is a staple food in the world's diet, yet a large fraction of the crop yield is lost each year to disease.
no code implementations • 23 Dec 2022 • Jack W. Miller, Charles O'Neill, Navid C. Constantinou, Omri Azencot
In addition, we suggest the "eigenloss" penalty scheme that penalises the eigenvalues of the Koopman operator during training.
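A penalty on the eigenvalues of a learned Koopman matrix can be sketched as follows; here magnitudes exceeding a target (i.e. unstable modes) are penalised, but this functional form is an assumption for illustration and may differ from the paper's eigenloss.

```python
import numpy as np

def eigenloss(K, target=1.0):
    """Penalise eigenvalue magnitudes of a learned Koopman matrix K
    that exceed `target` (discouraging unstable modes). Illustrative
    form only; the paper's exact scheme may differ."""
    eigvals = np.linalg.eigvals(K)
    return float(np.sum(np.maximum(np.abs(eigvals) - target, 0.0)))

# A stable Koopman matrix (all |eigenvalues| <= 1) incurs no penalty...
K_stable = 0.5 * np.eye(3)
# ...while an unstable one is penalised in proportion to the excess.
K_unstable = 2.0 * np.eye(3)

print(eigenloss(K_stable))    # 0.0
print(eigenloss(K_unstable))  # 3.0
```

In training, such a term would be added to the usual prediction/reconstruction loss so the learned dynamics remain stable.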