no code implementations • 3 Mar 2025 • Sai Sumedh R. Hindupur, Ekdeep Singh Lubana, Thomas Fel, Demba Ba
Sparse Autoencoders (SAEs) are widely used to interpret neural networks by identifying meaningful concepts from their representations.
no code implementations • 18 Feb 2025 • Thomas Fel, Ekdeep Singh Lubana, Jacob S. Prince, Matthew Kowal, Victor Boutin, Isabel Papadimitriou, Binxu Wang, Martin Wattenberg, Demba Ba, Talia Konkle
Sparse Autoencoders (SAEs) have emerged as a powerful framework for machine learning interpretability, enabling the unsupervised decomposition of model representations into a dictionary of abstract, human-interpretable concepts.
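For context, a minimal sketch of the standard SAE setup this line of work studies: an overcomplete linear dictionary with a non-negative encoder and an L1 sparsity penalty. The layer widths and `l1_coeff` below are illustrative assumptions, not values taken from these papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Decomposes model activations into a sparse code over an overcomplete dictionary."""
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model, bias=False)

    def forward(self, x):
        code = F.relu(self.encoder(x))   # non-negative "concept" activations
        recon = self.decoder(code)       # reconstruction from dictionary atoms
        return recon, code

def sae_loss(x, recon, code, l1_coeff=1e-3):
    # reconstruction error plus an L1 penalty that encourages sparse codes
    return F.mse_loss(recon, x) + l1_coeff * code.abs().mean()
```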
no code implementations • 29 Dec 2024 • Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana, Yongyi Yang, Maya Okawa, Kento Nishi, Martin Wattenberg, Hidenori Tanaka
Specifically, if we provide in-context exemplars wherein a concept plays a different role than what the pretraining data suggests, do models reorganize their representations in accordance with these novel semantics?
1 code implementation • 1 Dec 2024 • Core Francisco Park, Ekdeep Singh Lubana, Itamar Pres, Hidenori Tanaka
In-Context Learning (ICL) has significantly expanded the general-purpose nature of large language models, allowing them to adapt to novel tasks using only the context provided as input.
no code implementations • 29 Oct 2024 • Pulkit Gopalani, Ekdeep Singh Lubana, Wei Hu
We also analyze the training dynamics of individual model components to understand the sudden drop in loss.
no code implementations • 22 Oct 2024 • Itamar Pres, Laura Ruis, Ekdeep Singh Lubana, David Krueger
Representation engineering methods have recently shown promise for enabling efficient steering of model behavior.
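As a rough illustration of activation steering, one common representation-engineering recipe (not necessarily the exact method evaluated in this paper), a fixed direction is added to a layer's hidden states at inference time; the GPT-2-style module path in the usage comment is a hypothetical example.

```python
import torch

def make_steering_hook(direction: torch.Tensor, alpha: float = 1.0):
    """Forward hook that shifts a layer's hidden states along a fixed direction."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * direction.to(hidden.device, hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

# Hypothetical usage: `direction` is often taken as the difference of mean activations
# between two contrastive prompt sets, then injected at a chosen layer.
# handle = model.transformer.h[layer_idx].register_forward_hook(make_steering_hook(v, alpha=4.0))
# ... run generation ...
# handle.remove()
```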
no code implementations • 22 Oct 2024 • Kento Nishi, Maya Okawa, Rahul Ramesh, Mikail Khona, Hidenori Tanaka, Ekdeep Singh Lubana
We call this phenomenon representation shattering and demonstrate that it results in degradation of factual recall and reasoning performance more broadly.
no code implementations • 15 Oct 2024 • Abhinav Menon, Manish Shrivastava, David Krueger, Ekdeep Singh Lubana
Autoencoders have been used for finding interpretable and disentangled features underlying neural network representations in both image and text domains.
no code implementations • 10 Oct 2024 • Yongyi Yang, Core Francisco Park, Ekdeep Singh Lubana, Maya Okawa, Wei Hu, Hidenori Tanaka
We mathematically analyze the learning dynamics of neural networks trained on this SIM task and show that, despite its simplicity, SIM's learning dynamics capture and help explain key empirical observations on compositional generalization with diffusion models identified in prior work.
1 code implementation • 22 Aug 2024 • Ekdeep Singh Lubana, Kyogo Kawaguchi, Robert P. Dick, Hidenori Tanaka
We empirically investigate this definition by proposing an experimental system grounded in a context-sensitive formal language and find that Transformers trained to perform tasks on top of strings from this language indeed exhibit emergent capabilities.
no code implementations • 14 Jul 2024 • Samyak Jain, Ekdeep Singh Lubana, Kemal Oksuz, Tom Joy, Philip H. S. Torr, Amartya Sanyal, Puneet K. Dokania
Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment.
1 code implementation • 27 Jun 2024 • Core Francisco Park, Maya Okawa, Andrew Lee, Hidenori Tanaka, Ekdeep Singh Lubana
Modern generative models demonstrate impressive capabilities, likely stemming from an ability to identify and manipulate abstract concepts underlying their training data.
1 code implementation • 15 Apr 2024 • Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Aleksandar Petrov, Christian Schroeder de Witt, Sumeet Ramesh Motwani, Yoshua Bengio, Danqi Chen, Philip H. S. Torr, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramer, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs).
no code implementations • 12 Feb 2024 • Mikail Khona, Maya Okawa, Jan Hula, Rahul Ramesh, Kento Nishi, Robert Dick, Ekdeep Singh Lubana, Hidenori Tanaka
Stepwise inference protocols, such as scratchpads and chain-of-thought, help language models solve complex problems by decomposing them into a sequence of simpler subproblems.
no code implementations • 6 Dec 2023 • Ekdeep Singh Lubana, Johann Brehmer, Pim de Haan, Taco Cohen
We explore the viability of casting foundation models as generic reward functions for reinforcement learning.
no code implementations • 21 Nov 2023 • Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktäschel, David Scott Krueger
Fine-tuning large pre-trained models has become the de facto strategy for developing both task-specific and general-purpose machine learning systems, including developing models that are safe to deploy.
1 code implementation • 21 Nov 2023 • Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka
Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing basic arithmetic.
1 code implementation • 26 Oct 2023 • Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Tomer D. Ullman
Large language models (LLMs) trained on massive text corpora demonstrate intriguing capabilities, achieving state-of-the-art performance on tasks they were not explicitly trained for.
1 code implementation • NeurIPS 2023 • Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka
Motivated by this, we perform a controlled study for understanding compositional generalization in conditional diffusion models in a synthetic setting, varying different attributes of the training data and measuring the model's ability to generate samples out-of-distribution.
1 code implementation • 15 Nov 2022 • Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David Krueger, Hidenori Tanaka
We study neural network loss landscapes through the lens of mode connectivity, the observation that minimizers of neural networks retrieved via training on a dataset are connected via simple paths of low loss.
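A minimal probe of linear mode connectivity, assuming two independently trained copies of the same architecture; the paper studies more general low-loss paths, so this is only the simplest diagnostic.

```python
import copy
import torch

@torch.no_grad()
def loss_along_linear_path(model_a, model_b, loss_fn, loader, n_points=11):
    """Evaluate the loss at evenly spaced points on the straight line in weight
    space between two trained minimizers of the same architecture.
    (BatchNorm running statistics, if any, would need recomputation in practice.)"""
    params_a = [p.detach().clone() for p in model_a.parameters()]
    params_b = [p.detach().clone() for p in model_b.parameters()]
    probe = copy.deepcopy(model_a)
    losses = []
    for t in torch.linspace(0.0, 1.0, n_points):
        for p, pa, pb in zip(probe.parameters(), params_a, params_b):
            p.copy_((1 - t) * pa + t * pb)   # interpolated weights
        total, count = 0.0, 0
        for x, y in loader:
            total += loss_fn(probe(x), y).item() * x.size(0)
            count += x.size(0)
        losses.append(total / count)
    return losses   # a uniformly low-loss curve suggests the minima are linearly connected
```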
no code implementations • 2 Oct 2022 • Liu Ziyin, Ekdeep Singh Lubana, Masahito Ueda, Hidenori Tanaka
Prevention of complete and dimensional collapse of representations has recently become a design principle for self-supervised learning (SSL).
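One common way to quantify dimensional collapse (a generic diagnostic, not this paper's analysis) is to inspect the eigenvalue spectrum of the embedding covariance: a spectrum that decays to near zero indicates the representations occupy only a low-dimensional subspace.

```python
import torch

def embedding_spectrum(embeddings: torch.Tensor) -> torch.Tensor:
    """Eigenvalues (descending) of the covariance of SSL embeddings of shape
    (n_samples, dim). Many near-zero eigenvalues indicate dimensional collapse;
    a single dominant direction indicates (near-)complete collapse."""
    z = embeddings - embeddings.mean(dim=0, keepdim=True)
    cov = (z.t() @ z) / (z.size(0) - 1)
    eigvals = torch.linalg.eigvalsh(cov)       # ascending order
    return torch.flip(eigvals, dims=(0,))      # descending order
```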
1 code implementation • 4 Aug 2022 • Puja Trivedi, Ekdeep Singh Lubana, Mark Heimann, Danai Koutra, Jayaraman J. Thiagarajan
Overall, our work rigorously contextualizes, both empirically and theoretically, the effects of data-centric properties on augmentation strategies and learning paradigms for graph SSL.
1 code implementation • 23 May 2022 • Ekdeep Singh Lubana, Chi Ian Tang, Fahim Kawsar, Robert P. Dick, Akhil Mathur
Federated learning is generally used in tasks where labels are readily available (e.g., next-word prediction).
no code implementations • 5 Nov 2021 • Puja Trivedi, Ekdeep Singh Lubana, Yujun Yan, Yaoqing Yang, Danai Koutra
Unsupervised graph representation learning is critical to a wide range of applications where labels may be scarce or expensive to procure.
1 code implementation • NeurIPS 2021 • Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka
Inspired by BatchNorm, an explosion of normalization layers has been proposed in deep learning.
2 code implementations • 4 Feb 2021 • Ekdeep Singh Lubana, Puja Trivedi, Danai Koutra, Robert P. Dick
Catastrophic forgetting undermines the effectiveness of deep neural networks (DNNs) in scenarios such as continual learning and lifelong learning.
1 code implementation • ICLR 2021 • Ekdeep Singh Lubana, Robert P. Dick
We use this framework to determine the relationship between pruning measures and the evolution of model parameters, establishing several results about pruning models early in training: (i) magnitude-based pruning removes parameters that contribute least to reducing the loss, yielding models that converge faster than those produced by magnitude-agnostic methods; (ii) loss-preservation-based pruning preserves first-order model evolution dynamics and is therefore appropriate for pruning minimally trained models; and (iii) gradient-norm-based pruning affects second-order model evolution dynamics, so increasing the gradient norm via pruning can produce poorly performing models.
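The three criteria in (i)–(iii) can be sketched as per-parameter saliency scores roughly as follows; this is an illustrative reading of the measures the paper analyzes, not its exact implementation.

```python
import torch

def pruning_scores(model, loss, criterion="magnitude"):
    """Per-parameter saliency under the three criteria discussed above.
    `loss` must be a freshly computed minibatch loss with its graph intact."""
    params = [p for p in model.parameters() if p.requires_grad]
    if criterion == "magnitude":
        return [p.detach().abs() for p in params]                        # |theta|
    grads = torch.autograd.grad(loss, params, create_graph=(criterion == "grad_norm"))
    if criterion == "loss_preservation":
        return [(p * g).detach().abs() for p, g in zip(params, grads)]   # |theta * dL/dtheta|
    if criterion == "grad_norm":
        # saliency via the effect of a weight on the squared gradient norm
        grad_norm_sq = sum((g ** 2).sum() for g in grads)
        hg = torch.autograd.grad(grad_norm_sq, params)                   # ~ 2 * H (dL/dtheta)
        return [(p * h).detach() for p, h in zip(params, hg)]
    raise ValueError(f"unknown criterion: {criterion}")
```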
1 code implementation • 10 Sep 2020 • Ekdeep Singh Lubana, Puja Trivedi, Conrad Hougen, Robert P. Dick, Alfred O. Hero
To address this issue, we propose OrthoReg, a principled regularization strategy that enforces orthonormality on a network's filters to reduce inter-filter correlation, thereby allowing reliable, efficient determination of group importance estimates, improved trainability of pruned networks, and efficient, simultaneous pruning of large groups of filters.
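A rough sketch of an orthonormality penalty on a convolutional layer's flattened filters, in the spirit of OrthoReg: a Frobenius-norm penalty on the filter Gram matrix that discourages inter-filter correlation. This is a generic formulation and may differ in detail from the paper's regularizer.

```python
import torch

def orthonormality_penalty(conv_weight: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of a layer's flattened filters from orthonormality,
    i.e. ||W W^T - I||_F^2, to reduce inter-filter correlation."""
    w = conv_weight.flatten(start_dim=1)    # (out_channels, in_channels * k * k)
    gram = w @ w.t()                        # pairwise filter correlations
    identity = torch.eye(gram.size(0), device=gram.device, dtype=gram.dtype)
    return ((gram - identity) ** 2).sum()

# Usage sketch: add the penalty over all conv layers to the task loss.
# total_loss = task_loss + reg_coeff * sum(orthonormality_penalty(m.weight)
#                                          for m in model.modules()
#                                          if isinstance(m, torch.nn.Conv2d))
```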