Search Results for author: Sidak Pal Singh

Found 22 papers, 8 papers with code

Avoiding spurious sharpness minimization broadens applicability of SAM

no code implementations • 4 Feb 2025 • Sidak Pal Singh, Hossein Mobahi, Atish Agarwala, Yann Dauphin

We investigate the discrepancy across domains and find that in the NLP setting, SAM is dominated by regularization of the logit statistics -- instead of improving the geometry of the function itself.
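
For context, SAM (Sharpness-Aware Minimization) is a two-step optimizer: it first ascends to a nearby worst-case point in weight space and then descends using the gradient computed there. Below is a minimal PyTorch sketch of the generic SAM update, not this paper's analysis; the model, loss, optimizer, and the `rho=0.05` radius are illustrative placeholders.

```python
import torch

def sam_step(model, loss_fn, x, y, optimizer, rho=0.05):
    """One generic SAM update (sketch): ascend to a nearby worst-case
    point, take the gradient there, then descend from the original point."""
    params = [p for p in model.parameters() if p.requires_grad]

    # gradient at the current weights
    loss_fn(model(x), y).backward()
    grads = [p.grad.detach().clone() for p in params]
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    # perturb: w <- w + rho * g / ||g||
    with torch.no_grad():
        eps = [rho * g / norm for g in grads]
        for p, e in zip(params, eps):
            p.add_(e)

    # gradient at the perturbed weights
    model.zero_grad()
    loss_fn(model(x), y).backward()

    # restore the weights and apply the descent step with the perturbed gradient
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
```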

Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks

no code implementations • 4 Nov 2024 • Jim Zhao, Sidak Pal Singh, Aurelien Lucchi

Finally, we empirically validate the bounds and uncover valuable insights into the influence of the analyzed architectural components.

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis

no code implementations • 14 Oct 2024 • Weronika Ormaniec, Felix Dangel, Sidak Pal Singh

In this work, we bridge this gap by providing a fundamental understanding of what distinguishes the Transformer from the other architectures -- grounded in a theoretical comparison of the (loss) Hessian.

Local vs Global continual learning

no code implementations • 23 Jul 2024 • Giulia Lanzillotta, Sidak Pal Singh, Benjamin F. Grewe, Thomas Hofmann

We classify existing continual learning algorithms based on the approximation used, and we assess the practical effects of this distinction in common continual learning settings. Additionally, we study optimal continual learning objectives in the case of local polynomial approximations, and we provide examples of existing algorithms implementing the optimal objectives.

Continual Learning

Landscaping Linear Mode Connectivity

no code implementations • 24 Jun 2024 • Sidak Pal Singh, Linara Adilova, Michael Kamp, Asja Fischer, Bernhard Schölkopf, Thomas Hofmann

In this work, we take a step towards understanding it by providing a model of how the loss landscape needs to behave topographically for LMC (or the lack thereof) to manifest.

Linear Mode Connectivity
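
As background, linear mode connectivity (LMC) is usually probed by evaluating the loss along the straight line between two trained weight vectors and checking whether a barrier rises above the endpoints. A minimal sketch of that probe, assuming two same-architecture models, a mean-reduced loss, and a data loader (all placeholders):

```python
import copy
import torch

def loss_barrier(model_a, model_b, loss_fn, loader, n_points=11):
    """Loss along the straight line (1 - t) * w_a + t * w_b in weight space.
    A curve with no bump above its endpoints suggests linear mode connectivity."""
    probe = copy.deepcopy(model_a)
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    curve = []
    for t in torch.linspace(0.0, 1.0, n_points):
        # interpolate floating-point tensors; keep integer buffers as-is
        interp = {k: (1 - t) * v + t * sd_b[k] if v.is_floating_point() else v
                  for k, v in sd_a.items()}
        probe.load_state_dict(interp)
        probe.eval()
        total, count = 0.0, 0
        with torch.no_grad():
            for x, y in loader:
                total += loss_fn(probe(x), y).item() * len(x)
                count += len(x)
        curve.append(total / count)
    return curve
```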

Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy

no code implementations • 12 Mar 2024 • Sidak Pal Singh, Bobby He, Thomas Hofmann, Bernhard Schölkopf

We propose a fresh take on understanding the mechanisms of neural networks by analyzing the rich directional structure of optimization trajectories, represented by their pointwise parameters.
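
One simple, illustrative way to inspect such directional structure (not necessarily the paper's exact measure) is to flatten the parameter iterates saved along training and compare their directions pairwise; the list of checkpoints below is an assumed input.

```python
import torch

def trajectory_cosine_matrix(checkpoints):
    """Cosine similarity between flattened parameter iterates.

    `checkpoints` is a list of state_dicts saved along training (assumed input).
    Large off-diagonal similarities indicate that the trajectory explores
    few distinct directions, i.e. directional redundancy."""
    vecs = []
    for sd in checkpoints:
        flat = torch.cat([v.flatten().float() for v in sd.values()])
        vecs.append(flat / flat.norm())
    V = torch.stack(vecs)   # (num_checkpoints, num_params)
    return V @ V.T          # (num_checkpoints, num_checkpoints)
```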

Towards Meta-Pruning via Optimal Transport

1 code implementation • 12 Feb 2024 • Alexander Theus, Olin Geimer, Friedrich Wicke, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

Structural pruning of neural networks conventionally relies on identifying and discarding less important neurons, a practice often resulting in significant accuracy loss that necessitates subsequent fine-tuning efforts.

Neural Network Compression
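
The conventional baseline referred to above, discarding the least important neurons, can be sketched as neuron-level magnitude pruning on a single linear layer. This is only the standard baseline, not the paper's optimal-transport-based meta-pruning, and the `keep_fraction` is an illustrative choice.

```python
import torch

def prune_neurons_by_magnitude(linear, keep_fraction=0.5):
    """Drop the output neurons of a linear layer with the smallest
    weight norms (conventional structured magnitude pruning)."""
    scores = linear.weight.detach().norm(dim=1)            # one score per output neuron
    k = max(1, int(keep_fraction * linear.out_features))
    keep = torch.topk(scores, k).indices.sort().values
    pruned = torch.nn.Linear(linear.in_features, k, bias=linear.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(linear.weight[keep])
        if linear.bias is not None:
            pruned.bias.copy_(linear.bias[keep])
    return pruned, keep  # `keep` must also slice the next layer's input weights
```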

Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers

no code implementations • 17 Nov 2023 • Vukasin Bozic, Danilo Dordevic, Daniele Coppola, Joseph Thommes, Sidak Pal Singh

This work presents an analysis of the effectiveness of using standard shallow feed-forward networks to mimic the behavior of the attention mechanism in the original Transformer model, a state-of-the-art architecture for sequence-to-sequence tasks.

Knowledge Distillation

Transformer Fusion with Optimal Transport

1 code implementation • 9 Oct 2023 • Moritz Imfeld, Jacopo Graldi, Marco Giordano, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

Fusion is a technique for merging multiple independently-trained neural networks in order to combine their capabilities.

Image Classification, Language Modeling +1

Towards guarantees for parameter isolation in continual learning

no code implementations • 2 Oct 2023 • Giulia Lanzillotta, Sidak Pal Singh, Benjamin F. Grewe, Thomas Hofmann

Deep learning has proved to be a successful paradigm for solving many challenges in machine learning.

Continual Learning

On the curvature of the loss landscape

no code implementations • 10 Jul 2023 • Alison Pouplin, Hrittik Roy, Sidak Pal Singh, Georgios Arvanitidis

In this work, we consider the loss landscape as an embedded Riemannian manifold and show that the differential geometric properties of the manifold can be used when analyzing the generalization abilities of a deep net.

The Hessian perspective into the Nature of Convolutional Neural Networks

no code implementations • 16 May 2023 • Sidak Pal Singh, Thomas Hofmann, Bernhard Schölkopf

While Convolutional Neural Networks (CNNs) have long been investigated, applied, and theorized about, we aim to provide a slightly different perspective into their nature -- through the lens of their Hessian maps.

Some Fundamental Aspects about Lipschitz Continuity of Neural Networks

1 code implementation • 21 Feb 2023 • Grigory Khromov, Sidak Pal Singh

Lipschitz continuity is a crucial functional property of any predictive model that naturally governs its robustness and generalisation, as well as its adversarial vulnerability.
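
As background, for a feed-forward network with 1-Lipschitz activations (e.g. ReLU), the product of the layers' spectral norms gives a standard, often loose, upper bound on the Lipschitz constant. A minimal sketch of that bound (not the paper's analysis; the small MLP is a placeholder):

```python
import torch

def lipschitz_upper_bound(model):
    """Product of the spectral norms of the linear layers: an upper bound
    on the Lipschitz constant of an MLP with 1-Lipschitz activations."""
    bound = 1.0
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            # largest singular value of the weight matrix
            bound *= torch.linalg.matrix_norm(module.weight.detach(), ord=2).item()
    return bound

mlp = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
print(lipschitz_upper_bound(mlp))
```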

Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning

1 code implementation • 24 Aug 2022 • Elias Frantar, Sidak Pal Singh, Dan Alistarh

We consider the problem of model compression for deep neural networks (DNNs) in the challenging one-shot/post-training setting, in which we are given an accurate trained model, and must compress it without any retraining, based only on a small amount of calibration input data.

Model Compression, Quantization

Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse

no code implementations • 7 Jun 2022 • Lorenzo Noci, Sotiris Anagnostidis, Luca Biggio, Antonio Orvieto, Sidak Pal Singh, Aurelien Lucchi

First, we show that rank collapse of the tokens' representations hinders training by causing the gradients of the queries and keys to vanish at initialization.
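
Rank collapse can be made concrete by measuring how far the token representations are from a matrix whose rows are all identical; below is a small diagnostic in that spirit (one common residual measure, not necessarily the paper's exact one).

```python
import torch

def rank_collapse_residual(tokens):
    """Distance of a (seq_len, dim) representation matrix from its best
    constant-rows (rank-1) approximation, relative to its own norm.
    Values near 0 mean the tokens have (nearly) collapsed to one point."""
    mean_token = tokens.mean(dim=0, keepdim=True)
    residual = tokens - mean_token
    return (residual.norm() / (tokens.norm() + 1e-12)).item()

# e.g. call this on the hidden states after each Transformer block
# and watch whether the value shrinks toward 0 with depth.
```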

Phenomenology of Double Descent in Finite-Width Neural Networks

no code implementations • ICLR 2022 • Sidak Pal Singh, Aurelien Lucchi, Thomas Hofmann, Bernhard Schölkopf

'Double descent' delineates the generalization behaviour of models depending on the regime they belong to: under- or over-parameterized.

Analytic Insights into Structure and Rank of Neural Network Hessian Maps

no code implementations • NeurIPS 2021 • Sidak Pal Singh, Gregor Bachmann, Thomas Hofmann

Moreover, we demonstrate that our bounds remain faithful as an estimate of the numerical Hessian rank, for a larger class of models such as rectified and hyperbolic tangent networks.
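
For very small networks, the numerical Hessian rank mentioned above can be checked directly with automatic differentiation; a sketch using `torch.func.functional_call` (the toy tanh network, random data, and MSE loss are placeholders, and this only scales to tiny parameter counts):

```python
import torch
from torch.func import functional_call

def numerical_hessian_rank(model, x, y, rtol=1e-6):
    """Full MSE-loss Hessian of a *tiny* model, built with autodiff,
    and its numerical rank."""
    names  = [n for n, _ in model.named_parameters()]
    shapes = [p.shape   for _, p in model.named_parameters()]
    sizes  = [p.numel() for _, p in model.named_parameters()]
    theta0 = torch.cat([p.detach().flatten() for _, p in model.named_parameters()])

    def loss_of_vector(theta):
        chunks = torch.split(theta, sizes)
        params = {n: c.view(s) for n, c, s in zip(names, chunks, shapes)}
        preds = functional_call(model, params, (x,))
        return torch.nn.functional.mse_loss(preds, y)

    H = torch.autograd.functional.hessian(loss_of_vector, theta0)
    return torch.linalg.matrix_rank(H, rtol=rtol).item()

# toy check on a small tanh network
torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(3, 4), torch.nn.Tanh(), torch.nn.Linear(4, 1))
x, y = torch.randn(16, 3), torch.randn(16, 1)
print(numerical_hessian_rank(net, x, y), "out of", sum(p.numel() for p in net.parameters()))
```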

WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

1 code implementation • NeurIPS 2020 • Sidak Pal Singh, Dan Alistarh

Second-order information, in the form of Hessian- or Inverse-Hessian-vector products, is a fundamental tool for solving optimization problems.

Image Classification, Neural Network Compression
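
A Hessian-vector product never requires materializing the Hessian: it is one extra differentiation pass through the gradient (the double-backward trick). A minimal generic sketch, not WoodFisher itself; the loss, parameter list, and vector are assumed inputs.

```python
import torch

def hessian_vector_product(loss, params, vec):
    """Compute H v, where H is the Hessian of `loss` w.r.t. `params`
    and `vec` is a list of tensors shaped like the parameters."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params)  # d(g . v)/dw = H v

# usage (toy): loss = criterion(model(x), y)
#              params = list(model.parameters())
#              v = [torch.randn_like(p) for p in params]
#              Hv = hessian_vector_product(loss, params, v)
```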

Model Fusion via Optimal Transport

2 code implementations • NeurIPS 2020 • Sidak Pal Singh, Martin Jaggi

Finally, our approach also provides a principled way to combine the parameters of neural networks with different widths, and we explore its application for model compression.

Continual Learning, model +3
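
The core idea, aligning neurons across models before averaging, can be illustrated on a single layer with a hard (permutation) transport plan between two equal-width layers under a squared-Euclidean cost. This toy sketch simplifies the paper's formulation, which also handles different widths and soft transport plans.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_layers(w_a, w_b):
    """Align the rows (neurons) of w_b to those of w_a with an optimal
    permutation, then average. w_a, w_b: (out_dim, in_dim) weight matrices."""
    cost = ((w_a[:, None, :] - w_b[None, :, :]) ** 2).sum(-1)  # pairwise neuron costs
    rows, cols = linear_sum_assignment(cost)                   # optimal hard matching
    w_b_aligned = w_b[cols]                                    # permute b's neurons
    return 0.5 * (w_a + w_b_aligned), cols  # `cols` must also permute the next layer's inputs

# toy usage
rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
fused, perm = fuse_layers(w1, w2)
```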

GLOSS: Generative Latent Optimization of Sentence Representations

1 code implementation • 15 Jul 2019 • Sidak Pal Singh, Angela Fan, Michael Auli

Both are trained to reconstruct the sentence based on a latent code, and our model can be used to generate text.

Sentence, Sentence Embedding

Context Mover's Distance & Barycenters: Optimal Transport of Contexts for Building Representations

2 code implementations • 29 Aug 2018 • Sidak Pal Singh, Andreas Hug, Aymeric Dieuleveut, Martin Jaggi

We present a framework for building unsupervised representations of entities and their compositions, where each entity is viewed as a probability distribution rather than a vector embedding.

Sentence, Sentence Embedding +1
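
Viewing an entity as a probability distribution over its contexts turns distances between entities into optimal transport costs between point clouds of context embeddings. A simplified sketch, assuming the POT (`ot`) package is installed and using uniform weights and a squared-Euclidean cost; this is an illustration, not the paper's full pipeline.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def context_distance(contexts_a, contexts_b):
    """Exact optimal transport cost between two entities, each represented
    as a uniform distribution over its context embeddings.

    contexts_a: (n, d) array, contexts_b: (m, d) array."""
    a = np.full(len(contexts_a), 1.0 / len(contexts_a))   # uniform weights
    b = np.full(len(contexts_b), 1.0 / len(contexts_b))
    M = ot.dist(contexts_a, contexts_b)                    # pairwise squared distances
    return ot.emd2(a, b, M)                                # exact transport cost

# toy usage: two 'entities' with 5 and 7 contexts in a 3-d embedding space
rng = np.random.default_rng(0)
print(context_distance(rng.normal(size=(5, 3)), rng.normal(size=(7, 3))))
```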

Wasserstein is all you need

no code implementations • 5 Jun 2018 • Sidak Pal Singh, Andreas Hug, Aymeric Dieuleveut, Martin Jaggi

We propose a unified framework for building unsupervised representations of individual objects or entities (and their compositions), by associating with each object both a distributional as well as a point estimate (vector embedding).

Sentence
