Search Results for author: Sidak Pal Singh

Found 17 papers, 6 papers with code

Hallmarks of Optimization Trajectories in Neural Networks and LLMs: The Lengths, Bends, and Dead Ends

no code implementations · 12 Mar 2024 · Sidak Pal Singh, Bobby He, Thomas Hofmann, Bernhard Schölkopf

We propose a fresh take on understanding the mechanisms of neural networks by analyzing the rich structure of parameters contained within their optimization trajectories.
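The "lengths" and "bends" of the title suggest simple geometric statistics of the parameter trajectory. As a rough illustration only (the toy data and helper names below are made up, not the paper's code), two such quantities can be computed from saved checkpoints:

```python
import numpy as np

# Toy stand-in for checkpoints saved along training:
# a list of flattened parameter vectors theta_0, ..., theta_T.
rng = np.random.default_rng(0)
checkpoints = [rng.normal(size=1000)]
for _ in range(50):
    checkpoints.append(checkpoints[-1] + 0.1 * rng.normal(size=1000))

deltas = [b - a for a, b in zip(checkpoints[:-1], checkpoints[1:])]

# "Length": total distance travelled in parameter space.
length = sum(np.linalg.norm(d) for d in deltas)

# "Bend": angle between consecutive update directions.
def angle(u, v):
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

bends = [angle(u, v) for u, v in zip(deltas[:-1], deltas[1:])]
print(f"trajectory length: {length:.2f}, mean bend: {np.mean(bends):.1f} deg")
```

On a real run, the checkpoints would come from periodically flattening `model.parameters()` during training.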

Towards Meta-Pruning via Optimal Transport

1 code implementation · 12 Feb 2024 · Alexander Theus, Olin Geimer, Friedrich Wicke, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

Structural pruning of neural networks conventionally relies on identifying and discarding less important neurons, a practice often resulting in significant accuracy loss that necessitates subsequent fine-tuning efforts.

Neural Network Compression
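For context, the conventional baseline the abstract criticizes can be sketched as magnitude-based structural pruning: score each hidden neuron, keep the strongest, and shrink the consuming layer to match. A minimal NumPy sketch (illustrative only; the paper's contribution is instead to fuse neurons via optimal transport rather than discard them):

```python
import numpy as np

def prune_neurons(W1, W2, keep_ratio=0.5):
    """Drop the lowest-norm hidden neurons of a 2-layer MLP.

    W1: (hidden, in) weights producing the hidden layer.
    W2: (out, hidden) weights consuming the hidden layer.
    """
    scores = np.linalg.norm(W1, axis=1)          # importance = row norm
    k = max(1, int(keep_ratio * len(scores)))
    keep = np.sort(np.argsort(scores)[-k:])      # indices of surviving neurons
    return W1[keep], W2[:, keep]

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(64, 32)), rng.normal(size=(10, 64))
W1p, W2p = prune_neurons(W1, W2)
print(W1p.shape, W2p.shape)   # (32, 32) (10, 32)
```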

Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers

no code implementations · 17 Nov 2023 · Vukasin Bozic, Danilo Dordevic, Daniele Coppola, Joseph Thommes, Sidak Pal Singh

This work presents an analysis of the effectiveness of using standard shallow feed-forward networks to mimic the behavior of the attention mechanism in the original Transformer model, a state-of-the-art architecture for sequence-to-sequence tasks.

Knowledge Distillation
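The setup such an analysis implies can be sketched as knowledge distillation: record an attention layer's inputs and outputs, then regress a shallow feed-forward net onto them. The sketch below is a simplified guess at that pipeline (shapes, hyperparameters, and the per-token student are illustrative assumptions, not the paper's configuration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, seq_len = 64, 16
teacher = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

# A shallow feed-forward "student". Note it acts per token, so it cannot
# mix information across positions by construction.
student = nn.Sequential(nn.Linear(d_model, 256), nn.ReLU(), nn.Linear(256, d_model))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(200):
    x = torch.randn(8, seq_len, d_model)
    with torch.no_grad():
        target, _ = teacher(x, x, x)      # teacher's attention output
    loss = nn.functional.mse_loss(student(x), target)
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final distillation loss: {loss.item():.4f}")
```

A purely per-token student cannot reproduce cross-token mixing, which is precisely what makes the substitution studied in the paper non-trivial.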

Transformer Fusion with Optimal Transport

no code implementations · 9 Oct 2023 · Moritz Imfeld, Jacopo Graldi, Marco Giordano, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

We flesh out an abstraction for layer alignment that can, in principle, generalize to arbitrary architectures. We apply it to the key ingredients of Transformers, such as multi-head self-attention, layer normalization, and residual connections, and discuss how to handle each of them via ablation studies.

Image Classification · Language Modelling
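In the equal-width case, OT-based layer alignment reduces to a hard one-to-one matching of neurons followed by averaging. A minimal sketch of that special case (the cost choice and function name are assumptions, and the real method must also propagate each alignment to the next layer's input weights):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_layers(Wa, Wb):
    """Align model B's neurons to model A's, then average.

    Hard matching is the special case of an optimal transport plan
    with uniform marginals of equal size.
    """
    cost = np.linalg.norm(Wa[:, None, :] - Wb[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # min-cost neuron matching
    return 0.5 * (Wa[rows] + Wb[cols])

rng = np.random.default_rng(0)
Wa, Wb = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
print(fuse_layers(Wa, Wb).shape)   # (8, 4)
```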

Towards guarantees for parameter isolation in continual learning

no code implementations · 2 Oct 2023 · Giulia Lanzillotta, Sidak Pal Singh, Benjamin F. Grewe, Thomas Hofmann

Deep learning has proved to be a successful paradigm for solving many challenges in machine learning.

Continual Learning

On the curvature of the loss landscape

no code implementations · 10 Jul 2023 · Alison Pouplin, Hrittik Roy, Sidak Pal Singh, Georgios Arvanitidis

In this work, we consider the loss landscape as an embedded Riemannian manifold and show that the differential geometric properties of the manifold can be used when analyzing the generalization abilities of a deep net.
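One standard way to make this embedding concrete, assuming the usual graph construction (the paper's exact setup may differ): view the landscape as the hypersurface θ ↦ (θ, L(θ)) in R^{p+1}, so the Euclidean metric pulls back to an induced metric, and the extrinsic curvature is carried by the Hessian:

```latex
% Loss landscape as the graph hypersurface (\theta, L(\theta)) \subset \mathbb{R}^{p+1}.
% Induced (pullback) Riemannian metric:
g(\theta) = I_p + \nabla L(\theta)\,\nabla L(\theta)^{\top}
% Second fundamental form (extrinsic curvature), a rescaled Hessian:
h(\theta) = \frac{\nabla^2 L(\theta)}{\sqrt{1 + \lVert \nabla L(\theta) \rVert^{2}}}
```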

The Hessian perspective into the Nature of Convolutional Neural Networks

no code implementations · 16 May 2023 · Sidak Pal Singh, Thomas Hofmann, Bernhard Schölkopf

While Convolutional Neural Networks (CNNs) have long been investigated, applied, and theorized about, we aim to offer a somewhat different view of their nature: through the lens of their Hessian maps.

Some Intriguing Aspects about Lipschitz Continuity of Neural Networks

no code implementations · 21 Feb 2023 · Grigory Khromov, Sidak Pal Singh

Lipschitz continuity is a crucial functional property of any predictive model; it naturally governs the model's robustness, generalisation, and adversarial vulnerability.
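A classical, easily computed handle on this property: for a feed-forward net with 1-Lipschitz activations (ReLU, tanh), the Lipschitz constant is at most the product of the spectral norms of its weight matrices. A minimal PyTorch sketch of that bound (generally loose; the helper below is an illustration, not code from the paper):

```python
import torch
import torch.nn as nn

def lipschitz_upper_bound(model):
    """Product of spectral norms of the linear layers.

    A classical upper bound on the Lipschitz constant of a feed-forward
    net with 1-Lipschitz activations; usually far from tight.
    """
    bound = 1.0
    for m in model.modules():
        if isinstance(m, nn.Linear):
            bound *= torch.linalg.matrix_norm(m.weight, ord=2).item()
    return bound

net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
print(f"Lipschitz upper bound: {lipschitz_upper_bound(net):.2f}")
```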

Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning

1 code implementation · 24 Aug 2022 · Elias Frantar, Sidak Pal Singh, Dan Alistarh

We consider the problem of model compression for deep neural networks (DNNs) in the challenging one-shot/post-training setting, in which we are given an accurate trained model, and must compress it without any retraining, based only on a small amount of calibration input data.

Model Compression · Quantization
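For reference, the classical Optimal Brain Surgeon step this framework builds on, under a local quadratic model of the loss around the trained weights (textbook form; the paper's contribution is making such updates exact and tractable layer-wise):

```latex
% Removing weight w_q and optimally adjusting the remaining weights:
\delta w \;=\; -\,\frac{w_q}{[H^{-1}]_{qq}}\; H^{-1} e_q,
\qquad
\Delta L \;\approx\; \frac{1}{2}\,\frac{w_q^{2}}{[H^{-1}]_{qq}}
% e_q: q-th canonical basis vector; the weight with the smallest
% \Delta L is pruned first.
```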

Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse

no code implementations · 7 Jun 2022 · Lorenzo Noci, Sotiris Anagnostidis, Luca Biggio, Antonio Orvieto, Sidak Pal Singh, Aurelien Lucchi

First, we show that rank collapse of the tokens' representations hinders training by causing the gradients of the queries and keys to vanish at initialization.
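Rank collapse here means the token representations converging towards a single shared vector. One common proxy for it (the helper below is an illustration in that spirit, not the paper's code) is the relative residual after removing the best constant-row rank-1 approximation, i.e. the mean token:

```python
import torch

def rank_collapse_residual(X):
    """How far token representations (tokens, dim) are from rank one.

    Subtracting the mean token is the optimal constant-row rank-1
    approximation in Frobenius norm; a small ratio signals collapse.
    """
    residual = X - X.mean(dim=0, keepdim=True)   # remove shared component
    return (residual.norm() / X.norm()).item()

X = torch.randn(16, 64)                          # healthy: ratio near 1
collapsed = torch.randn(1, 64).repeat(16, 1) + 0.01 * torch.randn(16, 64)
print(rank_collapse_residual(X), rank_collapse_residual(collapsed))
```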

Phenomenology of Double Descent in Finite-Width Neural Networks

no code implementations · ICLR 2022 · Sidak Pal Singh, Aurelien Lucchi, Thomas Hofmann, Bernhard Schölkopf

'Double descent' delineates the generalization behaviour of models depending on the regime they belong to: under- or over-parameterized.
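The phenomenon is easy to reproduce in a toy setting. A self-contained NumPy illustration (a standard textbook demo, not from the paper): min-norm linear regression over a growing feature set, where test error typically spikes near the interpolation threshold p = n and then descends again:

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 50, 500, 80
X = rng.normal(size=(n_train + n_test, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=len(X))
Xtr, ytr, Xte, yte = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

for p in [10, 25, 45, 50, 55, 80]:        # number of features used
    # lstsq returns the min-norm solution in the underdetermined case.
    w = np.linalg.lstsq(Xtr[:, :p], ytr, rcond=None)[0]
    err = np.mean((Xte[:, :p] @ w - yte) ** 2)
    print(f"p={p:3d}  test MSE={err:8.2f}")  # expect a spike near p = 50
```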

Analytic Insights into Structure and Rank of Neural Network Hessian Maps

no code implementations · NeurIPS 2021 · Sidak Pal Singh, Gregor Bachmann, Thomas Hofmann

Moreover, we demonstrate that our bounds remain faithful as an estimate of the numerical Hessian rank, for a larger class of models such as rectified and hyperbolic tangent networks.
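Checking such estimates numerically is straightforward for tiny models: build the exact loss Hessian over a flattened parameter vector and take its numerical rank. A sketch of that check (architecture and tolerances are illustrative, not the paper's experimental setup):

```python
import torch
from torch.autograd.functional import hessian

torch.manual_seed(0)
d_in, d_hid, d_out, n = 4, 3, 2, 8
X, Y = torch.randn(n, d_in), torch.randn(n, d_out)

def loss_of_flat_params(theta):
    # Unflatten into a tiny 2-layer tanh network.
    W1 = theta[: d_hid * d_in].reshape(d_hid, d_in)
    W2 = theta[d_hid * d_in:].reshape(d_out, d_hid)
    pred = torch.tanh(X @ W1.T) @ W2.T
    return ((pred - Y) ** 2).mean()

theta = torch.randn(d_hid * d_in + d_out * d_hid)
H = hessian(loss_of_flat_params, theta)          # (p, p) loss Hessian
rank = torch.linalg.matrix_rank(H, atol=1e-6)
print(f"params: {len(theta)}, numerical Hessian rank: {rank.item()}")
```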

WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

1 code implementation · NeurIPS 2020 · Sidak Pal Singh, Dan Alistarh

Second-order information, in the form of Hessian- or Inverse-Hessian-vector products, is a fundamental tool for solving optimization problems.

Image Classification · Neural Network Compression
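The core trick, as the name suggests, is the Woodbury (Sherman-Morrison) identity: the inverse of the empirical Fisher, a damped sum of rank-one gradient outer products, can be built one gradient at a time without ever inverting a matrix. A minimal NumPy sketch of that recurrence (damping and scaling conventions simplified relative to the paper):

```python
import numpy as np

def woodfisher_inverse(grads, damp=1e-3):
    """Inverse empirical Fisher via rank-one Woodbury updates.

    grads: (N, p) per-sample gradients. The empirical Fisher is
    (1/N) sum_i g_i g_i^T + damp * I; its inverse is accumulated
    incrementally, one Sherman-Morrison step per sample.
    """
    N, p = grads.shape
    F_inv = np.eye(p) / damp                      # inverse of damp * I
    for g in grads:
        Fg = F_inv @ g
        F_inv -= np.outer(Fg, Fg) / (N + g @ Fg)  # Sherman-Morrison step
    return F_inv

rng = np.random.default_rng(0)
G = rng.normal(size=(100, 20))
F = G.T @ G / len(G) + 1e-3 * np.eye(20)
print(np.allclose(woodfisher_inverse(G), np.linalg.inv(F)))   # True
```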

Model Fusion via Optimal Transport

2 code implementations · NeurIPS 2020 · Sidak Pal Singh, Martin Jaggi

Finally, our approach also provides a principled way to combine the parameters of neural networks with different widths, and we explore its application for model compression.

Continual Learning · Model Compression · +2
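The different-widths case is where a soft transport plan, rather than a hard permutation, becomes necessary. A sketch of a single layer of that idea using the POT library (the cost, the barycentric projection, and the function name are illustrative; the full method also chains alignments across layers):

```python
import numpy as np
import ot  # the POT package (pip install pot)

def fuse_different_widths(Wa, Wb):
    """Average two layers with different neuron counts via a transport plan.

    Model B's neurons (rows of Wb) are transported onto model A's; the
    soft assignment T generalizes one-to-one neuron matching.
    """
    na, nb = len(Wa), len(Wb)
    a, b = np.full(na, 1 / na), np.full(nb, 1 / nb)   # uniform marginals
    M = ot.dist(Wa, Wb)                               # pairwise sq. Euclidean
    T = ot.emd(a, b, M)                               # optimal transport plan
    Wb_aligned = (T @ Wb) * na                        # barycentric projection
    return 0.5 * (Wa + Wb_aligned)

rng = np.random.default_rng(0)
print(fuse_different_widths(rng.normal(size=(8, 4)),
                            rng.normal(size=(12, 4))).shape)  # (8, 4)
```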

GLOSS: Generative Latent Optimization of Sentence Representations

1 code implementation · 15 Jul 2019 · Sidak Pal Singh, Angela Fan, Michael Auli

Both are trained to reconstruct the sentence from a latent code, and our model can be used to generate text.

Sentence · Sentence Embedding

Context Mover's Distance & Barycenters: Optimal Transport of Contexts for Building Representations

2 code implementations · 29 Aug 2018 · Sidak Pal Singh, Andreas Hug, Aymeric Dieuleveut, Martin Jaggi

We present a framework for building unsupervised representations of entities and their compositions, where each entity is viewed as a probability distribution rather than a vector embedding.

Sentence · Sentence Embedding · +1
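The distance this view induces between two entities is the optimal transport cost between their context distributions. A small sketch with the POT library, on made-up context embeddings (ground metric and data are illustrative, not the paper's pipeline):

```python
import numpy as np
import ot  # the POT package (pip install pot)

# Each entity = a distribution over context embeddings (toy data).
rng = np.random.default_rng(0)
contexts_a = rng.normal(loc=0.0, size=(30, 16))   # contexts of entity A
contexts_b = rng.normal(loc=0.5, size=(40, 16))   # contexts of entity B

a = np.full(len(contexts_a), 1 / len(contexts_a))  # uniform weights
b = np.full(len(contexts_b), 1 / len(contexts_b))
M = ot.dist(contexts_a, contexts_b)                # cost between contexts

# Optimal transport cost between the two context distributions
# (squared-Euclidean ground metric here).
cmd = ot.emd2(a, b, M)
print(f"distance between entities: {cmd:.3f}")
```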

Wasserstein is all you need

no code implementations · 5 Jun 2018 · Sidak Pal Singh, Andreas Hug, Aymeric Dieuleveut, Martin Jaggi

We propose a unified framework for building unsupervised representations of individual objects or entities (and their compositions), by associating with each object both a distributional as well as a point estimate (vector embedding).

Sentence
