Search Results for author: Sidak Pal Singh

Found 17 papers, 6 papers with code

Hallmarks of Optimization Trajectories in Neural Networks and LLMs: The Lengths, Bends, and Dead Ends

no code implementations · 12 Mar 2024 · Sidak Pal Singh, Bobby He, Thomas Hofmann, Bernhard Schölkopf

We propose a fresh take on understanding the mechanisms of neural networks by analyzing the rich structure of parameters contained within their optimization trajectories.
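The "lengths" and "bends" of the title suggest simple geometric statistics of the parameter trajectory. As a rough illustration only (the toy data and helper names below are made up, not the paper's code), two such quantities can be computed from saved checkpoints:

```python
import numpy as np

# Toy stand-in for checkpoints saved along training:
# a list of flattened parameter vectors theta_0, ..., theta_T.
rng = np.random.default_rng(0)
checkpoints = [rng.normal(size=1000)]
for _ in range(50):
    checkpoints.append(checkpoints[-1] + 0.1 * rng.normal(size=1000))

deltas = [b - a for a, b in zip(checkpoints[:-1], checkpoints[1:])]

# "Length": total distance travelled in parameter space.
length = sum(np.linalg.norm(d) for d in deltas)

# "Bend": angle between consecutive update directions.
def angle(u, v):
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

bends = [angle(u, v) for u, v in zip(deltas[:-1], deltas[1:])]
print(f"trajectory length: {length:.2f}, mean bend: {np.mean(bends):.1f} deg")
```

On a real run, the checkpoints would come from periodically flattening `model.parameters()` during training.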

Towards Meta-Pruning via Optimal Transport

1 code implementation · 12 Feb 2024 · Alexander Theus, Olin Geimer, Friedrich Wicke, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

Structural pruning of neural networks conventionally relies on identifying and discarding less important neurons, a practice often resulting in significant accuracy loss that necessitates subsequent fine-tuning efforts.

Neural Network Compression
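For context, the conventional baseline the abstract criticizes can be sketched as magnitude-based structural pruning: score each hidden neuron, keep the strongest, and shrink the consuming layer to match. A minimal NumPy sketch (illustrative only; the paper's contribution is instead to fuse neurons via optimal transport rather than discard them):

```python
import numpy as np

def prune_neurons(W1, W2, keep_ratio=0.5):
    """Drop the lowest-norm hidden neurons of a 2-layer MLP.

    W1: (hidden, in) weights producing the hidden layer.
    W2: (out, hidden) weights consuming the hidden layer.
    """
    scores = np.linalg.norm(W1, axis=1)          # importance = row norm
    k = max(1, int(keep_ratio * len(scores)))
    keep = np.sort(np.argsort(scores)[-k:])      # indices of surviving neurons
    return W1[keep], W2[:, keep]

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(64, 32)), rng.normal(size=(10, 64))
W1p, W2p = prune_neurons(W1, W2)
print(W1p.shape, W2p.shape)   # (32, 32) (10, 32)
```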

Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers

no code implementations · 17 Nov 2023 · Vukasin Bozic, Danilo Dordevic, Daniele Coppola, Joseph Thommes, Sidak Pal Singh

This work presents an analysis of the effectiveness of using standard shallow feed-forward networks to mimic the behavior of the attention mechanism in the original Transformer model, a state-of-the-art architecture for sequence-to-sequence tasks.

Knowledge Distillation
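The setup such an analysis implies can be sketched as knowledge distillation: record an attention layer's inputs and outputs, then regress a shallow feed-forward net onto them. The sketch below is a simplified guess at that pipeline (shapes, hyperparameters, and the per-token student are illustrative assumptions, not the paper's configuration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, seq_len = 64, 16
teacher = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

# A shallow feed-forward "student". Note it acts per token, so it cannot
# mix information across positions by construction.
student = nn.Sequential(nn.Linear(d_model, 256), nn.ReLU(), nn.Linear(256, d_model))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(200):
    x = torch.randn(8, seq_len, d_model)
    with torch.no_grad():
        target, _ = teacher(x, x, x)      # teacher's attention output
    loss = nn.functional.mse_loss(student(x), target)
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final distillation loss: {loss.item():.4f}")
```

A purely per-token student cannot reproduce cross-token mixing, which is precisely what makes the substitution studied in the paper non-trivial.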

Transformer Fusion with Optimal Transport

no code implementations · 9 Oct 2023 · Moritz Imfeld, Jacopo Graldi, Marco Giordano, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

We flesh out an abstraction for layer alignment that can, in principle, generalize to arbitrary architectures. We apply it to the key ingredients of Transformers, such as multi-head self-attention, layer normalization, and residual connections, and discuss how to handle each of them via ablation studies.

Image Classification · Language Modelling
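In the equal-width case, OT-based layer alignment reduces to a hard one-to-one matching of neurons followed by averaging. A minimal sketch of that special case (the cost choice and function name are assumptions, and the real method must also propagate each alignment to the next layer's input weights):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_layers(Wa, Wb):
    """Align model B's neurons to model A's, then average.

    Hard matching is the special case of an optimal transport plan
    with uniform marginals of equal size.
    """
    cost = np.linalg.norm(Wa[:, None, :] - Wb[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # min-cost neuron matching
    return 0.5 * (Wa[rows] + Wb[cols])

rng = np.random.default_rng(0)
Wa, Wb = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
print(fuse_layers(Wa, Wb).shape)   # (8, 4)
```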

Towards guarantees for parameter isolation in continual learning

no code implementations · 2 Oct 2023 · Giulia Lanzillotta, Sidak Pal Singh, Benjamin F. Grewe, Thomas Hofmann

Deep learning has proved to be a successful paradigm for solving many challenges in machine learning.

Continual Learning

On the curvature of the loss landscape

no code implementations · 10 Jul 2023 · Alison Pouplin, Hrittik Roy, Sidak Pal Singh, Georgios Arvanitidis

In this work, we consider the loss landscape as an embedded Riemannian manifold and show that the differential geometric properties of the manifold can be used when analyzing the generalization abilities of a deep net.
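One standard way to make this embedding concrete, assuming the usual graph construction (the paper's exact setup may differ): view the landscape as the hypersurface θ ↦ (θ, L(θ)) in R^{p+1}, so the Euclidean metric pulls back to an induced metric, and the extrinsic curvature is carried by the Hessian:

```latex
% Loss landscape as the graph hypersurface (\theta, L(\theta)) \subset \mathbb{R}^{p+1}.
% Induced (pullback) Riemannian metric:
g(\theta) = I_p + \nabla L(\theta)\,\nabla L(\theta)^{\top}
% Second fundamental form (extrinsic curvature), a rescaled Hessian:
h(\theta) = \frac{\nabla^2 L(\theta)}{\sqrt{1 + \lVert \nabla L(\theta) \rVert^{2}}}
```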

The Hessian perspective into the Nature of Convolutional Neural Networks

no code implementations · 16 May 2023 · Sidak Pal Singh, Thomas Hofmann, Bernhard Schölkopf

While Convolutional Neural Networks (CNNs) have long been investigated, applied, and theorized about, we aim to offer a somewhat different view of their nature: through the lens of their Hessian maps.

Some Intriguing Aspects about Lipschitz Continuity of Neural Networks

no code implementations · 21 Feb 2023 · Grigory Khromov, Sidak Pal Singh

Lipschitz continuity is a crucial functional property of any predictive model; it naturally governs the model's robustness, generalisation, and adversarial vulnerability.
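A classical, easily computed handle on this property: for a feed-forward net with 1-Lipschitz activations (ReLU, tanh), the Lipschitz constant is at most the product of the spectral norms of its weight matrices. A minimal PyTorch sketch of that bound (generally loose; the helper below is an illustration, not code from the paper):

```python
import torch
import torch.nn as nn

def lipschitz_upper_bound(model):
    """Product of spectral norms of the linear layers.

    A classical upper bound on the Lipschitz constant of a feed-forward
    net with 1-Lipschitz activations; usually far from tight.
    """
    bound = 1.0
    for m in model.modules():
        if isinstance(m, nn.Linear):
            bound *= torch.linalg.matrix_norm(m.weight, ord=2).item()
    return bound

net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
print(f"Lipschitz upper bound: {lipschitz_upper_bound(net):.2f}")
```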

Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning

1 code implementation · 24 Aug 2022 · Elias Frantar, Sidak Pal Singh, Dan Alistarh

We consider the problem of model compression for deep neural networks (DNNs) in the challenging one-shot/post-training setting, in which we are given an accurate trained model, and must compress it without any retraining, based only on a small amount of calibration input data.

Model Compression · Quantization
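For reference, the classical Optimal Brain Surgeon step this framework builds on, under a local quadratic model of the loss around the trained weights (textbook form; the paper's contribution is making such updates exact and tractable layer-wise):

```latex
% Removing weight w_q and optimally adjusting the remaining weights:
\delta w \;=\; -\,\frac{w_q}{[H^{-1}]_{qq}}\; H^{-1} e_q,
\qquad
\Delta L \;\approx\; \frac{1}{2}\,\frac{w_q^{2}}{[H^{-1}]_{qq}}
% e_q: q-th canonical basis vector; the weight with the smallest
% \Delta L is pruned first.
```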

Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse

no code implementations · 7 Jun 2022 · Lorenzo Noci, Sotiris Anagnostidis, Luca Biggio, Antonio Orvieto, Sidak Pal Singh, Aurelien Lucchi

First, we show that rank collapse of the tokens' representations hinders training by causing the gradients of the queries and keys to vanish at initialization.
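Rank collapse here means the token representations converging towards a single shared vector. One common proxy for it (the helper below is an illustration in that spirit, not the paper's code) is the relative residual after removing the best constant-row rank-1 approximation, i.e. the mean token:

```python
import torch

def rank_collapse_residual(X):
    """How far token representations (tokens, dim) are from rank one.

    Subtracting the mean token is the optimal constant-row rank-1
    approximation in Frobenius norm; a small ratio signals collapse.
    """
    residual = X - X.mean(dim=0, keepdim=True)   # remove shared component
    return (residual.norm() / X.norm()).item()

X = torch.randn(16, 64)                          # healthy: ratio near 1
collapsed = torch.randn(1, 64).repeat(16, 1) + 0.01 * torch.randn(16, 64)
print(rank_collapse_residual(X), rank_collapse_residual(collapsed))
```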

Phenomenology of Double Descent in Finite-Width Neural Networks

no code implementations · ICLR 2022 · Sidak Pal Singh, Aurelien Lucchi, Thomas Hofmann, Bernhard Schölkopf

'Double descent' delineates the generalization behaviour of models depending on the regime they belong to: under- or over-parameterized.
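The phenomenon is easy to reproduce in a toy setting. A self-contained NumPy illustration (a standard textbook demo, not from the paper): min-norm linear regression over a growing feature set, where test error typically spikes near the interpolation threshold p = n and then descends again:

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 50, 500, 80
X = rng.normal(size=(n_train + n_test, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=len(X))
Xtr, ytr, Xte, yte = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

for p in [10, 25, 45, 50, 55, 80]:        # number of features used
    # lstsq returns the min-norm solution in the underdetermined case.
    w = np.linalg.lstsq(Xtr[:, :p], ytr, rcond=None)[0]
    err = np.mean((Xte[:, :p] @ w - yte) ** 2)
    print(f"p={p:3d}  test MSE={err:8.2f}")  # expect a spike near p = 50
```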

Analytic Insights into Structure and Rank of Neural Network Hessian Maps

no code implementations · NeurIPS 2021 · Sidak Pal Singh, Gregor Bachmann, Thomas Hofmann

Moreover, we demonstrate that our bounds remain faithful as an estimate of the numerical Hessian rank, for a larger class of models such as rectified and hyperbolic tangent networks.
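Checking such estimates numerically is straightforward for tiny models: build the exact loss Hessian over a flattened parameter vector and take its numerical rank. A sketch of that check (architecture and tolerances are illustrative, not the paper's experimental setup):

```python
import torch
from torch.autograd.functional import hessian

torch.manual_seed(0)
d_in, d_hid, d_out, n = 4, 3, 2, 8
X, Y = torch.randn(n, d_in), torch.randn(n, d_out)

def loss_of_flat_params(theta):
    # Unflatten into a tiny 2-layer tanh network.
    W1 = theta[: d_hid * d_in].reshape(d_hid, d_in)
    W2 = theta[d_hid * d_in:].reshape(d_out, d_hid)
    pred = torch.tanh(X @ W1.T) @ W2.T
    return ((pred - Y) ** 2).mean()

theta = torch.randn(d_hid * d_in + d_out * d_hid)
H = hessian(loss_of_flat_params, theta)          # (p, p) loss Hessian
rank = torch.linalg.matrix_rank(H, atol=1e-6)
print(f"params: {len(theta)}, numerical Hessian rank: {rank.item()}")
```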

WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

1 code implementation · NeurIPS 2020 · Sidak Pal Singh, Dan Alistarh

Second-order information, in the form of Hessian- or Inverse-Hessian-vector products, is a fundamental tool for solving optimization problems.

Image Classification · Neural Network Compression
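The core trick, as the name suggests, is the Woodbury (Sherman-Morrison) identity: the inverse of the empirical Fisher, a damped sum of rank-one gradient outer products, can be built one gradient at a time without ever inverting a matrix. A minimal NumPy sketch of that recurrence (damping and scaling conventions simplified relative to the paper):

```python
import numpy as np

def woodfisher_inverse(grads, damp=1e-3):
    """Inverse empirical Fisher via rank-one Woodbury updates.

    grads: (N, p) per-sample gradients. The empirical Fisher is
    (1/N) sum_i g_i g_i^T + damp * I; its inverse is accumulated
    incrementally, one Sherman-Morrison step per sample.
    """
    N, p = grads.shape
    F_inv = np.eye(p) / damp                      # inverse of damp * I
    for g in grads:
        Fg = F_inv @ g
        F_inv -= np.outer(Fg, Fg) / (N + g @ Fg)  # Sherman-Morrison step
    return F_inv

rng = np.random.default_rng(0)
G = rng.normal(size=(100, 20))
F = G.T @ G / len(G) + 1e-3 * np.eye(20)
print(np.allclose(woodfisher_inverse(G), np.linalg.inv(F)))   # True
```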

Model Fusion via Optimal Transport

2 code implementations · NeurIPS 2020 · Sidak Pal Singh, Martin Jaggi

Finally, our approach also provides a principled way to combine the parameters of neural networks with different widths, and we explore its application for model compression.

Continual Learning · Model Compression · +2
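The different-widths case is where a soft transport plan, rather than a hard permutation, becomes necessary. A sketch of a single layer of that idea using the POT library (the cost, the barycentric projection, and the function name are illustrative; the full method also chains alignments across layers):

```python
import numpy as np
import ot  # the POT package (pip install pot)

def fuse_different_widths(Wa, Wb):
    """Average two layers with different neuron counts via a transport plan.

    Model B's neurons (rows of Wb) are transported onto model A's; the
    soft assignment T generalizes one-to-one neuron matching.
    """
    na, nb = len(Wa), len(Wb)
    a, b = np.full(na, 1 / na), np.full(nb, 1 / nb)   # uniform marginals
    M = ot.dist(Wa, Wb)                               # pairwise sq. Euclidean
    T = ot.emd(a, b, M)                               # optimal transport plan
    Wb_aligned = (T @ Wb) * na                        # barycentric projection
    return 0.5 * (Wa + Wb_aligned)

rng = np.random.default_rng(0)
print(fuse_different_widths(rng.normal(size=(8, 4)),
                            rng.normal(size=(12, 4))).shape)  # (8, 4)
```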

GLOSS: Generative Latent Optimization of Sentence Representations

1 code implementation · 15 Jul 2019 · Sidak Pal Singh, Angela Fan, Michael Auli

Both are trained to reconstruct the sentence from a latent code, and our model can be used to generate text.

Sentence · Sentence Embedding

Context Mover's Distance & Barycenters: Optimal Transport of Contexts for Building Representations

2 code implementations · 29 Aug 2018 · Sidak Pal Singh, Andreas Hug, Aymeric Dieuleveut, Martin Jaggi

We present a framework for building unsupervised representations of entities and their compositions, where each entity is viewed as a probability distribution rather than a vector embedding.

Sentence · Sentence Embedding · +1
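The distance this view induces between two entities is the optimal transport cost between their context distributions. A small sketch with the POT library, on made-up context embeddings (ground metric and data are illustrative, not the paper's pipeline):

```python
import numpy as np
import ot  # the POT package (pip install pot)

# Each entity = a distribution over context embeddings (toy data).
rng = np.random.default_rng(0)
contexts_a = rng.normal(loc=0.0, size=(30, 16))   # contexts of entity A
contexts_b = rng.normal(loc=0.5, size=(40, 16))   # contexts of entity B

a = np.full(len(contexts_a), 1 / len(contexts_a))  # uniform weights
b = np.full(len(contexts_b), 1 / len(contexts_b))
M = ot.dist(contexts_a, contexts_b)                # cost between contexts

# Optimal transport cost between the two context distributions
# (squared-Euclidean ground metric here).
cmd = ot.emd2(a, b, M)
print(f"distance between entities: {cmd:.3f}")
```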

Wasserstein is all you need

no code implementations · 5 Jun 2018 · Sidak Pal Singh, Andreas Hug, Aymeric Dieuleveut, Martin Jaggi

We propose a unified framework for building unsupervised representations of individual objects or entities (and their compositions), by associating with each object both a distributional as well as a point estimate (vector embedding).

Sentence
