Search Results for author: Sotiris Anagnostidis

Found 13 papers, 4 papers with code

A Language Model's Guide Through Latent Space

no code implementations • 22 Feb 2024 • Dimitri von Rütte, Sotiris Anagnostidis, Gregor Bachmann, Thomas Hofmann

Concept guidance has emerged as a cheap and simple way to control the behavior of language models by probing their hidden representations for concept vectors and using them to perturb activations at inference time.

Novel Concepts
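The steering recipe described above can be sketched in a few lines of PyTorch: shift a chosen block's hidden states along a concept vector via a forward hook at inference time. The target block, concept vector, and strength below are illustrative placeholders, not values from the paper.

```python
# Minimal sketch of inference-time concept steering, assuming a PyTorch
# transformer whose blocks accept forward hooks. The concept vector,
# target block, and strength `alpha` are hypothetical placeholders.
import torch

def add_steering_hook(block, concept_vector, alpha=8.0):
    """Shift the block's output hidden states along the concept direction."""
    v = concept_vector / concept_vector.norm()   # unit-norm concept direction

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * v.to(hidden)   # perturb every token's activation
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

    return block.register_forward_hook(hook)

# Hypothetical usage: steer generation, then restore the model.
# handle = add_steering_hook(model.transformer.h[12], concept_vector)
# output = model.generate(**inputs)
# handle.remove()
```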

Towards Meta-Pruning via Optimal Transport

1 code implementation • 12 Feb 2024 • Alexander Theus, Olin Geimer, Friedrich Wicke, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

Structural pruning of neural networks conventionally relies on identifying and discarding less important neurons, a practice often resulting in significant accuracy loss that necessitates subsequent fine-tuning efforts.

Neural Network Compression
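The conventional practice the abstract refers to can be sketched as magnitude-based structured pruning of a linear layer; the paper's contribution replaces this discard step with optimal-transport-based fusion of neurons, which this baseline sketch deliberately does not attempt.

```python
# Baseline sketch only: score neurons by weight norm and drop the weakest.
# This is the "identify and discard" practice the paper improves on.
import torch
import torch.nn as nn

def prune_neurons(layer: nn.Linear, next_layer: nn.Linear, keep_ratio=0.5):
    """Keep the highest-L2-norm output neurons of `layer`."""
    scores = layer.weight.norm(dim=1)             # one importance score per neuron
    k = max(1, int(keep_ratio * scores.numel()))
    keep = scores.topk(k).indices.sort().values   # indices of surviving neurons

    pruned = nn.Linear(layer.in_features, k, bias=layer.bias is not None)
    pruned.weight.data = layer.weight.data[keep]
    if layer.bias is not None:
        pruned.bias.data = layer.bias.data[keep]

    # The next layer must drop the matching input columns.
    shrunk = nn.Linear(k, next_layer.out_features, bias=next_layer.bias is not None)
    shrunk.weight.data = next_layer.weight.data[:, keep]
    if next_layer.bias is not None:
        shrunk.bias.data = next_layer.bias.data.clone()
    return pruned, shrunk
```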

Harnessing Synthetic Datasets: The Role of Shape Bias in Deep Neural Network Generalization

no code implementations • 10 Nov 2023 • Elior Benarous, Sotiris Anagnostidis, Luca Biggio, Thomas Hofmann

In this study, we investigate how neural networks exhibit shape bias during training on synthetic datasets, serving as an indicator of the synthetic data quality.
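Shape bias is commonly quantified on cue-conflict images (the shape of one class rendered with the texture of another), following Geirhos et al.; a minimal sketch of that standard metric, assumed here rather than taken from the paper:

```python
# Standard shape-bias metric on cue-conflict images: the fraction of
# shape-consistent decisions among decisions that match either the
# shape or the texture label. Data loading is left abstract.
def shape_bias(predictions, shape_labels, texture_labels):
    shape_hits = texture_hits = 0
    for pred, s, t in zip(predictions, shape_labels, texture_labels):
        if pred == s:
            shape_hits += 1
        elif pred == t:
            texture_hits += 1
    decided = shape_hits + texture_hits
    return shape_hits / decided if decided else float("nan")
```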

Navigating Scaling Laws: Compute Optimality in Adaptive Model Training

no code implementations • 6 Nov 2023 • Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann

This leads to the notion of a 'compute-optimal' model, i.e. a model that optimally allocates a given level of compute during training to maximize performance.
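For context, one common formalization from the scaling-law literature (Hoffmann et al., 2022), stated here as an assumption rather than the paper's own parametrization:

```latex
% One common parametric form; E, A, B, alpha, beta are fit empirically.
% The compute-optimal sizes follow from minimizing L under the budget.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad C \approx 6ND
\;\;\Longrightarrow\;\;
N^{*}(C) \propto C^{\frac{\beta}{\alpha+\beta}}, \quad
D^{*}(C) \propto C^{\frac{\alpha}{\alpha+\beta}}.
```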

Transformer Fusion with Optimal Transport

1 code implementation • 9 Oct 2023 • Moritz Imfeld, Jacopo Graldi, Marco Giordano, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

Fusion is a technique for merging multiple independently-trained neural networks in order to combine their capabilities.

Image Classification • Language Modelling
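The core idea behind fusion with optimal transport is to align one model's neurons to the other's before averaging weights. A hedged sketch for a toy 2-layer MLP: a hard one-to-one matching (linear assignment) is a special case of an optimal transport plan; the paper's method handles full transformers and soft transport, which this sketch does not attempt.

```python
# Alignment-then-averaging fusion for a toy 2-layer MLP.
import torch
from scipy.optimize import linear_sum_assignment

def fuse_two_layer_mlp(w1_a, w1_b, w2_a, w2_b):
    """w1_*: (hidden, in) first-layer weights; w2_*: (out, hidden) second-layer."""
    # Cost of matching hidden neuron i of model A with neuron j of model B.
    cost = torch.cdist(w1_a, w1_b).detach().cpu().numpy()
    _, perm = linear_sum_assignment(cost)   # perm[i] = B-neuron matched to A-neuron i
    perm = torch.as_tensor(perm)
    w1_b_aligned = w1_b[perm]               # permute B's hidden units...
    w2_b_aligned = w2_b[:, perm]            # ...and the columns that consume them
    return 0.5 * (w1_a + w1_b_aligned), 0.5 * (w2_a + w2_b_aligned)
```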

Scaling MLPs: A Tale of Inductive Bias

1 code implementation • NeurIPS 2023 • Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann

We show that the performance of MLPs improves drastically with scale (95% on CIFAR10, 82% on CIFAR100, 58% on ImageNet ReaL), highlighting that a lack of inductive bias can indeed be compensated for.

Computational Efficiency • Inductive Bias +1
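The 'MLP' in question is a plain feed-forward network on flattened pixels, with no convolutions or attention; a minimal sketch with placeholder width and depth, not the paper's configurations:

```python
# A plain MLP on flattened images; sizes are illustrative only.
import torch.nn as nn

def make_mlp(image_dim=3 * 64 * 64, width=1024, depth=6, num_classes=1000):
    layers = [nn.Flatten()]       # images become flat vectors: no spatial prior
    in_dim = image_dim
    for _ in range(depth):
        layers += [nn.Linear(in_dim, width), nn.GELU()]
        in_dim = width
    layers.append(nn.Linear(in_dim, num_classes))
    return nn.Sequential(*layers)
```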

OpenAssistant Conversations -- Democratizing Large Language Model Alignment

no code implementations • 14 Apr 2023 • Andreas Köpf, Yannic Kilcher, Dimitri von Rütte, Sotiris Anagnostidis, Zhi-Rui Tam, Keith Stevens, Abdullah Barhoum, Nguyen Minh Duc, Oliver Stanley, Richárd Nagyfi, Shahul ES, Sameer Suri, David Glushkov, Arnav Dantuluri, Andrew Maguire, Christoph Schuhmann, Huu Nguyen, Alexander Mattick

In an effort to democratize research on large-scale alignment, we release OpenAssistant Conversations, a human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages in 35 different languages, annotated with 461,292 quality ratings, resulting in over 10,000 complete and fully annotated conversation trees.

Language Modelling • Large Language Model
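The corpus is publicly released; a sketch of loading it with the HuggingFace `datasets` library, assuming the commonly used hub id `OpenAssistant/oasst1` (field names reflect that release and may differ across versions):

```python
# Load the released corpus; each row is one message in a conversation tree.
from datasets import load_dataset

ds = load_dataset("OpenAssistant/oasst1", split="train")
msg = ds[0]
print(msg["lang"], msg["role"])   # language code and prompter/assistant role
print(msg["text"][:200])          # message body
```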

Random Teachers are Good Teachers

1 code implementation • 23 Feb 2023 • Felix Sarnthein, Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann

In this work, we investigate the implicit regularization induced by teacher-student learning dynamics in self-distillation.

Data Augmentation • Self-Supervised Learning
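The teacher-student setup described above can be sketched as follows: the teacher stays frozen at its random initialization and the student regresses its outputs on unlabeled inputs. Architecture, loss, and optimizer here are placeholders, not the paper's exact recipe.

```python
# Distillation from a random, never-trained teacher.
import copy
import torch
import torch.nn.functional as F

teacher = torch.nn.Sequential(             # any small backbone; placeholder sizes
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 32 * 32, 512), torch.nn.ReLU(),
    torch.nn.Linear(512, 128),
)
student = copy.deepcopy(teacher)           # the paper also studies students initialized near the teacher
teacher.requires_grad_(False)              # teacher stays at its random initialization
opt = torch.optim.SGD(student.parameters(), lr=0.1)

def distill_step(x):
    with torch.no_grad():
        target = teacher(x)                # random-teacher outputs as regression targets
    loss = F.mse_loss(student(x), target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```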

Cosmology from Galaxy Redshift Surveys with PointNet

no code implementations • 22 Nov 2022 • Sotiris Anagnostidis, Arne Thomsen, Tomasz Kacprzak, Tilman Tröster, Luca Biggio, Alexandre Refregier, Thomas Hofmann

In this work, we aim to improve upon two-point statistics by employing a PointNet-like neural network to regress the values of the cosmological parameters directly from point cloud data.
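A PointNet-style regressor boils down to a shared per-point MLP, a permutation-invariant pooling over points, and a head that outputs the parameters. A minimal sketch with illustrative sizes, not the paper's architecture:

```python
# Minimal PointNet-like regressor for point-cloud inputs.
import torch
import torch.nn as nn

class PointNetRegressor(nn.Module):
    def __init__(self, point_dim=3, n_params=2):
        super().__init__()
        self.per_point = nn.Sequential(      # same MLP applied to every point
            nn.Linear(point_dim, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, n_params),        # e.g. two cosmological parameters
        )

    def forward(self, points):               # points: (batch, n_points, point_dim)
        feats = self.per_point(points)
        pooled = feats.max(dim=1).values      # permutation-invariant summary
        return self.head(pooled)
```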

The Curious Case of Benign Memorization

no code implementations • 25 Oct 2022 • Sotiris Anagnostidis, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann

While such a memorization capacity seems worrisome, in this work we show that under training protocols that include data augmentation, neural networks learn to memorize entirely random labels in a benign way, i.e. they learn embeddings that lead to highly non-trivial performance under nearest neighbour probing.

Data Augmentation • Memorization
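Nearest-neighbour probing, as used above, can be sketched as follows: embed the data with the frozen network that was trained on random labels, then classify each test point by the true label of its nearest training embedding. `embed` is a placeholder feature extractor, not a function from the paper.

```python
# 1-NN probe of a frozen embedding against the true (non-random) labels.
import torch

def knn_probe(embed, x_train, y_train_true, x_test):
    with torch.no_grad():
        z_train, z_test = embed(x_train), embed(x_test)
    nearest = torch.cdist(z_test, z_train).argmin(dim=1)  # closest train point per test point
    return y_train_true[nearest]                           # predicted true labels
```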

Mastering Spatial Graph Prediction of Road Networks

no code implementations • ICCV 2023 • Sotiris Anagnostidis, Aurelien Lucchi, Thomas Hofmann

Accurately predicting road networks from satellite images requires a global understanding of the network topology.

Reinforcement Learning (RL)

Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse

no code implementations • 7 Jun 2022 • Lorenzo Noci, Sotiris Anagnostidis, Luca Biggio, Antonio Orvieto, Sidak Pal Singh, Aurelien Lucchi

First, we show that rank collapse of the tokens' representations hinders training by causing the gradients of the queries and keys to vanish at initialization.
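Rank collapse means the token representations converge toward a common direction with depth. One simple diagnostic (a common proxy in this literature, not necessarily the paper's exact measure) is the relative residual after removing the mean token:

```python
# Rank-collapse diagnostic: relative distance of the token matrix to a
# rank-one approximation (the mean token). Near 0 means collapsed.
import torch

def rank_collapse_residual(tokens):           # tokens: (seq_len, d_model)
    mean = tokens.mean(dim=0, keepdim=True)   # rank-one approximation
    residual = tokens - mean
    return (residual.norm() / tokens.norm()).item()
```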

Direct-Search for a Class of Stochastic Min-Max Problems

no code implementations • 22 Feb 2021 • Sotiris Anagnostidis, Aurelien Lucchi, Youssef Diouane

Recent applications in machine learning have renewed the interest of the community in min-max optimization problems.
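Schematically, the problem class named in the title reads as below; direct-search methods tackle it using only function evaluations, with no gradient information. The paper's precise assumptions on f and the feasible sets are not reproduced here.

```latex
% Schematic stochastic min-max problem; constraint sets and conditions
% on f are the paper's, not restated in this sketch.
\min_{x \in \mathcal{X}} \; \max_{y \in \mathcal{Y}} \;
\mathbb{E}_{\xi}\bigl[ f(x, y; \xi) \bigr]
```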
