Search Results for author: Matteo Pagliardini

Found 14 papers, 9 papers with code

The AdEMAMix Optimizer: Better, Faster, Older

4 code implementations • 5 Sep 2024 • Matteo Pagliardini, Pierre Ablin, David Grangier

This work questions the use of a single EMA to accumulate past gradients and empirically demonstrates how this choice can be sub-optimal: a single EMA cannot simultaneously give a high weight to the immediate past, and a non-negligible weight to older gradients.

Image Classification, Language Modelling

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging

no code implementations • 4 Feb 2024 • Matteo Pagliardini, Amirkeivan Mohtashami, François Fleuret, Martin Jaggi

The transformer architecture by Vaswani et al. (2017) is now ubiquitous across application domains, from natural language processing to speech processing and image understanding.

DoGE: Domain Reweighting with Generalization Estimation

no code implementations • 23 Oct 2023 • Simin Fan, Matteo Pagliardini, Martin Jaggi

Moreover, when aiming to generalize to out-of-domain target tasks that are unseen in the pretraining corpus (OOD domain), DoGE can effectively identify inter-domain dependencies and consistently achieves better test perplexity on the target domain.

Domain Generalization, Language Modelling

Faster Causal Attention Over Large Sequences Through Sparse Flash Attention

2 code implementations • 1 Jun 2023 • Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, François Fleuret

While many works have proposed schemes to sparsify the attention patterns and reduce the computational overhead of self-attention, those are often limited by implementation concerns and end up imposing a simple and static structure over the attention matrix.

A Primal-Dual Approach to Solving Variational Inequalities with General Constraints

2 code implementations • 27 Oct 2022 • Tatjana Chavdarova, Tong Yang, Matteo Pagliardini, Michael I. Jordan

We prove the convergence of this method and show that the gap function of the last iterate of the method decreases at a rate of $O(\frac{1}{\sqrt{K}})$ when the operator is $L$-Lipschitz and monotone.

Improving Generalization via Uncertainty Driven Perturbations

no code implementations • 11 Feb 2022 • Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Michael I. Jordan, Tatjana Chavdarova

We show that UDP is guaranteed to achieve the maximum margin decision boundary on linear models and that it notably increases the margin on challenging simulated datasets.

Agree to Disagree: Diversity through Disagreement for Better Transferability

1 code implementation • 9 Feb 2022 • Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy

This behavior can hinder the transferability of trained models by (i) favoring the learning of simpler but spurious features -- present in the training data but absent from the test data -- and (ii) only leveraging a small subset of predictive features.

Diversity, Out of Distribution (OOD) Detection

The Peril of Popular Deep Learning Uncertainty Estimation Methods

1 code implementation • 9 Dec 2021 • Yehao Liu, Matteo Pagliardini, Tatjana Chavdarova, Sebastian U. Stich

Secondly, we show on a 2D toy example that neither BNNs nor MCDropout gives high uncertainty estimates on OOD samples.

Deep Learning

Improved Generalization-Robustness Trade-off via Uncertainty Targeted Attacks

no code implementations • 29 Sep 2021 • Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Tatjana Chavdarova

The sensitivity of deep learning models to small input perturbations raises security concerns and limits their use for applications where reliability is critical.

Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features

5 code implementations • NAACL 2018 • Matteo Pagliardini, Prakhar Gupta, Martin Jaggi

The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question of whether similar methods could be derived to improve embeddings (i.e. semantic representations) of word sequences as well.

Sentence, Sentence Embeddings
