no code implementations • 29 Oct 2024 • Nikolaos Tsilivis, Gal Vardi, Julia Kempe
We study the implicit bias of the general family of steepest descent algorithms, which includes gradient descent, sign descent and coordinate descent, in deep homogeneous neural networks.
no code implementations • 21 Oct 2024 • Matteo Vilucchio, Nikolaos Tsilivis, Bruno Loureiro, Julia Kempe
Indeed, controlling the complexity of the model class is particularly important when data is scarce, noisy or contaminated, as it translates a statistical belief on the underlying structure of the data.
no code implementations • 9 Oct 2024 • François Charton, Julia Kempe
We study the performance of transformers as a function of the number of repetitions of training examples with algorithmically generated datasets.
no code implementations • 7 Oct 2024 • Elvis Dohmatob, Yunzhen Feng, Arjun Subramonian, Julia Kempe
Within the scaling laws paradigm, which underpins the training of large neural networks like ChatGPT and Llama, we consider a supervised regression setting and establish the existence of a strong form of the model collapse phenomenon, a critical performance degradation due to synthetic data in the training corpus.
no code implementations • 11 Jun 2024 • Yunzhen Feng, Elvis Dohmatob, Pu Yang, Francois Charton, Julia Kempe
Large Language Models (LLMs) are increasingly trained on data generated by other LLMs, either because generated text and images become part of the pre-training corpus, or because synthesized data is used as a replacement for expensive human annotation.
no code implementations • 7 Jun 2024 • Nikolaos Tsilivis, Natalie Frank, Nathan Srebro, Julia Kempe
We study the implicit bias of optimization in robust empirical risk minimization (robust ERM) and its connection with robust generalization.
1 code implementation • 4 Jun 2024 • Vivien Cabannes, Charles Arnal, Wassim Bouaziz, Alice Yang, Francois Charton, Julia Kempe
Chain-of-Thought (CoT) reasoning is known to improve Large Language Models both empirically and in terms of theoretical approximation power.
no code implementations • 27 Apr 2024 • Yunzhen Feng, Tim G. J. Rudner, Nikolaos Tsilivis, Julia Kempe
Adversarial examples have been shown to cause neural networks to fail on a wide range of vision and language tasks, but recent work has claimed that Bayesian neural networks (BNNs) are inherently robust to adversarial perturbations.
1 code implementation • 8 Apr 2024 • Artem Vysogorets, Kartik Ahuja, Julia Kempe
In the era of exceptionally data-hungry models, careful selection of the training data is essential to mitigate the extensive costs of deep learning.
1 code implementation • 14 Mar 2024 • Tim G. J. Rudner, Ya Shi Zhang, Andrew Gordon Wilson, Julia Kempe
Machine learning models often perform poorly under subpopulation shifts in the data distribution.
no code implementations • 12 Feb 2024 • Elvis Dohmatob, Yunzhen Feng, Julia Kempe
In the era of proliferation of large language and image generation models, the phenomenon of "model collapse" refers to the situation whereby, as a model is trained recursively on data generated from previous generations of itself over time, its performance degrades until the model eventually becomes completely useless, i.e., the model collapses.
no code implementations • 10 Feb 2024 • Elvis Dohmatob, Yunzhen Feng, Pu Yang, Francois Charton, Julia Kempe
We discover a wide range of decay phenomena, analyzing loss of scaling, shifted scaling with number of generations, the "un-learning" of skills, and grokking when mixing human and synthesized data.
1 code implementation • 5 Feb 2024 • Artem Vysogorets, Anna Dawid, Julia Kempe
The second-order properties of the training loss have a massive impact on the optimization dynamics of deep learning models.
1 code implementation • 29 Nov 2023 • Haowen Guan, Xuan Zhao, Zishi Wang, Zhiyang Li, Julia Kempe
In many applications, Neural Nets (NNs) have classification performance on par with or even exceeding human capacity.
1 code implementation • 13 Nov 2023 • Jingtong Su, Ya Shi Zhang, Nikolaos Tsilivis, Julia Kempe
We further analyze the geometry of networks that are optimized to be robust against adversarial perturbations of the input, and find that Neural Collapse is a pervasive phenomenon in these cases as well, with clean and perturbed representations forming aligned simplices, and giving rise to a robust simple nearest-neighbor classifier.
no code implementations • 5 Jul 2023 • Francesco Cagnetta, Deborah Oliveira, Mahalakshmi Sabanayagam, Nikolaos Tsilivis, Julia Kempe
Lecture notes from the course given by Professor Julia Kempe at the summer school "Statistical physics of Machine Learning" in Les Houches.
no code implementations • 19 Apr 2023 • Jingtong Su, Julia Kempe
2) Replacing the front-end VOneBlock by an off-the-shelf parameter-free Scatternet followed by simple uniform Gaussian noise can achieve much more substantial adversarial robustness without adversarial training.
1 code implementation • 11 Oct 2022 • Nikolaos Tsilivis, Julia Kempe
The adversarial vulnerability of neural nets, and subsequent techniques to create robust models, have attracted significant attention; yet we still lack a full understanding of this phenomenon.
no code implementations • 5 Oct 2022 • Dhrupad Bhardwaj, Julia Kempe, Artem Vysogorets, Angela M. Teng, Evaristus C. Ezekwem
Starting from existing work on network masking (Wortsman et al., 2020), we show that simply learning a linear combination of a small number of task-specific supermasks (impressions) on a randomly initialized backbone network is sufficient both to retain accuracy on previously learned tasks and to achieve high accuracy on unseen tasks.
1 code implementation • 24 Jul 2022 • Nikolaos Tsilivis, Jingtong Su, Julia Kempe
In parallel, we revisit prior work that also focused on the problem of data optimization for robust classification \citep{Ily+19}, and show that being robust to adversarial attacks after standard (gradient descent) training on a suitable dataset is more challenging than previously thought.
no code implementations • 29 Sep 2021 • Nikolaos Tsilivis, Julia Kempe
In particular, in the regime where the Neural Tangent Kernel theory holds, we derive a simple but powerful strategy for attacking models, which, in contrast to prior work, does not require any access to the model under attack, or any trained replica of it for that matter.
1 code implementation • 5 Jul 2021 • Artem Vysogorets, Julia Kempe
Neural network pruning is a fruitful area of research with surging interest in high sparsity regimes.
1 code implementation • 13 Mar 2003 • Julia Kempe
This article aims to provide an introductory survey on quantum random walks.
Quantum Physics, Data Structures and Algorithms
no code implementations • 18 Dec 2000 • Dorit Aharonov, Andris Ambainis, Julia Kempe, Umesh Vazirani
We set the ground for a theory of quantum walks on graphs: the generalization of random walks on finite graphs to the quantum world.
Quantum Physics
no code implementations • 15 Aug 2000 • Dave Bacon, Andrew M. Childs, Isaac L. Chuang, Julia Kempe, Debbie W. Leung, Xinlan Zhou
Although the conditions for performing arbitrary unitary operations to simulate the dynamics of a closed quantum system are well understood, the same is not true of the more general class of quantum operations (also known as superoperators) corresponding to the dynamics of open quantum systems.
Quantum Physics