no code implementations • 16 Feb 2024 • Benjamin L. Edelman, Ezra Edelman, Surbhi Goel, Eran Malach, Nikolaos Tsilivis
We examine how learning is affected by varying the prior distribution over Markov chains, and consider the generalization of our in-context learning of Markov chains (ICL-MC) task to $n$-grams for $n > 2$.
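To make the setup concrete, here is a minimal sketch of how data for an ICL-MC-style task can be generated: a fresh transition matrix is drawn per sequence from a Dirichlet prior (the prior distribution varied in the paper), and the learner must infer it in context. The function name, default sizes, and the Dirichlet choice are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def sample_icl_mc_sequence(num_states=3, seq_len=32, alpha=1.0, rng=None):
    """Sample one sequence for an ICL-MC-style task (hypothetical sketch).

    A new transition matrix P is drawn per sequence, so the model cannot
    memorize a single chain and must infer P from the context itself.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Each row of P is drawn i.i.d. from Dirichlet(alpha, ..., alpha);
    # alpha controls the prior over Markov chains.
    P = rng.dirichlet(alpha * np.ones(num_states), size=num_states)
    seq = [rng.integers(num_states)]
    for _ in range(seq_len - 1):
        seq.append(rng.choice(num_states, p=P[seq[-1]]))
    return np.array(seq), P
```

The generalization to $n$-grams would condition each draw on the previous $n-1$ symbols instead of only the last one.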
no code implementations • 13 Nov 2023 • Jingtong Su, Ya Shi Zhang, Nikolaos Tsilivis, Julia Kempe
Neural Collapse refers to the curious phenomenon at the end of training of a neural network, where feature vectors and classification weights converge to a very simple geometric arrangement (a simplex).
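The simplex arrangement can be verified numerically: in the collapsed configuration, class directions are unit vectors whose pairwise cosine similarity is exactly $-1/(C-1)$ for $C$ classes. A small sketch (independent of the paper's experiments) constructs this frame directly:

```python
import numpy as np

def simplex_etf(num_classes):
    """Construct the C-vertex simplex frame that Neural Collapse predicts
    for class means and classifier weights (up to rotation and scaling)."""
    C = num_classes
    # Center the standard basis and rescale so each column has unit norm.
    M = np.sqrt(C / (C - 1)) * (np.eye(C) - np.ones((C, C)) / C)
    return M  # columns are the C vertex directions

M = simplex_etf(4)
G = M.T @ M  # Gram matrix: 1 on the diagonal, -1/(C-1) = -1/3 off it
```

Maximal equiangular separation of the classes is exactly this $-1/(C-1)$ cosine.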
no code implementations • 5 Jul 2023 • Francesco Cagnetta, Deborah Oliveira, Mahalakshmi Sabanayagam, Nikolaos Tsilivis, Julia Kempe
Lecture notes from the course given by Professor Julia Kempe at the summer school "Statistical physics of Machine Learning" in Les Houches.
1 code implementation • 21 Mar 2023 • William Merrill, Nikolaos Tsilivis, Aman Shukla
Grokking is a phenomenon where a model trained on an algorithmic task first overfits but then, after a large amount of additional training, undergoes a phase transition to perfect generalization.
1 code implementation • 11 Oct 2022 • Nikolaos Tsilivis, Julia Kempe
The adversarial vulnerability of neural networks, and the subsequent techniques for creating robust models, have attracted significant attention; yet we still lack a full understanding of this phenomenon.
1 code implementation • 24 Jul 2022 • Nikolaos Tsilivis, Jingtong Su, Julia Kempe
In parallel, we revisit prior work that also focused on the problem of data optimization for robust classification \citep{Ily+19}, and show that being robust to adversarial attacks after standard (gradient descent) training on a suitable dataset is more challenging than previously thought.
no code implementations • 28 Jan 2022 • William Merrill, Nikolaos Tsilivis
One way to interpret the behavior of a black-box recurrent neural network (RNN) is to extract from it a more interpretable discrete computational model, such as a finite state machine, that captures its behavior.
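A minimal sketch of the general recipe, not the specific method of this paper: discretize the RNN's hidden states into clusters (here, trivially, by sign in one dimension; real methods cluster high-dimensional states, e.g. with k-means), then record which cluster each (state, symbol) pair transitions to. The toy `step_fn` standing in for a trained RNN is a hypothetical example that tracks bit parity.

```python
from collections import Counter, defaultdict
import numpy as np

def toy_rnn_step(h, x):
    """Stand-in 'RNN' (hypothetical, not a trained network): the hidden
    state's sign flips when x == 1, so its sign tracks bit parity."""
    return np.tanh(2.0 * h * (1 - 2 * x))

def extract_fsm(step_fn, h0, alphabet, num_strings=200, length=10, rng=None):
    """Extract a finite state machine from a black-box recurrent step
    function by discretizing hidden states and majority-voting the
    observed transitions between clusters."""
    rng = np.random.default_rng(0) if rng is None else rng
    label = lambda h: int(h > 0)  # two clusters: sign of the hidden state
    votes = defaultdict(Counter)
    for _ in range(num_strings):
        h = h0
        for x in rng.choice(alphabet, size=length):
            h_next = step_fn(h, x)
            votes[(label(h), int(x))][label(h_next)] += 1
            h = h_next
    # Keep the most frequent successor cluster for each (state, symbol) pair.
    return {k: c.most_common(1)[0][0] for k, c in votes.items()}
```

Running `extract_fsm(toy_rnn_step, 0.9, [0, 1])` recovers the two-state parity automaton: symbol 1 swaps states, symbol 0 keeps them.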
no code implementations • 29 Sep 2021 • Nikolaos Tsilivis, Julia Kempe
In particular, in the regime where the Neural Tangent Kernel theory holds, we derive a simple but powerful strategy for attacking models, which, in contrast to prior work, does not require any access to the model under attack, or even a trained replica of it.