no code implementations • 15 Feb 2024 • Michael Hahn, Mark Rofin
We show theoretically and empirically that this theory unifies a broad array of empirical observations about the learning abilities and biases of transformers, such as their generalization bias towards low sensitivity and low degree, and difficulty in length generalization for PARITY.
no code implementations • 22 Nov 2022 • Mark Rofin, Nikita Balagansky, Daniil Gavrilov
The simplest way to obtain continuous interpolation between two points in high-dimensional space is to draw a line between them.
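As a minimal sketch of the straight-line interpolation this excerpt describes, the snippet below computes points on the segment between two vectors. The names `lerp` and `interpolation_path` are hypothetical and chosen only for illustration; they are not taken from the paper, and the snippet makes no claim about the method the authors go on to propose.

```python
from typing import List, Sequence


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> List[float]:
    """Return the point (1 - t) * a + t * b on the line through a and b."""
    if len(a) != len(b):
        raise ValueError("both points must have the same dimension")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolation_path(
    a: Sequence[float], b: Sequence[float], steps: int
) -> List[List[float]]:
    """Return `steps` evenly spaced points from a to b, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    return [lerp(a, b, i / (steps - 1)) for i in range(steps)]


if __name__ == "__main__":
    # Illustrative 2-D points; the same code works in any dimension.
    for point in interpolation_path([0.0, 0.0], [2.0, 4.0], steps=3):
        print(point)
```

With `t = 0` the result is the first point and with `t = 1` it is the second, so sweeping `t` over `[0, 1]` traces the whole segment.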
1 code implementation • 11 Oct 2022 • Mark Rofin, Vladislav Mikhailov, Mikhail Florinskiy, Andrey Kravchenko, Elena Tutubalina, Tatiana Shavrina, Daniel Karabekyan, Ekaterina Artemova
The development of state-of-the-art systems in different applied areas of machine learning (ML) is driven by benchmarks, which have shaped the paradigm of evaluating generalisation capabilities from multiple perspectives.