no code implementations • 15 Oct 2024 • Richard Diehl Martinez, Pietro Lesci, Paula Buttery
We find that nearly all layers in larger models stabilise early in training (within the first 20%), whereas layers in smaller models exhibit slower and less stable convergence, especially when their parameters have lower effective rank.
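As a rough illustration of the effective-rank measure mentioned here, a minimal sketch, assuming the entropy-based effective rank of Roy and Vetterli (2007) computed from a layer's singular values; the paper may use a different estimator:

```python
import numpy as np

def effective_rank(weight: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank: exp of the Shannon entropy
    of the normalised singular-value distribution."""
    s = np.linalg.svd(weight, compute_uv=False)
    p = s / (s.sum() + eps)                 # normalise singular values
    entropy = -np.sum(p * np.log(p + eps))  # Shannon entropy
    return float(np.exp(entropy))

# A rank-8 matrix scores close to 8, far below min(m, n) = 256.
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 256))
print(round(effective_rank(low_rank), 2))
```

Unlike the integer matrix rank, this quantity degrades smoothly as the singular-value spectrum concentrates, which is what makes it usable for comparing layers during training.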
1 code implementation • 6 Jun 2024 • Pietro Lesci, Clara Meister, Thomas Hofmann, Andreas Vlachos, Tiago Pimentel
Understanding memorisation in language models has practical and societal implications, e.g., studying models' training dynamics or preventing copyright infringements.
2 code implementations • 8 Apr 2024 • Pietro Lesci, Andreas Vlachos
By dynamically selecting different anchors at each iteration, the method promotes class balance and prevents overfitting to the initial decision boundary, thus encouraging the discovery of new clusters of minority instances.
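To make the anchor idea concrete, a minimal sketch of one iteration of anchor-based subpool selection; the per-class anchor sampling, the dot-product retrieval, and all function and parameter names are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def select_subpool(labelled_emb, labelled_y, unlabelled_emb,
                   anchors_per_class=2, neighbours_per_anchor=50, seed=0):
    """One iteration (sketch): sample fresh anchors from every labelled
    class, then retrieve the unlabelled instances most similar to them,
    yielding a small, class-balanced subpool for acquisition."""
    rng = np.random.default_rng(seed)
    subpool = set()
    for cls in np.unique(labelled_y):
        cls_idx = np.flatnonzero(labelled_y == cls)
        anchors = rng.choice(cls_idx, size=min(anchors_per_class, cls_idx.size),
                             replace=False)
        for a in anchors:
            # Dot product equals cosine similarity when rows are unit-normalised.
            sims = unlabelled_emb @ labelled_emb[a]
            subpool.update(np.argsort(-sims)[:neighbours_per_anchor].tolist())
    return sorted(subpool)  # indices into the unlabelled pool
```

Because the anchors are resampled each round, the retrieved neighbourhood shifts with the labelled set instead of staying tied to wherever the first decision boundary happened to fall.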
1 code implementation • 26 May 2023 • Pietro Lesci, Yoshinari Fujinuma, Momchil Hardalov, Chao Shang, Yassine Benajiba, Lluis Marquez
State-of-the-art sequence-to-sequence systems for dialogue state tracking (DST) use the full dialogue history as input, represent the current state as a list of all the slots, and regenerate the entire state from scratch at each dialogue turn.
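For context, a minimal sketch of how such a sequence-to-sequence DST baseline might serialise its input and output per turn; the separator tokens, slot names, and `none` placeholder are illustrative assumptions:

```python
def build_dst_example(history, slots, state):
    """Serialise one dialogue turn for a seq2seq DST baseline: the full
    history plus all slot names go in; the entire state is generated out."""
    src = " [turn] ".join(history) + " [slots] " + " ; ".join(slots)
    tgt = " ; ".join(f"{slot}={state.get(slot, 'none')}" for slot in slots)
    return src, tgt

history = ["i need a cheap hotel", "how about the cityroomz ?"]
slots = ["hotel-pricerange", "hotel-name"]
state = {"hotel-pricerange": "cheap", "hotel-name": "cityroomz"}
src, tgt = build_dst_example(history, slots, state)
print(src)
print(tgt)  # hotel-pricerange=cheap ; hotel-name=cityroomz
```

Note how both the input length and the output length grow with the dialogue, since the whole history is re-encoded and the whole state is regenerated at every turn.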