1 code implementation • 4 Mar 2025 • Nathan Godey, Alessio Devoto, Yu Zhao, Simone Scardapane, Pasquale Minervini, Éric de la Clergerie, Benoît Sagot
Autoregressive language models rely on a Key-Value (KV) Cache, which avoids re-computing past hidden states during generation and thereby speeds up decoding.
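A minimal sketch of the mechanism, assuming single-head attention and toy dimensions (all names here are illustrative):

```python
import torch

# Minimal sketch of single-head autoregressive attention with a KV cache.
# At each step, only the new token's key/value are computed; past keys and
# values are reused from the cache instead of being recomputed.

d = 64
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))

def decode_step(x_t, cache):
    """x_t: (1, d) hidden state of the newest token."""
    q, k, v = x_t @ W_q, x_t @ W_k, x_t @ W_v
    # Append the new key/value to the cache (this is the KV cache).
    cache["k"] = torch.cat([cache["k"], k], dim=0)
    cache["v"] = torch.cat([cache["v"], v], dim=0)
    # Attend over all cached positions; nothing before t is recomputed.
    scores = (q @ cache["k"].T) / d ** 0.5
    return torch.softmax(scores, dim=-1) @ cache["v"], cache

cache = {"k": torch.empty(0, d), "v": torch.empty(0, d)}
for _ in range(5):  # five decoding steps
    out, cache = decode_step(torch.randn(1, d), cache)
print(cache["k"].shape)  # torch.Size([5, 64])
```

Note that the cache grows linearly with the generated sequence, which is the memory cost that the compression work listed further down (17 Jun 2024) targets.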
no code implementations • 6 Jan 2025 • Donatella Genovese, Alessandro Sgroi, Alessio Devoto, Samuel Valentine, Lennox Wood, Cristiano Sebastiani, Stefano Giagu, Monica D'Onofrio, Simone Scardapane
In this paper, we propose a novel approach that combines a Graph Transformer model with Mixture-of-Experts layers to achieve high predictive performance while embedding interpretability into the architecture.
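For intuition, a hedged sketch of a top-1 Mixture-of-Experts layer of the kind the entry mentions (an illustration of the general mechanism, not the paper's Graph Transformer architecture; all modules and sizes are invented for the example):

```python
import torch
import torch.nn as nn

# A router scores the experts per input and sends each input to its
# top-scoring expert; the chosen expert index itself is a handle for
# interpretability (which expert fired for which input).
class MoELayer(nn.Module):
    def __init__(self, d=64, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList([nn.Linear(d, d) for _ in range(n_experts)])

    def forward(self, x):
        weights = self.router(x).softmax(dim=-1)   # (batch, n_experts)
        top_w, top_i = weights.max(dim=-1)         # top-1 routing
        out = torch.stack([self.experts[int(i)](xi) for xi, i in zip(x, top_i)])
        return out * top_w.unsqueeze(-1), top_i

moe = MoELayer()
y, expert_ids = moe(torch.randn(8, 64))
print(expert_ids.tolist())  # which expert handled each sample
```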
no code implementations • 27 Dec 2024 • Jary Pomponi, Mattia Merluzzi, Alessio Devoto, Mateus Pontes Mota, Paolo Di Lorenzo, Simone Scardapane
This paper presents a novel framework for goal-oriented semantic communications leveraging recursive early exit models.
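A minimal early-exit sketch for intuition (illustrative only; the paper's recursive early exit models for semantic communications are not reproduced here):

```python
import torch
import torch.nn as nn

# Each block has its own classifier head; inference stops at the first
# exit whose confidence clears a threshold, saving the remaining compute.
class EarlyExitNet(nn.Module):
    def __init__(self, d=64, n_blocks=4, n_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(d, d) for _ in range(n_blocks)])
        self.exits = nn.ModuleList([nn.Linear(d, n_classes) for _ in range(n_blocks)])

    def forward(self, x, threshold=0.9):
        for i, (blk, head) in enumerate(zip(self.blocks, self.exits)):
            x = torch.relu(blk(x))
            probs = head(x).softmax(dim=-1)
            if probs.max() >= threshold:   # confident enough: exit early
                return probs, i
        return probs, len(self.blocks) - 1

net = EarlyExitNet()
probs, exit_idx = net(torch.randn(1, 64))
print(f"exited at block {exit_idx}")
```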
1 code implementation • 21 Oct 2024 • Yu Zhao, Xiaotang Du, Giwon Hong, Aryo Pradipta Gema, Alessio Devoto, Hongru Wang, Xuanli He, Kam-Fai Wong, Pasquale Minervini
Through probing tasks, we find that LLMs internally register signals of knowledge conflict in the residual stream, and that these signals can be detected accurately by probing intermediate model activations.
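A minimal probing sketch, assuming activations have already been extracted from one intermediate layer; the data here is synthetic and the setup only illustrates linear probing, not the paper's experimental protocol:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical setup: X holds residual-stream activations from one
# intermediate layer (one row per prompt), y marks whether the prompt
# induced a knowledge conflict. Both are synthetic stand-ins here.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768))
y = rng.integers(0, 2, size=1000)
X[y == 1] += 0.5  # inject a separable "conflict" direction for the demo

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")
```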
1 code implementation • 21 Oct 2024 • Yu Zhao, Alessio Devoto, Giwon Hong, Xiaotang Du, Aryo Pradipta Gema, Hongru Wang, Xuanli He, Kam-Fai Wong, Pasquale Minervini
In this work, we propose SpARE, a training-free representation engineering method that uses pre-trained sparse auto-encoders (SAEs) to control the knowledge selection behaviour of LLMs.
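An illustrative sketch of SAE-based activation editing, assuming random stand-in SAE weights; it shows the general encode-edit-decode pattern, not the SpARE implementation:

```python
import torch

# A sparse auto-encoder maps a hidden state into an overcomplete feature
# space; editing a few features there and decoding back yields a steered
# hidden state that can replace the original at that layer.
d_model, d_sae = 512, 4096
W_enc = torch.randn(d_model, d_sae) / d_model ** 0.5
W_dec = torch.randn(d_sae, d_model) / d_sae ** 0.5
b_enc = torch.zeros(d_sae)

def steer(h, feature_ids, scale=0.0):
    f = torch.relu(h @ W_enc + b_enc)   # sparse feature activations
    f[..., feature_ids] *= scale        # suppress (or amplify) features
    return f @ W_dec                    # decode back to model space

h = torch.randn(1, d_model)             # hidden state at some layer
h_steered = steer(h, feature_ids=[3, 17], scale=0.0)
print(h_steered.shape)  # torch.Size([1, 512])
```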
no code implementations • 16 Aug 2024 • Alessio Devoto, Federico Alvetreti, Jary Pomponi, Paolo Di Lorenzo, Pasquale Minervini, Simone Scardapane
To this end, in this paper we introduce ALaST (Adaptive Layer Selection Fine-Tuning for Vision Transformers), an efficient fine-tuning method for ViTs that speeds up the fine-tuning process while reducing computational cost and memory load.
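A hedged sketch of the general idea behind adaptive layer selection, assuming a toy stack of blocks and made-up importance scores; this is not the ALaST algorithm itself:

```python
import torch
import torch.nn as nn

# At each step, only a scored subset of blocks receives gradients; the
# rest are frozen, cutting compute and memory for that step.
blocks = nn.ModuleList([nn.Linear(64, 64) for _ in range(12)])

def select_layers(scores, budget=4):
    """Pick the `budget` highest-scoring blocks to train this step."""
    return set(torch.topk(scores, budget).indices.tolist())

scores = torch.rand(len(blocks))   # stand-in importance scores
active = select_layers(scores)
for i, blk in enumerate(blocks):
    for p in blk.parameters():
        p.requires_grad_(i in active)

x = torch.randn(8, 64)
for blk in blocks:
    x = torch.relu(blk(x))
x.sum().backward()                 # grads flow only to active blocks
print(sorted(active))
```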
2 code implementations • 17 Jun 2024 • Alessio Devoto, Yu Zhao, Simone Scardapane, Pasquale Minervini
Existing approaches to reduce the KV cache size involve either fine-tuning the model to learn a compression strategy or leveraging attention scores to reduce the sequence length.
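For context, a generic sketch of the attention-score-based strategy described here (one of the existing approaches, not this paper's method; the accumulated scores are a stand-in):

```python
import torch

# Keep only the cache entries that received the most attention so far,
# evicting the rest to bound the KV cache size.
def evict(keys, values, attn_scores, keep=128):
    """keys/values: (seq, d); attn_scores: (seq,) accumulated per position."""
    if keys.size(0) <= keep:
        return keys, values
    idx = torch.topk(attn_scores, keep).indices.sort().values  # keep order
    return keys[idx], values[idx]

keys, values = torch.randn(512, 64), torch.randn(512, 64)
scores = torch.rand(512)
keys, values = evict(keys, values, scores, keep=128)
print(keys.shape)  # torch.Size([128, 64])
```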
3 code implementations • 6 Jun 2024 • Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, Rohit Saxena, Xuanli He, Yu Zhao, Xiaotang Du, Mohammad Reza Ghasemi Madani, Claire Barale, Robert McHardy, Joshua Harris, Jean Kaddour, Emile van Krieken, Pasquale Minervini
For example, we find that 57% of the analysed questions in the Virology subset contain errors.
no code implementations • 25 Apr 2024 • Alessio Devoto, Simone Petruzzi, Jary Pomponi, Paolo Di Lorenzo, Simone Scardapane
In this paper, we propose a novel design for AI-native goal-oriented communications, exploiting transformer neural networks under dynamic inference constraints on bandwidth and computation.
no code implementations • 12 Mar 2024 • Simone Scardapane, Alessandro Baiocchi, Alessio Devoto, Valerio Marsocci, Pasquale Minervini, Jary Pomponi
This article summarizes principles and ideas from the emerging area of applying conditional computation methods to the design of neural networks.
2 code implementations • 2 Feb 2024 • Jary Pomponi, Alessio Devoto, Simone Scardapane
The latter is a gated incremental classifier that lets the model revise past predictions without directly interfering with them.
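A hedged sketch of what a gated incremental classifier could look like (an invented illustration of the idea, not the paper's module):

```python
import torch
import torch.nn as nn

# Each new task adds a head while earlier heads are frozen; a learned
# gate rescales old and new logits, so past predictions can be adjusted
# without touching the frozen heads' weights.
class GatedIncrementalClassifier(nn.Module):
    def __init__(self, d, classes_per_task):
        super().__init__()
        self.heads, self.gates = nn.ModuleList(), nn.ModuleList()
        self.d, self.cpt = d, classes_per_task

    def add_task(self):
        for h in self.heads:            # freeze previous heads
            h.requires_grad_(False)
        self.heads.append(nn.Linear(self.d, self.cpt))
        self.gates.append(nn.Linear(self.d, 1))

    def forward(self, x):
        logits = [g(x).sigmoid() * h(x) for h, g in zip(self.heads, self.gates)]
        return torch.cat(logits, dim=-1)

clf = GatedIncrementalClassifier(d=128, classes_per_task=10)
clf.add_task(); clf.add_task()          # two tasks -> 20 classes
print(clf(torch.randn(4, 128)).shape)   # torch.Size([4, 20])
```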
1 code implementation • 15 Dec 2023 • Bartosz Wójcik, Alessio Devoto, Karol Pustelnik, Pasquale Minervini, Simone Scardapane
While transformer models have been highly successful, they are computationally inefficient.