1 code implementation • 1 Feb 2024 • Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti
It exploits adapters as the experts and, leveraging the recent Soft MoE method, it relies on a soft assignment between the input tokens and experts to keep the computational time limited.
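The soft token-to-expert assignment at the core of Soft MoE can be sketched in a few lines. This is a minimal NumPy illustration of the dispatch/combine idea only, not the paper's implementation; all names (`soft_moe_layer`, `phi`, the one-slot-per-expert simplification) are illustrative.

```python
import numpy as np

def soft_moe_layer(x, phi, experts):
    """Soft MoE sketch: each slot is a convex combination of all tokens
    (dispatch), and each token output is a convex combination of all
    expert outputs (combine), so no token is hard-routed to one expert.

    x:       (n_tokens, d) input tokens
    phi:     (d, n_slots)  learnable slot parameters
    experts: one callable per slot (one-slot-per-expert simplification)
    """
    logits = x @ phi                                          # (n_tokens, n_slots)
    # dispatch weights: softmax over tokens, one distribution per slot
    dispatch = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
    slots = dispatch.T @ x                                    # (n_slots, d)
    outs = np.stack([f(s) for f, s in zip(experts, slots)])   # (n_slots, d)
    # combine weights: softmax over slots, one distribution per token
    combine = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return combine @ outs                                     # (n_tokens, d)
```

Because every expert processes a fixed number of slots regardless of sequence length, compute stays bounded, which is the property the abstract refers to.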
1 code implementation • 6 Dec 2023 • Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti, Mirco Ravanelli
The common modus operandi of fine-tuning large pre-trained Transformer models entails the adaptation of all their parameters (i.e., full fine-tuning).
no code implementations • 4 Oct 2023 • Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj
In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting and we propose COCONUT, a CIL method that relies on the combination of experience replay and contrastive learning.
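Experience replay, one of COCONUT's two ingredients, is commonly implemented with a fixed-size rehearsal buffer filled by reservoir sampling. The sketch below shows only that generic mechanism; the contrastive objective is not shown, and the class and method names are illustrative rather than taken from the paper.

```python
import random

class RehearsalBuffer:
    """Fixed-capacity experience-replay buffer via reservoir sampling,
    so every sample seen so far has equal probability of being stored."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.data = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            j = self.rng.randrange(self.seen)  # uniform over all samples seen
            if j < self.capacity:
                self.data[j] = sample          # evict a stored sample

    def sample(self, k):
        """Draw a rehearsal mini-batch to mix with the current task's data."""
        return self.rng.sample(self.data, min(k, len(self.data)))
```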
1 code implementation • 18 Sep 2023 • George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Mohamed Nabih Ali, Alessio Brutti
In self-attention models for automatic speech recognition (ASR), early-exit architectures enable the development of dynamic models capable of adapting their size and architecture to varying levels of computational resources and ASR performance demands.
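An early-exit model attaches a classifier head to intermediate layers and stops computing as soon as one head is confident enough. The confidence-threshold rule below is a minimal sketch; the function names and the exit criterion are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_decode(frame, layers, heads, threshold=0.9):
    """Run encoder layers in order; return (prediction, depth) from the
    first exit head whose max probability exceeds `threshold`."""
    h = frame
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        h = layer(h)
        probs = softmax(head(h))
        if probs.max() >= threshold:
            return int(probs.argmax()), depth  # confident: exit early
    return int(probs.argmax()), depth          # fell through: full depth
```

Lowering `threshold` trades accuracy for compute, which is how such models adapt to varying resource budgets.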
no code implementations • 29 May 2023 • Luca Serafini, Samuele Cornell, Giovanni Morrone, Enrico Zovato, Alessio Brutti, Stefano Squartini
We found that, among all methods considered, EEND-vector clustering (EEND-VC) offers the best trade-off in terms of computing requirements and performance.
1 code implementation • 23 May 2023 • Umberto Cappellazzo, Muqiao Yang, Daniele Falavigna, Alessio Brutti
The ability to learn new concepts sequentially is a major weakness for modern neural networks, which hinders their use in non-stationary environments.
no code implementations • 21 Mar 2023 • Giovanni Morrone, Samuele Cornell, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini
Finally, we also show that the separated signals can readily be used for automatic speech recognition, reaching performance close to that obtained with oracle sources in some configurations.
no code implementations • 12 Mar 2023 • Mohamed Nabih Ali, Alessio Brutti, Daniele Falavigna
Intent classification is a fundamental task in the spoken language understanding field that has recently gained the attention of the scientific community, mainly because of the feasibility of approaching it with end-to-end neural models.
no code implementations • 6 Mar 2023 • Mohamed Nabih Ali, Francesco Paissan, Daniele Falavigna, Alessio Brutti
Given the modular nature of the well-known Conv-TasNet speech separation architecture, in this paper we consider three parameters that directly control the overall size of the model, namely the number of residual blocks, the number of repetitions of the separation blocks, and the number of channels in the depth-wise convolutions, and we experimentally evaluate how they affect speech separation performance.
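These three parameters interact multiplicatively, which is why each is an effective size knob. The back-of-the-envelope count below is an illustrative approximation with assumed defaults (`kernel`, `bottleneck`), not the exact Conv-TasNet parameter formula.

```python
def tcn_param_estimate(n_blocks, n_repeats, n_channels, kernel=3, bottleneck=128):
    """Rough parameter count of a Conv-TasNet-style TCN separator,
    keeping only the dominant convolutions in each residual block
    (1x1 in, depth-wise, 1x1 out); norms and biases are ignored."""
    per_block = (bottleneck * n_channels    # 1x1 input conv
                 + n_channels * kernel      # depth-wise conv
                 + n_channels * bottleneck) # 1x1 output conv
    return n_blocks * n_repeats * per_block
```

Halving any one of the three factors roughly halves the separator's size, so they can be traded off against each other.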
1 code implementation • 15 Nov 2022 • Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti
Continual learning refers to a dynamic framework in which a model receives a stream of non-stationary data over time and must adapt to new data while preserving previously acquired knowledge.
no code implementations • 8 Jun 2022 • Irene Martín-Morató, Francesco Paissan, Alberto Ancilotto, Toni Heittola, Annamaria Mesaros, Elisabetta Farella, Alessio Brutti, Tuomas Virtanen
The provided baseline system is a convolutional neural network which employs post-training quantization of parameters, resulting in 46.5 K parameters and 29.23 million multiply-and-accumulate operations (MMACs).
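Post-training quantization of this kind shrinks weight storage roughly 4x by mapping float32 parameters to int8. The symmetric scheme below is a generic sketch, assuming a single per-tensor scale; it is not the baseline's exact quantizer.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric post-training quantization of a weight tensor to int8."""
    scale = max(abs(float(w.min())), abs(float(w.max())), 1e-12) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale
```

At one byte per parameter, a 46.5 K-parameter model needs about 46.5 KB for its weights instead of roughly 186 KB in float32.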
no code implementations • 31 May 2022 • Giovanni Morrone, Samuele Cornell, Enrico Zovato, Alessio Brutti, Stefano Squartini
Continuous speech separation (CSS) is a recently proposed framework which aims at separating each speaker from an input mixture signal in a streaming fashion.
1 code implementation • 5 Apr 2022 • Giovanni Morrone, Samuele Cornell, Desh Raj, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini
In particular, we compare two low-latency speech separation models.
1 code implementation • 18 Feb 2022 • Vandana Rajan, Alessio Brutti, Andrea Cavallaro
Generally, models that fuse complementary information from multiple modalities outperform their uni-modal counterparts.
no code implementations • 2 Aug 2021 • Vandana Rajan, Alessio Brutti, Andrea Cavallaro
For this reason, we aim to improve the performance of uni-modal affect recognition models by transferring knowledge from a better-performing (or stronger) modality to a weaker modality during training.
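Transferring knowledge from a stronger to a weaker modality during training is typically formulated as distillation on softened predictions. The temperature-scaled KL sketch below is a generic formulation with illustrative names, not necessarily the paper's exact objective.

```python
import numpy as np

def soft_targets(logits, temperature):
    """Temperature-softened class distribution from raw logits."""
    z = logits / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_modal_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions: the
    weak-modality student is pulled toward the strong-modality teacher."""
    p = soft_targets(teacher_logits, temperature)
    q = soft_targets(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

A higher temperature exposes more of the teacher's "dark knowledge" about relative class similarities.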
1 code implementation • 6 Apr 2021 • Samuele Cornell, Alessio Brutti, Marco Matassoni, Stefano Squartini
Fully exploiting ad-hoc microphone networks for distant speech recognition is still an open issue.
no code implementations • 3 Nov 2020 • Vandana Rajan, Alessio Brutti, Andrea Cavallaro
The proposed multi-modal training framework uses cross-modal translation and correlation-based latent space alignment to improve the representations of the weaker modalities.
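Correlation-based latent space alignment can be expressed as maximizing the per-dimension Pearson correlation between the two modalities' embeddings. The loss below is one illustrative formulation under that assumption, not the paper's exact criterion.

```python
import numpy as np

def correlation_alignment_loss(za, zb, eps=1e-8):
    """Negative mean per-dimension Pearson correlation between two
    batches of latent codes, shape (batch, dim); minimizing it pulls
    the two modalities' representations into correlated subspaces."""
    za = (za - za.mean(axis=0)) / (za.std(axis=0) + eps)
    zb = (zb - zb.mean(axis=0)) / (zb.std(axis=0) + eps)
    return float(-(za * zb).mean(axis=0).mean())
```

The loss reaches -1 when the embeddings are perfectly correlated dimension-wise and +1 when perfectly anti-correlated.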
no code implementations • 29 Jan 2020 • Gianmarco Cerutti, Rahul Prasad, Alessio Brutti, Elisabetta Farella
This paper addresses the application of sound event detection at the edge, by optimizing deep learning techniques on resource-constrained embedded platforms for the IoT.
no code implementations • 6 Nov 2019 • Md Sahidullah, Jose Patino, Samuele Cornell, Ruiqing Yin, Sunit Sivasankaran, Hervé Bredin, Pavel Korshunov, Alessio Brutti, Romain Serizel, Emmanuel Vincent, Nicholas Evans, Sébastien Marcel, Stefano Squartini, Claude Barras
This paper describes the speaker diarization systems developed for the Second DIHARD Speech Diarization Challenge (DIHARD II) by the Speed team.
1 code implementation • 4 Nov 2019 • Enrico Fini, Alessio Brutti
Recently, a fully supervised speaker diarization approach was proposed (UIS-RNN) which models speakers using multiple instances of a parameter-sharing recurrent neural network.
Ranked #1 on Speaker Diarization on DIHARD II
1 code implementation • 26 Sep 2019 • Vandana Rajan, Alessio Brutti, Andrea Cavallaro
Computational paralinguistics aims to infer human emotions, personality traits and behavioural patterns from speech signals.