Search Results for author: Alessio Brutti

Found 21 papers, 10 papers with code

Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters

1 code implementation • 1 Feb 2024 • Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti

It exploits adapters as the experts and, leveraging the recent Soft MoE method, it relies on a soft assignment between the input tokens and experts to keep the computational time limited.

Transfer Learning
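
The soft token-to-expert assignment mentioned in the abstract can be sketched as follows: a minimal NumPy illustration of Soft MoE routing, where every expert slot receives a convex combination of all input tokens instead of a hard top-k dispatch. The function names, shapes, and toy experts here are assumptions for illustration, not the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe_layer(tokens, phi, experts):
    """Soft MoE routing sketch: slots are soft mixtures of ALL tokens,
    so compute stays fixed regardless of the number of experts.

    tokens : (n, d) input tokens
    phi    : (d, s) learnable slot parameters (s slots total)
    experts: list of callables; the s slots are split evenly among them
    """
    logits = tokens @ phi               # (n, s) token-slot affinities
    dispatch = softmax(logits, axis=0)  # normalize over tokens
    combine = softmax(logits, axis=1)   # normalize over slots
    slots = dispatch.T @ tokens         # (s, d) soft slot inputs
    per_expert = np.split(slots, len(experts))
    outs = np.concatenate([f(x) for f, x in zip(experts, per_expert)])
    return combine @ outs               # (n, d) mixed back to token positions

# toy usage: 4 tokens of dim 8, 2 "adapter" experts with 1 slot each
rng = np.random.default_rng(0)
toks = rng.normal(size=(4, 8))
phi = rng.normal(size=(8, 2))
experts = [lambda x: x * 2.0, lambda x: x + 1.0]
out = soft_moe_layer(toks, phi, experts)
print(out.shape)  # (4, 8)
```

In the paper's setting the experts would be lightweight adapter modules inserted into a frozen Audio Spectrogram Transformer rather than the toy lambdas above.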

Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers

1 code implementation • 6 Dec 2023 • Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti, Mirco Ravanelli

The common modus operandi of fine-tuning large pre-trained Transformer models entails the adaptation of all their parameters (i.e., full fine-tuning).

Audio Classification · Few-Shot Learning +1

Continual Contrastive Spoken Language Understanding

no code implementations • 4 Oct 2023 • Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj

In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting and we propose COCONUT, a CIL method that relies on the combination of experience replay and contrastive learning.

Class Incremental Learning · Contrastive Learning +2
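
The contrastive component can be illustrated with a generic InfoNCE-style loss on two views of a batch, where matching rows are positives and all other rows are negatives. This is a sketch of contrastive learning in general, not the COCONUT objective itself; the temperature and shapes are assumptions.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.1):
    """InfoNCE-style contrastive loss: pull matching rows of z1/z2
    together, push all other pairings apart. Generic sketch only."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                        # (n, n) cosine similarities
    logsumexp = np.log(np.exp(sim).sum(axis=1))  # denominator per anchor
    return float(np.mean(logsumexp - np.diag(sim)))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_aligned = nt_xent(z, z)                        # identical views: low loss
loss_random = nt_xent(z, rng.normal(size=(8, 16)))  # unrelated views: high loss
print(loss_aligned < loss_random)  # True
```

In a class-incremental setting such a loss would be computed over both current-task samples and replayed samples from the memory buffer.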

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

1 code implementation • 18 Sep 2023 • George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Mohamed Nabih Ali, Alessio Brutti

In self-attention models for automatic speech recognition (ASR), early-exit architectures enable the development of dynamic models capable of adapting their size and architecture to varying levels of computational resources and ASR performance demands.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2
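
The early-exit idea described above can be sketched as confidence-gated inference: each encoder layer is followed by an auxiliary exit head, and decoding stops as soon as a head is confident enough, skipping the remaining layers. The threshold, layer callables, and heads below are hypothetical, not the paper's architecture.

```python
import numpy as np

def early_exit_forward(x, layers, exit_heads, threshold=0.9):
    """Run encoder layers in order; after each one, an auxiliary exit
    head predicts. If its confidence clears the threshold, stop early.

    layers     : list of callables, x -> x (encoder blocks)
    exit_heads : list of callables, x -> class probabilities
    Returns (probabilities, index of the layer that exited).
    """
    for i, (layer, head) in enumerate(zip(layers, exit_heads)):
        x = layer(x)
        probs = head(x)
        if probs.max() >= threshold:
            return probs, i          # exit early, skip remaining layers
    return probs, len(layers) - 1    # fell through to the final exit

# toy usage: 3 layers; the second exit head is already confident
layers = [lambda x: x + 1 for _ in range(3)]
heads = [lambda x: np.array([0.5, 0.5]),
         lambda x: np.array([0.95, 0.05]),
         lambda x: np.array([0.99, 0.01])]
probs, exit_layer = early_exit_forward(np.zeros(4), layers, heads)
print(exit_layer)  # 1 (exited after the second layer)
```

On a resource-constrained device the same trained model can thus trade accuracy for compute at inference time simply by tuning the threshold.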

Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding

1 code implementation • 23 May 2023 • Umberto Cappellazzo, Muqiao Yang, Daniele Falavigna, Alessio Brutti

The ability to learn new concepts sequentially is a major weakness for modern neural networks, which hinders their use in non-stationary environments.

Continual Learning · Knowledge Distillation +1

End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations

no code implementations • 21 Mar 2023 • Giovanni Morrone, Samuele Cornell, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini

Finally, we also show that the separated signals can be readily used for automatic speech recognition as well, reaching performance close to using oracle sources in some configurations.

Action Detection · Activity Detection +4

Improving the Intent Classification accuracy in Noisy Environment

no code implementations • 12 Mar 2023 • Mohamed Nabih Ali, Alessio Brutti, Daniele Falavigna

Intent classification is a fundamental task in the spoken language understanding field that has recently gained the attention of the scientific community, mainly because of the feasibility of approaching it with end-to-end neural models.

Automatic Speech Recognition · Classification +6

Scaling strategies for on-device low-complexity source separation with Conv-Tasnet

no code implementations • 6 Mar 2023 • Mohamed Nabih Ali, Francesco Paissan, Daniele Falavigna, Alessio Brutti

Given the modular nature of the well-known Conv-Tasnet speech separation architecture, in this paper we consider three parameters that directly control the overall size of the model, namely the number of residual blocks, the number of repetitions of the separation blocks, and the number of channels in the depth-wise convolutions, and we experimentally evaluate how they affect the speech separation performance.

Speech Separation
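
A rough back-of-envelope sketch of how the three knobs named above scale the separator's size: each residual block is roughly a 1x1 bottleneck-to-channel convolution, a depth-wise convolution, and a 1x1 projection back. The bottleneck width `B` and kernel size `P`, and the formula itself, are simplifying assumptions (norms, biases, and skip/mask convolutions are ignored), not the paper's accounting.

```python
def conv_tasnet_separator_params(X, R, H, B=128, P=3):
    """Approximate parameter count of the Conv-Tasnet separator stack.

    X : residual blocks per repeat
    R : repeats of the separation block
    H : channels in the depth-wise convolutions
    B : bottleneck channels (assumed)
    P : depth-wise kernel size (assumed)
    """
    per_block = B * H + H * P + H * B  # 1x1 in + depthwise + 1x1 out
    return R * X * per_block

# every term is linear in H, so halving the channels halves the count
full = conv_tasnet_separator_params(X=8, R=3, H=512)
slim = conv_tasnet_separator_params(X=8, R=3, H=256)
print(full / slim)  # 2.0
```

The same exercise shows why all three knobs are attractive for on-device deployment: the count is linear in each of `X`, `R`, and `H`, so they can be traded off independently against separation quality.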

An Investigation of the Combination of Rehearsal and Knowledge Distillation in Continual Learning for Spoken Language Understanding

1 code implementation • 15 Nov 2022 • Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti

Continual learning refers to a dynamical framework in which a model receives a stream of non-stationary data over time and must adapt to new data while preserving previously acquired knowledge.

Class Incremental Learning · Incremental Learning +2
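
The rehearsal side of this combination can be sketched as a fixed-size replay memory populated from the data stream. Below is a generic reservoir-sampling buffer, a common choice for rehearsal in continual learning; it is an illustration of the technique, not the paper's implementation.

```python
import random

class RehearsalBuffer:
    """Fixed-size memory of past examples for experience replay.
    Reservoir sampling gives every example seen so far an equal
    chance of being retained, without knowing the stream length."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.memory = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.memory[j] = example  # evict a random stored example

    def sample(self, k):
        return self.rng.sample(self.memory, min(k, len(self.memory)))

# usage: interleave a replayed batch with each new-task batch
buf = RehearsalBuffer(capacity=100)
for x in range(1000):          # stream of non-stationary data
    buf.add(x)
replay_batch = buf.sample(16)  # mixed into the current training batch
print(len(buf.memory))  # 100
```

Knowledge distillation would then add a second loss term that keeps the current model's outputs close to those of the previous-task model on these replayed examples.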

Low-complexity acoustic scene classification in DCASE 2022 Challenge

no code implementations • 8 Jun 2022 • Irene Martín-Morató, Francesco Paissan, Alberto Ancilotto, Toni Heittola, Annamaria Mesaros, Elisabetta Farella, Alessio Brutti, Tuomas Virtanen

The provided baseline system is a convolutional neural network which employs post-training quantization of parameters, resulting in 46.5K parameters and 29.23 million multiply-and-accumulate operations (MMACs).

Acoustic Scene Classification · Classification +2
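
Post-training quantization of the kind used in the baseline can be sketched as affine quantization of a trained weight tensor to int8 followed by dequantization. This is a generic sketch of the technique, not the DCASE baseline's exact scheme; the bit width and scale/zero-point recipe are assumptions.

```python
import numpy as np

def quantize_dequantize(w, num_bits=8):
    """Affine post-training quantization: map floats to int8 with a
    per-tensor scale and zero point, then map back to float to
    measure the quantization error."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = int(round(qmin - w.min() / scale))
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.int8)
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
w_hat = quantize_dequantize(w)
print(float(np.abs(w - w_hat).max()) < 0.05)  # small round-trip error
```

Storing `q` instead of `w` cuts the memory footprint by 4x versus float32, which is the kind of saving that makes a 46.5K-parameter model fit low-complexity hardware budgets.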

Conversational Speech Separation: an Evaluation Study for Streaming Applications

no code implementations • 31 May 2022 • Giovanni Morrone, Samuele Cornell, Enrico Zovato, Alessio Brutti, Stefano Squartini

Continuous speech separation (CSS) is a recently proposed framework which aims at separating each speaker from an input mixture signal in a streaming fashion.

Speech Separation

Is Cross-Attention Preferable to Self-Attention for Multi-Modal Emotion Recognition?

1 code implementation • 18 Feb 2022 • Vandana Rajan, Alessio Brutti, Andrea Cavallaro

Generally, models that fuse complementary information from multiple modalities outperform their uni-modal counterparts.

Emotion Classification · Emotion Recognition

Cross-Modal Knowledge Transfer via Inter-Modal Translation and Alignment for Affect Recognition

no code implementations • 2 Aug 2021 • Vandana Rajan, Alessio Brutti, Andrea Cavallaro

For this reason, we aim to improve the performance of uni-modal affect recognition models by transferring knowledge from a better-performing (or stronger) modality to a weaker modality during training.

Sentiment Analysis · Sentiment Classification +2

Robust Latent Representations via Cross-Modal Translation and Alignment

no code implementations • 3 Nov 2020 • Vandana Rajan, Alessio Brutti, Andrea Cavallaro

The proposed multi-modal training framework uses cross-modal translation and correlation-based latent space alignment to improve the representations of the weaker modalities.

Emotion Recognition · Translation

Compact recurrent neural networks for acoustic event detection on low-energy low-complexity platforms

no code implementations • 29 Jan 2020 • Gianmarco Cerutti, Rahul Prasad, Alessio Brutti, Elisabetta Farella

This paper addresses the application of sound event detection at the edge, by optimizing deep learning techniques on resource-constrained embedded platforms for the IoT.

Event Detection · Quantization +1

Supervised online diarization with sample mean loss for multi-domain data

1 code implementation • 4 Nov 2019 • Enrico Fini, Alessio Brutti

Recently, a fully supervised speaker diarization approach was proposed (UIS-RNN) which models speakers using multiple instances of a parameter-sharing recurrent neural network.

Clustering · speaker-diarization +1

ConflictNET: End-to-End Learning for Speech-based Conflict Intensity Estimation

1 code implementation • 26 Sep 2019 • Vandana Rajan, Alessio Brutti, Andrea Cavallaro

Computational paralinguistics aims to infer human emotions, personality traits and behavioural patterns from speech signals.
