1 code implementation • 1 Feb 2024 • Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti
It exploits adapters as the experts and, leveraging the recent Soft MoE method, it relies on a soft assignment between the input tokens and experts to keep the computational time limited.
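The soft token-to-expert assignment at the core of Soft MoE can be sketched in a few lines. This is a minimal NumPy illustration of the dispatch/combine idea only, not the paper's implementation; all names (`soft_moe_layer`, `phi`, the one-slot-per-expert simplification) are illustrative.

```python
import numpy as np

def soft_moe_layer(x, phi, experts):
    """Soft MoE sketch: each slot is a convex combination of all tokens
    (dispatch), and each token output is a convex combination of all
    expert outputs (combine), so no token is hard-routed to one expert.

    x:       (n_tokens, d) input tokens
    phi:     (d, n_slots)  learnable slot parameters
    experts: one callable per slot (one-slot-per-expert simplification)
    """
    logits = x @ phi                                          # (n_tokens, n_slots)
    # dispatch weights: softmax over tokens, one distribution per slot
    dispatch = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
    slots = dispatch.T @ x                                    # (n_slots, d)
    outs = np.stack([f(s) for f, s in zip(experts, slots)])   # (n_slots, d)
    # combine weights: softmax over slots, one distribution per token
    combine = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return combine @ outs                                     # (n_tokens, d)
```

Because every expert processes a fixed number of slots regardless of sequence length, compute stays bounded, which is the property the abstract refers to.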
1 code implementation • 6 Dec 2023 • Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti, Mirco Ravanelli
The common modus operandi of fine-tuning large pre-trained Transformer models entails the adaptation of all their parameters (i.e., full fine-tuning).
no code implementations • 4 Oct 2023 • Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj
In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting and we propose COCONUT, a CIL method that relies on the combination of experience replay and contrastive learning.
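Experience replay, one of COCONUT's two ingredients, is commonly implemented with a fixed-size rehearsal buffer filled by reservoir sampling. The sketch below shows only that generic mechanism; the contrastive objective is not shown, and the class and method names are illustrative rather than taken from the paper.

```python
import random

class RehearsalBuffer:
    """Fixed-capacity experience-replay buffer via reservoir sampling,
    so every sample seen so far has equal probability of being stored."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.data = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            j = self.rng.randrange(self.seen)  # uniform over all samples seen
            if j < self.capacity:
                self.data[j] = sample          # evict a stored sample

    def sample(self, k):
        """Draw a rehearsal mini-batch to mix with the current task's data."""
        return self.rng.sample(self.data, min(k, len(self.data)))
```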
1 code implementation • 18 Sep 2023 • George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Mohamed Nabih Ali, Alessio Brutti
In self-attention models for automatic speech recognition (ASR), early-exit architectures enable the development of dynamic models capable of adapting their size and architecture to varying levels of computational resources and ASR performance demands.
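An early-exit model attaches a classifier head to intermediate layers and stops computing as soon as one head is confident enough. The confidence-threshold rule below is a minimal sketch; the function names and the exit criterion are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_decode(frame, layers, heads, threshold=0.9):
    """Run encoder layers in order; return (prediction, depth) from the
    first exit head whose max probability exceeds `threshold`."""
    h = frame
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        h = layer(h)
        probs = softmax(head(h))
        if probs.max() >= threshold:
            return int(probs.argmax()), depth  # confident: exit early
    return int(probs.argmax()), depth          # fell through: full depth
```

Lowering `threshold` trades accuracy for compute, which is how such models adapt to varying resource budgets.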
no code implementations • 29 May 2023 • Luca Serafini, Samuele Cornell, Giovanni Morrone, Enrico Zovato, Alessio Brutti, Stefano Squartini
We found that, among all methods considered, EEND-vector clustering (EEND-VC) offers the best trade-off in terms of computing requirements and performance.
1 code implementation • 23 May 2023 • Umberto Cappellazzo, Muqiao Yang, Daniele Falavigna, Alessio Brutti
The ability to learn new concepts sequentially is a major weakness for modern neural networks, which hinders their use in non-stationary environments.
no code implementations • 21 Mar 2023 • Giovanni Morrone, Samuele Cornell, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini
Finally, we also show that the separated signals can readily be used for automatic speech recognition, reaching performance close to that obtained with oracle sources in some configurations.
no code implementations • 12 Mar 2023 • Mohamed Nabih Ali, Alessio Brutti, Daniele Falavigna
Intent classification is a fundamental task in the spoken language understanding field that has recently gained the attention of the scientific community, mainly because of the feasibility of approaching it with end-to-end neural models.
no code implementations • 6 Mar 2023 • Mohamed Nabih Ali, Francesco Paissan, Daniele Falavigna, Alessio Brutti
Given the modular nature of the well-known Conv-TasNet speech separation architecture, in this paper we consider three parameters that directly control the overall size of the model, namely the number of residual blocks, the number of repetitions of the separation blocks, and the number of channels in the depth-wise convolutions, and we experimentally evaluate how they affect speech separation performance.
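These three parameters interact multiplicatively, which is why each is an effective size knob. The back-of-the-envelope count below is an illustrative approximation with assumed defaults (`kernel`, `bottleneck`), not the exact Conv-TasNet parameter formula.

```python
def tcn_param_estimate(n_blocks, n_repeats, n_channels, kernel=3, bottleneck=128):
    """Rough parameter count of a Conv-TasNet-style TCN separator,
    keeping only the dominant convolutions in each residual block
    (1x1 in, depth-wise, 1x1 out); norms and biases are ignored."""
    per_block = (bottleneck * n_channels    # 1x1 input conv
                 + n_channels * kernel      # depth-wise conv
                 + n_channels * bottleneck) # 1x1 output conv
    return n_blocks * n_repeats * per_block
```

Halving any one of the three factors roughly halves the separator's size, so they can be traded off against each other.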
1 code implementation • 15 Nov 2022 • Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti
Continual learning refers to a dynamic framework in which a model receives a stream of non-stationary data over time and must adapt to new data while preserving previously acquired knowledge.
no code implementations • 8 Jun 2022 • Irene Martín-Morató, Francesco Paissan, Alberto Ancilotto, Toni Heittola, Annamaria Mesaros, Elisabetta Farella, Alessio Brutti, Tuomas Virtanen
The provided baseline system is a convolutional neural network which employs post-training quantization of parameters, resulting in 46.5 K parameters and 29.23 million multiply-and-accumulate operations (MMACs).
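Post-training quantization of this kind shrinks weight storage roughly 4x by mapping float32 parameters to int8. The symmetric scheme below is a generic sketch, assuming a single per-tensor scale; it is not the baseline's exact quantizer.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric post-training quantization of a weight tensor to int8."""
    scale = max(abs(float(w.min())), abs(float(w.max())), 1e-12) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale
```

At one byte per parameter, a 46.5 K-parameter model needs about 46.5 KB for its weights instead of roughly 186 KB in float32.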
no code implementations • 31 May 2022 • Giovanni Morrone, Samuele Cornell, Enrico Zovato, Alessio Brutti, Stefano Squartini
Continuous speech separation (CSS) is a recently proposed framework which aims at separating each speaker from an input mixture signal in a streaming fashion.
1 code implementation • 5 Apr 2022 • Giovanni Morrone, Samuele Cornell, Desh Raj, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini
In particular, we compare two low-latency speech separation models.
1 code implementation • 18 Feb 2022 • Vandana Rajan, Alessio Brutti, Andrea Cavallaro
Generally, models that fuse complementary information from multiple modalities outperform their uni-modal counterparts.
no code implementations • 2 Aug 2021 • Vandana Rajan, Alessio Brutti, Andrea Cavallaro
For this reason, we aim to improve the performance of uni-modal affect recognition models by transferring knowledge from a better-performing (or stronger) modality to a weaker modality during training.
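Transferring knowledge from a stronger to a weaker modality during training is typically formulated as distillation on softened predictions. The temperature-scaled KL sketch below is a generic formulation with illustrative names, not necessarily the paper's exact objective.

```python
import numpy as np

def soft_targets(logits, temperature):
    """Temperature-softened class distribution from raw logits."""
    z = logits / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_modal_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions: the
    weak-modality student is pulled toward the strong-modality teacher."""
    p = soft_targets(teacher_logits, temperature)
    q = soft_targets(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

A higher temperature exposes more of the teacher's "dark knowledge" about relative class similarities.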
1 code implementation • 6 Apr 2021 • Samuele Cornell, Alessio Brutti, Marco Matassoni, Stefano Squartini
Fully exploiting ad-hoc microphone networks for distant speech recognition is still an open issue.
no code implementations • 3 Nov 2020 • Vandana Rajan, Alessio Brutti, Andrea Cavallaro
The proposed multi-modal training framework uses cross-modal translation and correlation-based latent space alignment to improve the representations of the weaker modalities.
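Correlation-based latent space alignment can be expressed as maximizing the per-dimension Pearson correlation between the two modalities' embeddings. The loss below is one illustrative formulation under that assumption, not the paper's exact criterion.

```python
import numpy as np

def correlation_alignment_loss(za, zb, eps=1e-8):
    """Negative mean per-dimension Pearson correlation between two
    batches of latent codes, shape (batch, dim); minimizing it pulls
    the two modalities' representations into correlated subspaces."""
    za = (za - za.mean(axis=0)) / (za.std(axis=0) + eps)
    zb = (zb - zb.mean(axis=0)) / (zb.std(axis=0) + eps)
    return float(-(za * zb).mean(axis=0).mean())
```

The loss reaches -1 when the embeddings are perfectly correlated dimension-wise and +1 when perfectly anti-correlated.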
no code implementations • 29 Jan 2020 • Gianmarco Cerutti, Rahul Prasad, Alessio Brutti, Elisabetta Farella
This paper addresses the application of sound event detection at the edge, by optimizing deep learning techniques on resource-constrained embedded platforms for the IoT.
no code implementations • 6 Nov 2019 • Md Sahidullah, Jose Patino, Samuele Cornell, Ruiqing Yin, Sunit Sivasankaran, Hervé Bredin, Pavel Korshunov, Alessio Brutti, Romain Serizel, Emmanuel Vincent, Nicholas Evans, Sébastien Marcel, Stefano Squartini, Claude Barras
This paper describes the speaker diarization systems developed for the Second DIHARD Speech Diarization Challenge (DIHARD II) by the Speed team.
1 code implementation • 4 Nov 2019 • Enrico Fini, Alessio Brutti
Recently, a fully supervised speaker diarization approach was proposed (UIS-RNN) which models speakers using multiple instances of a parameter-sharing recurrent neural network.
Ranked #1 on Speaker Diarization on DIHARD II
1 code implementation • 26 Sep 2019 • Vandana Rajan, Alessio Brutti, Andrea Cavallaro
Computational paralinguistics aims to infer human emotions, personality traits and behavioural patterns from speech signals.