Search Results for author: Daniele Falavigna

Found 19 papers, 5 papers with code

Large Language Models Are Strong Audio-Visual Speech Recognition Learners

no code implementations · 18 Sep 2024 · Umberto Cappellazzo, Minsu Kim, Honglie Chen, Pingchuan Ma, Stavros Petridis, Daniele Falavigna, Alessio Brutti, Maja Pantic

For example, in the audio and speech domains, an LLM can be equipped with (automatic) speech recognition (ASR) abilities by just concatenating the audio tokens, computed with an audio encoder, and the text tokens to achieve state-of-the-art results.

Tasks: Audio-Visual Speech Recognition, Automatic Speech Recognition, +3

Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters

1 code implementation · 1 Feb 2024 · Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti

It exploits adapters as the experts and, leveraging the recent Soft MoE method, it relies on a soft assignment between the input tokens and experts to keep the computational time limited.

Tasks: Parameter-Efficient Fine-Tuning, State Space Models, +1
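The soft-assignment idea from the abstract can be sketched in plain Python (a toy illustration, not the paper's code: the one-slot-per-expert layout, the `soft_moe_adapters` name, and the scaling-lambda experts are all invented for the example):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def soft_moe_adapters(tokens, experts, logits):
    """tokens: list of n feature vectors; experts: list of e callables
    (the adapters); logits[i][j]: affinity of token i for expert slot j
    (one slot per expert here, for brevity)."""
    n, e = len(tokens), len(experts)
    dim = len(tokens[0])
    # Dispatch: each slot input is a convex combination of ALL tokens
    # (softmax over the token axis), so no token is hard-routed.
    slot_inputs = []
    for j in range(e):
        w = softmax([logits[i][j] for i in range(n)])
        slot_inputs.append([sum(w[i] * tokens[i][d] for i in range(n))
                            for d in range(dim)])
    slot_outputs = [experts[j](slot_inputs[j]) for j in range(e)]
    # Combine: each token mixes the expert outputs (softmax over slots),
    # so every expert runs once regardless of n, keeping cost bounded.
    outputs = []
    for i in range(n):
        c = softmax(logits[i])
        outputs.append([sum(c[j] * slot_outputs[j][d] for j in range(e))
                        for d in range(dim)])
    return outputs

# Toy usage: two tokens, two "adapter" experts that scale their input.
toks = [[1.0, 0.0], [0.0, 1.0]]
experts = [lambda v: [2 * x for x in v], lambda v: [0.5 * x for x in v]]
out = soft_moe_adapters(toks, experts, logits=[[5.0, -5.0], [-5.0, 5.0]])
```

With sharply peaked logits, each token effectively routes to one adapter, yet the assignment stays differentiable.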

Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers

1 code implementation · 6 Dec 2023 · Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti, Mirco Ravanelli

Parameter-efficient transfer learning (PETL) methods have emerged as a solid alternative to the standard full fine-tuning approach.

Tasks: Audio Classification, Few-Shot Learning, +1

Continual Contrastive Spoken Language Understanding

no code implementations · 4 Oct 2023 · Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj

In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting and we propose COCONUT, a CIL method that relies on the combination of experience replay and contrastive learning.

Tasks: Class-Incremental Learning, +4
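The experience-replay half of such a method is commonly built on a reservoir-sampled rehearsal buffer; a generic sketch (the `ReplayBuffer` class is illustrative, not COCONUT's implementation, and the contrastive objective is omitted):

```python
import random

class ReplayBuffer:
    """Reservoir sampling keeps a bounded, uniform sample of everything
    seen so far, so earlier classes stay represented as new ones arrive."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Replace a stored item with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        """Draw a rehearsal mini-batch to mix with new-class data."""
        return self.rng.sample(self.items, min(k, len(self.items)))

buf = ReplayBuffer(capacity=4)
for task in range(3):          # three incremental "tasks"
    for i in range(10):
        buf.add((task, i))
batch = buf.sample(2)
```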

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

1 code implementation · 18 Sep 2023 · George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Mohamed Nabih Ali, Alessio Brutti

In self-attention models for automatic speech recognition (ASR), early-exit architectures enable the development of dynamic models capable of adapting their size and architecture to varying levels of computational resources and ASR performance demands.

Tasks: Automatic Speech Recognition (ASR), +2
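The early-exit mechanism can be illustrated with a minimal control loop (the `layers` and `heads` stand-ins are hypothetical, and entropy thresholding is one common exit criterion, not necessarily the paper's):

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def early_exit_decode(x, layers, exit_heads, threshold=0.5):
    """Run encoder layers in sequence; after each, a lightweight exit
    head produces a posterior. Exit as soon as its entropy drops below
    `threshold`, returning (posterior, layers_used)."""
    h = x
    for depth, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
        h = layer(h)
        probs = head(h)
        if entropy(probs) < threshold:
            return probs, depth
    return probs, depth  # fall through: use the final head

# Toy usage: dummy layers, heads that grow more confident with depth.
layers = [lambda h: h + 1, lambda h: h + 1, lambda h: h + 1]
def make_head(conf):
    return lambda h: [conf, 1 - conf]
heads = [make_head(0.6), make_head(0.9), make_head(0.99)]
probs, used = early_exit_decode(0, layers, heads, threshold=0.4)
```

Easy inputs stop early (here at layer 2), so the same trained stack serves devices with different compute budgets.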

Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding

1 code implementation · 23 May 2023 · Umberto Cappellazzo, Muqiao Yang, Daniele Falavigna, Alessio Brutti

The ability to learn new concepts sequentially is a major weakness for modern neural networks, which hinders their use in non-stationary environments.

Tasks: Continual Learning, Decoder, +2

Improving the Intent Classification accuracy in Noisy Environment

no code implementations · 12 Mar 2023 · Mohamed Nabih Ali, Alessio Brutti, Daniele Falavigna

Intent classification is a fundamental task in the spoken language understanding field that has recently gained the attention of the scientific community, mainly because of the feasibility of approaching it with end-to-end neural models.

Tasks: Automatic Speech Recognition, Classification, +6

Scaling strategies for on-device low-complexity source separation with Conv-Tasnet

no code implementations · 6 Mar 2023 · Mohamed Nabih Ali, Francesco Paissan, Daniele Falavigna, Alessio Brutti

Given the modular nature of the well-known Conv-Tasnet speech separation architecture, in this paper we consider three parameters that directly control the overall size of the model, namely: the number of residual blocks, the number of repetitions of the separation blocks and the number of channels in the depth-wise convolutions, and experimentally evaluate how they affect the speech separation performance.

Tasks: Speech Separation
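A back-of-the-envelope sketch of how those three knobs drive model size (the per-block breakdown is simplified: PReLU, normalization, and skip connections are omitted, and the default sizes are assumptions, not the paper's settings):

```python
def separator_params(num_blocks, num_repeats, dw_channels,
                     bottleneck=128, kernel=3):
    """Rough parameter count for the Conv-TasNet separator, counting
    only the convolutions in each temporal block (biases/norms omitted).
    The three arguments are the scaling knobs studied in the paper."""
    per_block = (bottleneck * dw_channels      # 1x1 conv in:  B -> H
                 + dw_channels * kernel        # depthwise conv over H
                 + dw_channels * bottleneck)   # 1x1 conv out: H -> B
    return num_repeats * num_blocks * per_block

# Shrinking any of the three knobs scales the separator down directly.
full = separator_params(num_blocks=8, num_repeats=3, dw_channels=512)
small = separator_params(num_blocks=4, num_repeats=2, dw_channels=256)
```

The count is linear in blocks and repeats but dominated by the two pointwise convolutions, so halving `dw_channels` roughly halves the separator.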

An Investigation of the Combination of Rehearsal and Knowledge Distillation in Continual Learning for Spoken Language Understanding

1 code implementation · 15 Nov 2022 · Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti

Continual learning refers to a dynamical framework in which a model receives a stream of non-stationary data over time and must adapt to new data while preserving previously acquired knowledge.

Tasks: Class-Incremental Learning, +3

Seed Words Based Data Selection for Language Model Adaptation

no code implementations · MTSummit 2021 · Roberto Gretter, Marco Matassoni, Daniele Falavigna

We address the problem of language model customization in applications where the ASR component needs to manage domain-specific terminology; although current state-of-the-art speech recognition technology provides excellent results for generic domains, the adaptation to specialized dictionaries or glossaries is still an open issue.

Tasks: Language Modelling, Semantic Similarity, +3
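A toy version of seed-word-based selection (exact lexical matching only; the paper also explores semantic similarity, and all names here are illustrative):

```python
def select_by_seed_words(corpus, seed_words, min_hits=1):
    """Keep sentences that mention at least `min_hits` of the
    domain-specific seed words; the kept subset can then be used to
    adapt the language model toward the target terminology."""
    seeds = {w.lower() for w in seed_words}
    kept = []
    for sent in corpus:
        hits = sum(1 for w in sent.lower().split() if w in seeds)
        if hits >= min_hits:
            kept.append(sent)
    return kept

# Toy usage with an invented medical-domain glossary:
corpus = ["the patient showed acute bronchitis",
          "stock prices fell sharply",
          "bronchitis treatment requires antibiotics"]
selected = select_by_seed_words(corpus, {"bronchitis", "antibiotics"})
```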

Mixtures of Deep Neural Experts for Automated Speech Scoring

no code implementations · 23 Jun 2021 · Sara Papi, Edmondo Trentin, Roberto Gretter, Marco Matassoni, Daniele Falavigna

The paper copes with the task of automatic assessment of second language proficiency from the language learners' spoken responses to test prompts.

Tasks: Automatic Speech Recognition (ASR), +2

TLT-school: a Corpus of Non Native Children Speech

no code implementations · LREC 2020 · Roberto Gretter, Marco Matassoni, Stefano Bannò, Daniele Falavigna

This paper describes "TLT-school", a corpus of speech utterances collected in schools of northern Italy for assessing the performance of students learning both English and German.

Tasks: Automatic Speech Recognition (ASR), +1

Automatic assessment of spoken language proficiency of non-native children

no code implementations · 15 Mar 2019 · Roberto Gretter, Katharina Allgaier, Svetlana Tchistiakova, Daniele Falavigna

This paper describes technology developed to automatically grade Italian students (ages 9-16) on their English and German spoken language proficiency.

Tasks: Automatic Speech Recognition (ASR), +1

Non-native children speech recognition through transfer learning

no code implementations · 25 Sep 2018 · Marco Matassoni, Roberto Gretter, Daniele Falavigna, Diego Giuliani

This work deals with non-native children's speech and investigates both multi-task and transfer learning approaches to adapt a multi-language Deep Neural Network (DNN) to speakers, specifically children, learning a foreign language.

Tasks: Speech Recognition, +1

Automatic Quality Estimation for ASR System Combination

no code implementations · 22 Jun 2017 · Shahab Jalalvand, Matteo Negri, Daniele Falavigna, Marco Matassoni, Marco Turchi

In this paper we propose a novel variant of ROVER that takes advantage of ASR quality estimation (QE) for ranking the transcriptions at "segment level" instead of: i) relying on confidence scores, or ii) feeding ROVER with randomly ordered hypotheses.

Tasks: Automatic Speech Recognition (ASR), +2
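The ranking idea can be sketched as follows (the `qe_score` callable stands in for a trained QE model, and `majority_vote` is a toy word-level stand-in for ROVER's alignment-and-voting; all data here is invented):

```python
def rank_for_rover(hypotheses, qe_score):
    """Order ASR hypotheses by a segment-level quality-estimation score
    (higher = better predicted quality) before ROVER-style voting,
    instead of relying on confidence scores or a random order."""
    return sorted(hypotheses, key=qe_score, reverse=True)

def majority_vote(hypotheses):
    """Toy word-level vote over equal-length hypotheses; ties are broken
    in favour of the earlier (better-ranked) hypothesis."""
    words = [h.split() for h in hypotheses]
    out = []
    for slot in zip(*words):
        best = max(slot, key=lambda w: (slot.count(w), -slot.index(w)))
        out.append(best)
    return " ".join(out)

hyps = ["the cat sat", "the bat sat", "the cat sad"]
# Hypothetical QE scores (e.g. predicted 1 - WER per segment):
scores = {"the cat sat": 0.9, "the bat sat": 0.4, "the cat sad": 0.7}
ranked = rank_for_rover(hyps, qe_score=lambda h: scores[h])
combined = majority_vote(ranked)
```

Because ROVER-style combination is order-sensitive, putting the predicted-best hypothesis first lets it win ties during voting.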

DNN adaptation by automatic quality estimation of ASR hypotheses

no code implementations · 6 Feb 2017 · Daniele Falavigna, Marco Matassoni, Shahab Jalalvand, Matteo Negri, Marco Turchi

Our hypothesis is that significant improvements can be achieved by: i) automatically transcribing the evaluation data we are currently trying to recognise, and ii) selecting from it a subset of "good quality" instances based on the word error rate (WER) scores predicted by a QE component.

Tasks: Sentence
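Step ii) amounts to thresholding on predicted WER; a minimal sketch (the segment names, score values, and the 0.15 threshold are invented for the example):

```python
def select_adaptation_data(segments, predicted_wer, max_wer=0.15):
    """Pick automatically transcribed segments whose *predicted* WER
    (from a QE model, so no reference transcripts are needed) is low
    enough to serve as DNN adaptation data."""
    return [s for s in segments if predicted_wer(s) <= max_wer]

# Hypothetical QE predictions for four auto-transcribed segments:
preds = {"seg1": 0.05, "seg2": 0.30, "seg3": 0.12, "seg4": 0.50}
selected = select_adaptation_data(list(preds), preds.get, max_wer=0.15)
```

Only the segments the QE model trusts are fed back for adaptation, keeping noisy self-transcriptions out of the update.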
