no code implementations • 18 Sep 2024 • Umberto Cappellazzo, Minsu Kim, Honglie Chen, Pingchuan Ma, Stavros Petridis, Daniele Falavigna, Alessio Brutti, Maja Pantic
For example, in the audio and speech domains, an LLM can be equipped with automatic speech recognition (ASR) abilities simply by concatenating the audio tokens, computed with an audio encoder, with the text tokens, achieving state-of-the-art results.
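The token-concatenation idea can be sketched in a few lines; the dimensions, the random projection, and the variable names here are illustrative assumptions, not the authors' actual encoder or LLM:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 80-dim audio features, a 512-dim LLM embedding space.
llm_dim = 512
audio_feats = rng.standard_normal((50, 80))       # 50 frames from the audio encoder
proj = rng.standard_normal((80, llm_dim))         # learned projection (random stand-in)

audio_tokens = audio_feats @ proj                 # map audio into the LLM space
text_tokens = rng.standard_normal((10, llm_dim))  # 10 embedded text tokens (stand-in)

# The core idea: concatenate along the sequence axis and feed the fused
# sequence to the LLM as if it were ordinary token embeddings.
fused = np.concatenate([audio_tokens, text_tokens], axis=0)
print(fused.shape)  # (60, 512)
```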
Audio-Visual Speech Recognition • Automatic Speech Recognition +3
no code implementations • 27 May 2024 • Mohamed Nabih Ali, Alessio Brutti, Daniele Falavigna
Automatic speech recognition models require large amounts of speech recordings for training.
1 code implementation • 1 Feb 2024 • Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti
It exploits adapters as the experts and, leveraging the recent Soft MoE method, it relies on a soft assignment between the input tokens and experts to keep the computational time limited.
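The soft token-to-expert assignment can be illustrated with a minimal single-slot Soft MoE sketch; the adapters, dimensions, and weights below are toy stand-ins, not the paper's trained modules:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

n_tokens, d, n_experts = 6, 8, 3
tokens = rng.standard_normal((n_tokens, d))
phi = rng.standard_normal((d, n_experts))   # learnable slot parameters (random here)

logits = tokens @ phi                        # (n_tokens, n_experts) affinities
dispatch = softmax(logits, axis=0)           # per expert slot: soft weights over tokens
combine = softmax(logits, axis=1)            # per token: soft weights over experts

slots = dispatch.T @ tokens                  # (n_experts, d): one soft input per expert
# Each "expert" is a small adapter: down-project, ReLU, up-project.
adapters = [(rng.standard_normal((d, 2)), rng.standard_normal((2, d)))
            for _ in range(n_experts)]
expert_out = np.stack([np.maximum(s @ w_down, 0) @ w_up
                       for s, (w_down, w_up) in zip(slots, adapters)])

out = combine @ expert_out                   # (n_tokens, d): soft mix of expert outputs
print(out.shape)  # (6, 8)
```

Because every token contributes a soft weight to every slot, no token is dropped and the cost stays fixed by the number of slots rather than growing with hard routing decisions.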
1 code implementation • 6 Dec 2023 • Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti, Mirco Ravanelli
Parameter-efficient transfer learning (PETL) methods have emerged as a solid alternative to the standard full fine-tuning approach.
no code implementations • 4 Oct 2023 • Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj
In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting and we propose COCONUT, a CIL method that relies on the combination of experience replay and contrastive learning.
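The experience-replay half of such a setup can be sketched with a generic reservoir-sampling rehearsal memory; this is a textbook mechanism, not the authors' COCONUT implementation, and the buffer size and stream are made up:

```python
import random

random.seed(0)

class ReplayBuffer:
    """Tiny reservoir-sampling rehearsal memory: keeps a uniform sample
    of the stream seen so far within a fixed capacity."""
    def __init__(self, capacity):
        self.capacity, self.seen, self.data = capacity, 0, []

    def add(self, item):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(item)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = item   # replace with reservoir probability

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

buf = ReplayBuffer(capacity=5)
for x in range(100):                  # stream of 100 old-task examples
    buf.add(x)

# A training batch mixes new-task data with replayed old-task data;
# a contrastive loss would then be computed on this mixed batch.
batch = [100, 101] + buf.sample(3)
print(len(batch))  # 5
```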
1 code implementation • 18 Sep 2023 • George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Mohamed Nabih Ali, Alessio Brutti
In self-attention models for automatic speech recognition (ASR), early-exit architectures enable the development of dynamic models capable of adapting their size and architecture to varying levels of computational resources and ASR performance demands.
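The early-exit mechanism can be sketched as follows; the tanh "layers", the confidence threshold, and all sizes are hypothetical stand-ins for the real self-attention blocks and exit classifiers:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def early_exit_decode(frame, layers, exit_heads, threshold=0.9):
    """Run encoder layers one at a time; stop at the first intermediate
    head whose prediction is confident enough (generic sketch)."""
    h = frame
    for depth, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
        h = np.tanh(h @ layer)             # stand-in for a self-attention block
        probs = softmax(h @ head)
        if probs.max() >= threshold:       # confident enough: exit early
            return depth, int(probs.argmax())
    return len(layers), int(probs.argmax())

d, vocab, n_layers = 16, 10, 6
layers = [rng.standard_normal((d, d)) * 0.5 for _ in range(n_layers)]
heads = [rng.standard_normal((d, vocab)) for _ in range(n_layers)]

depth, token = early_exit_decode(rng.standard_normal(d), layers, heads)
print(1 <= depth <= n_layers)  # True
```

Easy frames exit at shallow depths, so average compute adapts to the input while the full stack remains available for hard cases.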
Automatic Speech Recognition (ASR) +2
1 code implementation • 23 May 2023 • Umberto Cappellazzo, Muqiao Yang, Daniele Falavigna, Alessio Brutti
The ability to learn new concepts sequentially is a major weakness for modern neural networks, which hinders their use in non-stationary environments.
no code implementations • 12 Mar 2023 • Mohamed Nabih Ali, Alessio Brutti, Daniele Falavigna
Intent classification is a fundamental task in the spoken language understanding field that has recently gained the attention of the scientific community, mainly because of the feasibility of approaching it with end-to-end neural models.
no code implementations • 6 Mar 2023 • Mohamed Nabih Ali, Francesco Paissan, Daniele Falavigna, Alessio Brutti
Given the modular nature of the well-known Conv-TasNet speech separation architecture, in this paper we consider three parameters that directly control the overall model size, namely the number of residual blocks, the number of repetitions of the separation blocks, and the number of channels in the depth-wise convolutions, and we experimentally evaluate how they affect speech separation performance.
1 code implementation • 15 Nov 2022 • Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti
Continual learning refers to a dynamic framework in which a model receives a stream of non-stationary data over time and must adapt to new data while preserving previously acquired knowledge.
no code implementations • MTSummit 2021 • Roberto Gretter, Marco Matassoni, Daniele Falavigna
We address the problem of language model customization in applications where the ASR component needs to manage domain-specific terminology; although current state-of-the-art speech recognition technology provides excellent results for generic domains, the adaptation to specialized dictionaries or glossaries is still an open issue.
no code implementations • 23 Jun 2021 • Sara Papi, Edmondo Trentin, Roberto Gretter, Marco Matassoni, Daniele Falavigna
The paper addresses the task of automatically assessing second language proficiency from language learners' spoken responses to test prompts.
Automatic Speech Recognition (ASR) +2
no code implementations • LREC 2020 • Ornella Mich, Nadia Mana, Roberto Gretter, Marco Matassoni, Daniele Falavigna
Our system, based on ASR technology, implements Cornoldi's MT battery, a well-known Italian test for assessing reading skills.
Automatic Speech Recognition (ASR) +1
no code implementations • LREC 2020 • Roberto Gretter, Marco Matassoni, Stefano Bannò, Daniele Falavigna
This paper describes "TLT-school", a corpus of speech utterances collected in schools in northern Italy for assessing the performance of students learning both English and German.
Automatic Speech Recognition (ASR) +1
no code implementations • 15 Mar 2019 • Roberto Gretter, Katharina Allgaier, Svetlana Tchistiakova, Daniele Falavigna
This paper describes technology developed to automatically grade Italian students (ages 9-16) on their English and German spoken language proficiency.
Automatic Speech Recognition (ASR) +1
no code implementations • 25 Sep 2018 • Marco Matassoni, Roberto Gretter, Daniele Falavigna, Diego Giuliani
This work deals with non-native children's speech and investigates both multi-task and transfer learning approaches to adapt a multi-language Deep Neural Network (DNN) to speakers, specifically children, learning a foreign language.
no code implementations • 22 Jun 2017 • Shahab Jalalvand, Matteo Negri, Daniele Falavigna, Marco Matassoni, Marco Turchi
In this paper we propose a novel variant of ROVER that takes advantage of ASR quality estimation (QE) to rank the transcriptions at the segment level, instead of i) relying on confidence scores or ii) feeding ROVER with randomly ordered hypotheses.
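The QE-based ranking step can be sketched as a simple reorder of each segment's hypotheses before voting; the hypotheses and scores below are toy data, and `qe_score` stands in for a trained quality-estimation model:

```python
def qe_rank_hypotheses(segment_hyps, qe_score):
    """Order each segment's transcriptions by predicted quality
    (higher = better) before feeding them to ROVER (generic sketch)."""
    return {seg: sorted(hyps, key=qe_score, reverse=True)
            for seg, hyps in segment_hyps.items()}

# Toy example: three ASR systems, two segments, fake QE scores.
hyps = {
    "seg1": ["the cat sat", "the cat sad", "a cat sat"],
    "seg2": ["hello word", "hello world", "hell o world"],
}
fake_scores = {"the cat sat": 0.9, "the cat sad": 0.4, "a cat sat": 0.6,
               "hello word": 0.5, "hello world": 0.95, "hell o world": 0.2}

ranked = qe_rank_hypotheses(hyps, fake_scores.get)
print(ranked["seg2"][0])  # hello world
```

Since ROVER's alignment is sensitive to the order in which hypotheses are combined, putting the predicted-best transcription first anchors the word transition network on the most reliable system per segment.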
Automatic Speech Recognition (ASR) +2
no code implementations • 6 Feb 2017 • Daniele Falavigna, Marco Matassoni, Shahab Jalalvand, Matteo Negri, Marco Turchi
Our hypothesis is that significant improvements can be achieved by: i) automatically transcribing the evaluation data we are currently trying to recognise, and ii) selecting from it a subset of "good quality" instances based on the word error rate (WER) scores predicted by a QE component.
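The selection step can be sketched as a threshold filter on predicted WER; the utterance IDs, scores, and the 0.15 threshold are made-up illustrations, with `predicted_wer` standing in for the trained QE regressor:

```python
def select_good_instances(utterances, predicted_wer, threshold=0.15):
    """Keep only automatically transcribed utterances whose WER,
    as predicted by a QE component, falls below a threshold."""
    return [u for u in utterances if predicted_wer(u) <= threshold]

# Toy data: fake predicted-WER scores per utterance.
fake_wer = {"utt1": 0.05, "utt2": 0.30, "utt3": 0.12, "utt4": 0.50}
selected = select_good_instances(list(fake_wer), fake_wer.get)
print(selected)  # ['utt1', 'utt3']
```

The surviving transcriptions can then serve as self-training material for unsupervised adaptation without requiring manual references.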