Search Results for author: Mirco Ravanelli

Found 38 papers, 28 papers with code

Learning Representations for New Sound Classes With Continual Self-Supervised Learning

no code implementations • 15 May 2022 • Zhepei Wang, Cem Subakan, Xilin Jiang, Junkai Wu, Efthymios Tzinis, Mirco Ravanelli, Paris Smaragdis

In this paper, we present a self-supervised learning framework for continually learning representations for new sound classes.

Self-Supervised Learning

On Using Transformers for Speech-Separation

1 code implementation • 6 Feb 2022 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, François Grondin, Mirko Bronzi

In this paper, we extend our previous work by providing results on more datasets, including LibriMix, WHAM!, and WHAMR!.

Denoising, Speech Enhancement, +1

OSSEM: one-shot speaker adaptive speech enhancement using meta learning

no code implementations • 10 Nov 2021 • Cheng Yu, Szu-Wei Fu, Tsun-An Hsieh, Yu Tsao, Mirco Ravanelli

Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE system to adapt effectively and efficiently to particular speakers.

Meta-Learning, Speech Enhancement

REAL-M: Towards Speech Separation on Real Mixtures

1 code implementation • 20 Oct 2021 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, François Grondin

First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures.

Speech Separation

MetricGAN-U: Unsupervised speech enhancement/ dereverberation based only on noisy/ reverberated speech

1 code implementation • 12 Oct 2021 • Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao

Most of the deep learning-based speech enhancement models are learned in a supervised manner, which implies that pairs of noisy and clean speech are required during training.

Speech Enhancement

Interpretable SincNet-based Deep Learning for Emotion Recognition from EEG brain activity

1 code implementation • 18 Jul 2021 • Juan Manuel Mayor-Torres, Mirco Ravanelli, Sara E. Medina-DeVilliers, Matthew D. Lerner, Giuseppe Riccardi

This result is consistent with recent neuroscience studies on emotion recognition, which found an association between these band suppressions and the behavioral deficits observed in individuals with ASD.

EEG Emotion Recognition

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

2 code implementations • 8 Apr 2021 • Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao

The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory.

Speech Enhancement

Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers

1 code implementation • 4 Apr 2021 • Loren Lugosch, Piyush Papreja, Mirco Ravanelli, Abdelwahab Heba, Titouan Parcollet

This paper introduces Timers and Such, a new open source dataset of spoken English commands for common voice control use cases involving numbers.

Spoken Language Understanding

Transformers with Competitive Ensembles of Independent Mechanisms

no code implementations • 27 Feb 2021 • Alex Lamb, Di He, Anirudh Goyal, Guolin Ke, Chien-Feng Liao, Mirco Ravanelli, Yoshua Bengio

In this work we explore a way in which the Transformer architecture is deficient: it represents each position with a large monolithic hidden representation and a single set of parameters which are applied over the entire hidden representation.

Speech Enhancement

Attention is All You Need in Speech Separation

3 code implementations • 25 Oct 2020 • Cem Subakan, Mirco Ravanelli, Samuele Cornell, Mirko Bronzi, Jianyuan Zhong

Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism.

Speech Separation

BIRD: Big Impulse Response Dataset

1 code implementation • 19 Oct 2020 • François Grondin, Jean-Samuel Lauzon, Simon Michaud, Mirco Ravanelli, François Michaud

This paper introduces BIRD, the Big Impulse Response Dataset.

Sound, Audio and Speech Processing

Quaternion Neural Networks for Multi-channel Distant Speech Recognition

1 code implementation • 18 May 2020 • Xinchi Qiu, Titouan Parcollet, Mirco Ravanelli, Nicholas Lane, Mohamed Morchid

In this paper, we propose to capture these inter- and intra- structural dependencies with quaternion neural networks, which can jointly process multiple signals as whole quaternion entities.

Automatic Speech Recognition, Distant Speech Recognition

Multi-task self-supervised learning for Robust Speech Recognition

1 code implementation • 25 Jan 2020 • Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio

We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks.

Robust Speech Recognition, Self-Supervised Learning

Using Speech Synthesis to Train End-to-End Spoken Language Understanding Models

2 code implementations • 21 Oct 2019 • Loren Lugosch, Brett Meyer, Derek Nowrouzezahrai, Mirco Ravanelli

End-to-end models are an attractive new approach to spoken language understanding (SLU) in which the meaning of an utterance is inferred directly from the raw audio without employing the standard pipeline composed of a separately trained speech recognizer and natural language understanding module.

Data Augmentation, Natural Language Understanding, +2

Retrieving Signals in the Frequency Domain with Deep Complex Extractors

1 code implementation • 25 Sep 2019 • Chiheb Trabelsi, Olexa Bilaniuk, Ousmane Dia, Ying Zhang, Mirco Ravanelli, Jonathan Binas, Negar Rostamzadeh, Christopher J Pal

Using the Wall Street Journal Dataset, we compare our phase-aware loss to several others that operate both in the time and frequency domains and demonstrate the effectiveness of our proposed signal extraction method and proposed loss.

Audio Source Separation

Speech Model Pre-training for End-to-End Spoken Language Understanding

1 code implementation • 7 Apr 2019 • Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, Yoshua Bengio

Whereas conventional spoken language understanding (SLU) systems map speech to text, and then text to intent, end-to-end SLU systems map speech directly to intent through a single trainable model.

Ranked #2 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Spoken Language Understanding

Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks

1 code implementation • 6 Apr 2019 • Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio

Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure.

Distant Speech Recognition

Speech and Speaker Recognition from Raw Waveform with SincNet

2 code implementations • 13 Dec 2018 • Mirco Ravanelli, Yoshua Bengio

Deep neural networks can learn complex and abstract representations, that are progressively obtained by combining simpler ones.

Speaker Recognition, Speech Recognition

Learning Speaker Representations with Mutual Information

2 code implementations • 1 Dec 2018 • Mirco Ravanelli, Yoshua Bengio

Mutual Information (MI) or similar measures of statistical dependence are promising tools for learning these representations in an unsupervised way.

Speaker Identification

Interpretable Convolutional Filters with SincNet

1 code implementation • 23 Nov 2018 • Mirco Ravanelli, Yoshua Bengio

Deep learning is currently playing a crucial role toward higher levels of artificial intelligence.

Distant Speech Recognition

Speech recognition with quaternion neural networks

1 code implementation • 21 Nov 2018 • Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Renato de Mori

Neural network architectures are at the core of powerful automatic speech recognition systems (ASR).

Automatic Speech Recognition

The PyTorch-Kaldi Speech Recognition Toolkit

9 code implementations • 19 Nov 2018 • Mirco Ravanelli, Titouan Parcollet, Yoshua Bengio

Experiments, that are conducted on several datasets and tasks, show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.

Distant Speech Recognition, Noisy Speech Recognition

Speaker Recognition from Raw Waveform with SincNet

23 code implementations • 29 Jul 2018 • Mirco Ravanelli, Yoshua Bengio

Rather than employing standard hand-crafted features, the latter CNNs learn low-level speech representations from waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants.

Speaker Identification, Speaker Recognition, +1

Quaternion Recurrent Neural Networks

3 code implementations • ICLR 2019 • Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Chiheb Trabelsi, Renato de Mori, Yoshua Bengio

Recurrent neural networks (RNNs) are powerful architectures to model sequential data, due to their capability to learn short and long-term dependencies between the basic elements of a sequence.

Automatic Speech Recognition

Automatic context window composition for distant speech recognition

no code implementations • 26 May 2018 • Mirco Ravanelli, Maurizio Omologo

Distant speech recognition is being revolutionized by deep learning, that has contributed to significantly outperform previous HMM-GMM systems.

Distant Speech Recognition, Frame

Twin Regularization for online speech recognition

1 code implementation • 15 Apr 2018 • Mirco Ravanelli, Dmitriy Serdyuk, Yoshua Bengio

Online speech recognition is crucial for developing natural human-machine interfaces.

Speech Recognition

Light Gated Recurrent Units for Speech Recognition

1 code implementation • 26 Mar 2018 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

A field that has directly benefited from the recent advances in deep learning is Automatic Speech Recognition (ASR).

Automatic Speech Recognition

Deep Learning for Distant Speech Recognition

no code implementations • 17 Dec 2017 • Mirco Ravanelli

Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence.

Distant Speech Recognition

Realistic multi-microphone data simulation for distant speech recognition

1 code implementation • 26 Nov 2017 • Mirco Ravanelli, Piergiorgio Svaizer, Maurizio Omologo

The availability of realistic simulated corpora is of key importance for the future progress of distant speech recognition technology.

Audio and Speech Processing, Sound

Contaminated speech training methods for robust DNN-HMM distant speech recognition

1 code implementation • 10 Oct 2017 • Mirco Ravanelli, Maurizio Omologo

Despite the significant progress made in the last years, state-of-the-art speech recognition technologies provide a satisfactory performance only in the close-talking condition.

Distant Speech Recognition, Speech Enhancement

The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments

1 code implementation • 6 Oct 2017 • Mirco Ravanelli, Maurizio Omologo

This paper introduces the contents and the possible usage of the DIRHA-ENGLISH multi-microphone corpus, recently realized under the EC DIRHA project.

Distant Speech Recognition

Improving speech recognition by revising gated recurrent units

1 code implementation • 29 Sep 2017 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

First, we suggest to remove the reset gate in the GRU design, resulting in a more efficient single-gate architecture.

Speech Recognition

A network of deep neural networks for distant speech recognition

no code implementations • 23 Mar 2017 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

Despite the remarkable progress recently made in distant speech recognition, state-of-the-art technology still suffers from a lack of robustness, especially when adverse acoustic conditions characterized by non-stationary noises and reverberation are met.

Distant Speech Recognition, Speech Enhancement

The DIRHA simulated corpus

no code implementations • LREC 2014 • Luca Cristoforetti, Mirco Ravanelli, Maurizio Omologo, Alessandro Sosi, Alberto Abad, Martin Hagmueller, Petros Maragos

This paper describes a multi-microphone multi-language acoustic corpus being developed under the EC project Distant-speech Interaction for Robust Home Applications (DIRHA).

Dialogue Management, Distant Speech Recognition, +1
