Search Results for author: Thomas Pellegrini

Found 20 papers, 10 papers with code

Weakly supervised discourse segmentation for multiparty oral conversations

1 code implementation EMNLP 2021 Lila Gravellier, Julie Hunter, Philippe Muller, Thomas Pellegrini, Isabelle Ferrané

Discourse segmentation, the first step of discourse analysis, has been shown to improve results for text summarization, translation and other NLP tasks.

Discourse Segmentation Segmentation +3

Audio classification with Dilated Convolution with Learnable Spacings

2 code implementations 25 Sep 2023 Ismail Khalfaoui-Hassani, Timothée Masquelier, Thomas Pellegrini

Dilated convolution with learnable spacings (DCLS) is a recent convolution method in which the positions of the kernel elements are learned throughout training by backpropagation.

Audio Classification Audio Tagging

Dilated Convolution with Learnable Spacings: beyond bilinear interpolation

1 code implementation 1 Jun 2023 Ismail Khalfaoui-Hassani, Thomas Pellegrini, Timothée Masquelier

Dilated Convolution with Learnable Spacings (DCLS) is a recently proposed variation of the dilated convolution in which the spacings between the non-zero elements in the kernel, or equivalently their positions, are learnable.
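
The idea of learnable positions can be illustrated with a minimal 1D sketch. It assumes the linear (bilinear in 2D) interpolation named in the title: a kernel element at a real-valued position is shared between its two nearest integer slots. The function name `build_dcls_kernel_1d` and its parameters are hypothetical and are not taken from the authors' code.

```python
import math


def build_dcls_kernel_1d(weights, positions, dilated_size):
    """Place each weight at a real-valued position inside a dilated kernel.

    Illustrative assumption: a weight at position p is split between
    floor(p) and floor(p) + 1 in proportion to the fractional part of p,
    which keeps the kernel differentiable with respect to p.
    """
    kernel = [0.0] * dilated_size
    for w, p in zip(weights, positions):
        # Clamp the position so it stays inside the dilated kernel.
        p = min(max(p, 0.0), dilated_size - 1)
        left = int(math.floor(p))
        frac = p - left
        kernel[left] += w * (1.0 - frac)
        if left + 1 < dilated_size:
            kernel[left + 1] += w * frac
    return kernel


# Two kernel elements in a dilated kernel of size 5.
kernel = build_dcls_kernel_1d([1.0, 2.0], [0.0, 2.5], 5)
```

In this sketch the second weight, at position 2.5, contributes equally to slots 2 and 3. Because the constructed kernel depends smoothly on the position, a gradient with respect to that position exists almost everywhere.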

Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidates

no code implementations 14 Nov 2022 Etienne Labbé, Thomas Pellegrini, Julien Pinquier

For this reason, several complementary metrics, such as BLEU, CIDEr, SPICE and SPIDEr, are used to compare a single automatic caption to one or several reference captions produced by human annotators.

AudioCaps Audio captioning +3

Audio-video fusion strategies for active speaker detection in meetings

no code implementations 9 Jun 2022 Lionel Pibre, Francisco Madrigal, Cyrille Equoy, Frédéric Lerasle, Thomas Pellegrini, Julien Pinquier, Isabelle Ferrané

In this paper, we propose two different types of fusion for the detection of the active speaker, combining two visual modalities and an audio modality through neural networks.

Management Optical Flow Estimation +2

Dilated convolution with learnable spacings

2 code implementations 7 Dec 2021 Ismail Khalfaoui-Hassani, Thomas Pellegrini, Timothée Masquelier

We call this method "Dilated Convolution with Learnable Spacings" (DCLS) and generalize it to the n-dimensional convolution case.

Image Classification Object Detection +1

End-to-end acoustic modelling for phone recognition of young readers

no code implementations 4 Mar 2021 Lucile Gelin, Morgane Daniel, Julien Pinquier, Thomas Pellegrini

Through transfer learning, a Transformer model complemented with a Connectionist Temporal Classification (CTC) objective function reaches a phone error rate of 28.1%, outperforming a state-of-the-art DNN-HMM model by 6.6% relative, as well as other end-to-end architectures by more than 8.5% relative.

Acoustic Modelling Transfer Learning
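
The relative figures in this entry can be unpacked with a short calculation. It assumes the usual definition of a relative improvement, (baseline - new) / baseline. The baseline error rates it returns are implied values under that assumption and are not numbers reported in the listing. The function names are hypothetical.

```python
def relative_reduction(baseline_rate, new_rate):
    """Relative error reduction, assuming (baseline - new) / baseline."""
    return (baseline_rate - new_rate) / baseline_rate


def implied_baseline(new_rate, relative_gain):
    """Invert the definition above to recover the baseline error rate."""
    return new_rate / (1.0 - relative_gain)


# Phone error rate of the Transformer+CTC model, in percent (from the entry).
per_transformer = 28.1

# Implied DNN-HMM baseline for a 6.6% relative gain: about 30.09% PER.
per_dnn_hmm = implied_baseline(per_transformer, 0.066)

# A gain of more than 8.5% relative implies the other end-to-end systems
# are above roughly 30.71% PER.
per_other_e2e_floor = implied_baseline(per_transformer, 0.085)
```

Under this reading, a 6.6% relative gain corresponds to an absolute difference of about 2 points of phone error rate.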

Comparison of semi-supervised deep learning algorithms for audio classification

1 code implementation 16 Feb 2021 Léo Cances, Etienne Labbé, Thomas Pellegrini

In all but one case, MixMatch (MM), ReMixMatch (RMM), and FixMatch (FM) significantly outperformed Mean Teacher (MT) and Deep Co-Training (DCT), with MM and RMM being the best methods in most experiments.

Audio Tagging Data Augmentation +3

Low-activity supervised convolutional spiking neural networks applied to speech commands recognition

1 code implementation 13 Nov 2020 Thomas Pellegrini, Romain Zimmer, Timothée Masquelier

Deep Neural Networks (DNNs) are the current state-of-the-art models in many speech-related tasks.

Technical report: supervised training of convolutional spiking neural networks with PyTorch

2 code implementations 22 Nov 2019 Romain Zimmer, Thomas Pellegrini, Srisht Fateh Singh, Timothée Masquelier

Indeed, the most commonly used spiking neuron model, the leaky integrate-and-fire neuron, obeys a differential equation which can be approximated using discrete time steps, leading to a recurrent relation for the potential.
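
The recurrent relation mentioned here can be sketched in a few lines. This is a minimal plain-Python illustration, not the PyTorch code of the report. It assumes a simple Euler-style discretization in which the potential decays by a factor `beta` at each step, and a hard reset to zero after a spike. The names `simulate_lif`, `beta`, and `threshold` are hypothetical.

```python
def simulate_lif(inputs, beta=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    Assumed recurrence: V[t] = beta * V[t-1] + I[t], with a spike emitted
    and V reset to 0 whenever V[t] reaches the threshold.
    """
    potential = 0.0
    spikes = []
    potentials = []
    for current in inputs:
        # Leak the previous potential, then integrate the new input.
        potential = beta * potential + current
        spike = 1 if potential >= threshold else 0
        if spike:
            potential = 0.0  # hard reset (an assumption of this sketch)
        spikes.append(spike)
        potentials.append(potential)
    return spikes, potentials


# A constant input of 0.6 with beta = 0.5 crosses the threshold at step 3.
spikes, potentials = simulate_lif([0.6, 0.6, 0.6], beta=0.5, threshold=1.0)
```

Because each potential depends on the previous one, the neuron behaves like a recurrent unit unrolled over time, which is what makes training with standard deep learning tooling possible.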

Evaluation of post-processing algorithms for polyphonic sound event detection

1 code implementation 17 Jun 2019 Léo Cances, Patrice Guyot, Thomas Pellegrini

We compared post-processing algorithms on the temporal prediction curves of two models: one based on the challenge's baseline and a Multiple Instance Learning (MIL) model.

Audio Tagging Event Detection +2

Cosine-similarity penalty to discriminate sound classes in weakly-supervised sound event detection

1 code implementation 10 Jan 2019 Thomas Pellegrini, Léo Cances

In this work, we address Sound Event Detection in the case where a weakly annotated dataset is available for training.

Sound Audio and Speech Processing

Réseau de neurones convolutif pour l'évaluation automatique de la prononciation (CNN-based automatic pronunciation assessment of Japanese speakers learning French)

no code implementations JEP-TALN-RECITAL 2016 Thomas Pellegrini, Lionel Fontan, Halima Sahraoui

A relative performance gain of 13.4% was obtained with the CNN, with an overall accuracy of 72.6%, on an evaluation corpus recorded by 23 Japanese-speaking participants.

El-WOZ: a client-server wizard-of-oz interface

no code implementations LREC 2014 Thomas Pellegrini, Vahid Hedayati, Angela Costa

In order to collect spontaneous speech in a situation of interaction with a machine, this interface was designed as a Wizard-of-Oz (WOZ) platform.

Automatic Speech Recognition (ASR) +1
