Search Results for author: Julien Pinquier

Found 14 papers, 2 papers with code

Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidates

1 code implementation • 14 Nov 2022 • Etienne Labbé, Thomas Pellegrini, Julien Pinquier

For this reason, several complementary metrics, such as BLEU, CIDEr, SPICE and SPIDEr, are used to compare a single automatic caption to one or several captions of reference, produced by a human annotator.

AudioCaps Audio captioning +3
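
As a rough illustration of how a metric can take several caption candidates into account, the sketch below assumes the usual definition of SPIDEr as the mean of CIDEr-D and SPICE, and assumes that SPIDEr-max keeps the best-scoring candidate, as its name suggests. The function names and the score values are hypothetical; the per-candidate CIDEr-D and SPICE scores are assumed to come from an existing evaluation tool.

```python
from typing import Sequence, Tuple


def spider(cider_d: float, spice: float) -> float:
    """SPIDEr for one candidate: the mean of its CIDEr-D and SPICE scores."""
    return (cider_d + spice) / 2.0


def spider_max(candidates: Sequence[Tuple[float, float]]) -> float:
    """Best SPIDEr over several candidates for the same audio clip.

    Each candidate is a (cider_d, spice) pair computed elsewhere.
    Assumption: the multi-candidate score is the maximum over candidates.
    """
    if not candidates:
        raise ValueError("at least one caption candidate is required")
    return max(spider(c, s) for c, s in candidates)


if __name__ == "__main__":
    # Hypothetical scores for three candidate captions of one clip.
    scores = [(0.5, 0.25), (1.0, 0.5), (0.25, 0.25)]
    print(spider_max(scores))
```

With a single candidate, this reduces to the ordinary SPIDEr score, so the two values can be compared directly.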

Audio-video fusion strategies for active speaker detection in meetings

no code implementations • 9 Jun 2022 • Lionel Pibre, Francisco Madrigal, Cyrille Equoy, Frédéric Lerasle, Thomas Pellegrini, Julien Pinquier, Isabelle Ferrané

In this paper, we propose two different types of fusion for the detection of the active speaker, combining two visual modalities and an audio modality through neural networks.

Management Optical Flow Estimation +2

End-to-end acoustic modelling for phone recognition of young readers

no code implementations • 4 Mar 2021 • Lucile Gelin, Morgane Daniel, Julien Pinquier, Thomas Pellegrini

Through transfer learning, a Transformer model complemented with a Connectionist Temporal Classification (CTC) objective function reaches a phone error rate of 28.1%, outperforming a state-of-the-art DNN-HMM model by 6.6% relative, as well as other end-to-end architectures by more than 8.5% relative.

Acoustic Modelling Transfer Learning
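
To make the "relative" figures concrete, here is a minimal sketch assuming the standard definition of relative error reduction, (baseline - new) / baseline. The snippet does not state the baseline error rates, so the value computed here is only what the quoted numbers would imply under that assumption; the function names are hypothetical.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Relative error reduction in percent, from a baseline rate to a new rate."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline - new) / baseline


def implied_baseline(new: float, relative_pct: float) -> float:
    """Baseline rate implied by a new rate and a relative reduction in percent."""
    if not 0 <= relative_pct < 100:
        raise ValueError("relative reduction must be in [0, 100)")
    return new / (1.0 - relative_pct / 100.0)


if __name__ == "__main__":
    per = 28.1  # phone error rate quoted in the abstract, in percent
    base = implied_baseline(per, 6.6)  # implied DNN-HMM rate, not a reported figure
    print(round(base, 2), round(relative_reduction(base, per), 2))
```

The point of the distinction is that a 6.6% relative gain corresponds to roughly two absolute points at this error level, not 6.6 points.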

Une nouvelle mesure de la réverbération pour prédire les performances a priori de la transcription de la parole (A new reverberation measure to predict a priori ASR performance)

no code implementations • JEPTALNRECITAL 2020 • Sébastien Ferreira, Jérôme Farinas, Julien Pinquier, Julie Mauclair, Stéphane Rabant

In this study, we explore the a priori prediction of automatic speech transcription quality for reverberant speech recorded with a single microphone.

Subjective Evaluation of Comprehensibility in Movie Interactions

no code implementations • LREC 2020 • Estelle Randria, Lionel Fontan, Maxime Le Coz, Isabelle Ferrané, Julien Pinquier

Various research works have dealt with the comprehensibility of textual, audio, or audiovisual documents, and showed that factors related to text (e.g. linguistic complexity), sound (e.g. speech intelligibility), image (e.g. presence of visual context), or even to cognition and emotion can play a major role in the ability of humans to understand the semantic and pragmatic contents of a given document.

Deep Neural Networks for Automatic Speech Processing: A Survey from Large Corpora to Limited Data

no code implementations • 9 Mar 2020 • Vincent Roger, Jérôme Farinas, Julien Pinquier

In that sense we propose an overview of few-shot techniques and perspectives of using such techniques for the focused speech problems in this survey.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
