no code implementations • 8 Mar 2025 • Santiago Cuervo, Adel Moumen, Yanis Labrak, Sameer Khurana, Antoine Laurent, Mickael Rouvier, Ricard Marxer
Text-Speech Language Models (TSLMs) -- language models trained to jointly process and generate text and speech -- aim to enable cross-modal knowledge transfer to overcome the scaling limitations of unimodal speech LMs.
no code implementations • 17 Jul 2024 • Joonas Kalda, Tanel Alumäe, Martin Lebourdais, Hervé Bredin, Séverin Baroudi, Ricard Marxer
Our team participated in the speaker diarization and language diarization tracks of the challenge.
no code implementations • 2 Apr 2024 • Paul Best, Santiago Cuervo, Ricard Marxer
Macroscopic intelligibility models predict the expected human word-error-rate for a given speech-in-noise stimulus.
no code implementations • 31 Mar 2024 • Santiago Cuervo, Ricard Marxer
We establish a strong correlation between pre-training loss and downstream syntactic and semantic performance in SLMs and LLMs, which results in predictable scaling of linguistic performance.
1 code implementation • 4 Mar 2024 • Joonas Kalda, Clément Pagés, Ricard Marxer, Tanel Alumäe, Hervé Bredin
A major drawback of supervised speech separation (SSep) systems is their reliance on synthetic data, leading to poor real-world generalization.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+4
no code implementations • 24 Jan 2024 • Santiago Cuervo, Ricard Marxer
Our method resulted in the winning submission in the CPC2, demonstrating its promise for speech perception applications.
1 code implementation • 9 May 2023 • Clémentin Boittiaux, Claire Dune, Maxime Ferrera, Aurélien Arnaubec, Ricard Marxer, Marjolaine Matabos, Loïc Van Audenhaege, Vincent Hugel
This paper presents a new deep-sea dataset to benchmark underwater long-term visual localization.
1 code implementation • 18 Dec 2022 • Clémentin Boittiaux, Ricard Marxer, Claire Dune, Aurélien Arnaubec, Maxime Ferrera, Vincent Hugel
Underwater images are altered by the physical characteristics of the medium through which light rays pass before reaching the optical sensor.
1 code implementation • 5 Jun 2022 • Santiago Cuervo, Adrian Łańcucki, Ricard Marxer, Paweł Rychlikowski, Jan Chorowski
The success of deep learning comes from its ability to capture the hierarchical structure of data by learning high-level representations defined in terms of low-level ones.
1 code implementation • 4 May 2022 • Clémentin Boittiaux, Ricard Marxer, Claire Dune, Aurélien Arnaubec, Vincent Hugel
This paper focuses on the loss functions that embed the error between two poses to perform deep learning based camera pose regression.
1 code implementation • 29 Oct 2021 • Santiago Cuervo, Maciej Grabias, Jan Chorowski, Grzegorz Ciesielski, Adrian Łańcucki, Paweł Rychlikowski, Ricard Marxer
We investigate the performance on phoneme categorization and phoneme and word segmentation of several self-supervised learning (SSL) methods based on Contrastive Predictive Coding (CPC).
1 code implementation • 22 Jun 2021 • Jan Chorowski, Grzegorz Ciesielski, Jarosław Dzikowski, Adrian Łańcucki, Ricard Marxer, Mateusz Opala, Piotr Pusz, Paweł Rychlikowski, Michał Stypułkowski
We present a number of low-resource approaches to the tasks of the Zero Resource Speech Challenge 2021.
1 code implementation • 24 Apr 2021 • Jan Chorowski, Grzegorz Ciesielski, Jarosław Dzikowski, Adrian Łańcucki, Ricard Marxer, Mateusz Opala, Piotr Pusz, Paweł Rychlikowski, Michał Stypułkowski
We investigate the possibility of forcing a self-supervised model trained using a contrastive predictive loss to extract slowly varying latent representations.
no code implementations • 3 Jun 2020 • Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James Glass
Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech.
1 code implementation • 18 May 2020 • Adrian Łańcucki, Jan Chorowski, Guillaume Sanchez, Ricard Marxer, Nanxin Chen, Hans J. G. A. Dolfing, Sameer Khurana, Tanel Alumäe, Antoine Laurent
We show that the codebook learning can suffer from poor initialization and non-stationarity of clustered encoder outputs.
no code implementations • 23 Apr 2020 • Guillaume Sanchez, Vincente Guis, Ricard Marxer, Frédéric Bouchara
Deep Learning systems have shown tremendous accuracy in image classification, at the cost of big image datasets.
no code implementations • 31 Jul 2018 • Mandar Gogate, Ahsan Adeel, Ricard Marxer, Jon Barker, Amir Hussain
The process of selective attention in the brain is known to contextually exploit the available audio and visual cues to better focus on target speaker while filtering out other noises.
no code implementations • 2 Feb 2015 • Ricard Marxer, Hendrik Purwins
A system is presented that segments, clusters and predicts musical audio in an unsupervised manner, adjusting the number of (timbre) clusters instantaneously to the audio input.