no code implementations • 1 Jun 2023 • Juan F. Montesinos, Daniel Michelsanti, Gloria Haro, Zheng-Hua Tan, Jesper Jensen
Audio and visual modalities are inherently connected in speech signals: lip movements and facial expressions are correlated with speech sounds.
1 code implementation • 22 Nov 2022 • Alejandro Cartas, Coloma Ballester, Gloria Haro
Action spotting in soccer videos is the task of identifying the specific time when a certain key action of the game occurs.
1 code implementation • 5 Apr 2022 • Venkatesh S. Kadandale, Juan F. Montesinos, Gloria Haro
Finally, we use the frozen visual features learned by our lip synchronisation model in the singing voice separation task to outperform a baseline audio-visual model which was trained end-to-end.
1 code implementation • 8 Mar 2022 • Juan F. Montesinos, Venkatesh S. Kadandale, Gloria Haro
In a second stage, the predominant voice is enhanced with an audio-only network.
no code implementations • 1 Jun 2021 • Adrià Arbués-Sangüesa, Adrián Martín, Paulino Granero, Coloma Ballester, Gloria Haro
Orientation is a crucial skill for football players that becomes a differential factor in a large set of events, especially the ones involving passes.
2 code implementations • 20 Apr 2021 • Juan F. Montesinos, Venkatesh S. Kadandale, Gloria Haro
The task of isolating a target singing voice in music videos has useful applications.
1 code implementation • 20 Nov 2020 • Samuel Hurault, Coloma Ballester, Gloria Haro
In a soccer game, the information provided by detecting and tracking brings crucial clues to further analyze and understand some tactical aspects of the game, including individual and team actions.
1 code implementation • 14 Jun 2020 • Juan F. Montesinos, Olga Slizovskaia, Gloria Haro
In this paper, we present a new dataset of music performance videos which can be used for training machine learning methods for multiple tasks such as audio-visual blind source separation and localization, cross-modal correspondences, cross-modal generation and, in general, any audio-visual self-supervised task.
Audio Source Separation • Audio-Visual Synchronization • Audio and Speech Processing • Databases • Sound
no code implementations • 15 Apr 2020 • Adrià Arbués-Sangüesa, Adrián Martín, Javier Fernández, Coloma Ballester, Gloria Haro
Given a monocular video of a soccer match, this paper presents a computational model to estimate the most feasible pass at any given time.
1 code implementation • 8 Apr 2020 • Olga Slizovskaia, Gloria Haro, Emilia Gómez
In music source separation, the number of sources may vary for each piece and some of the sources may belong to the same family of instruments, thus sharing timbral characteristics and making the sources more correlated.
Sound • Audio and Speech Processing
no code implementations • 6 Apr 2020 • Daniel Michelsanti, Olga Slizovskaia, Gloria Haro, Emilia Gómez, Zheng-Hua Tan, Jesper Jensen
Both acoustic and visual information influence human perception of speech.
2 code implementations • 23 Mar 2020 • Venkatesh S. Kadandale, Juan F. Montesinos, Gloria Haro, Emilia Gómez
However, Conditioned U-Net (C-U-Net) uses a control mechanism to train a single model for multi-source separation and attempts to achieve a performance comparable to that of the dedicated models.
no code implementations • 2 Mar 2020 • Adrià Arbués-Sangüesa, Adrián Martín, Javier Fernández, Carlos Rodríguez, Gloria Haro, Coloma Ballester
Although orientation has proven to be a key skill of soccer players in order to succeed in a broad spectrum of plays, body orientation is still a little-explored area in sports analytics research.
no code implementations • 10 Jul 2019 • Adrià Arbués-Sangüesa, Gloria Haro, Coloma Ballester
The presented system could be used as a source of data gathering in order to extract useful statistics and semantic analyses a posteriori.
no code implementations • 3 Jul 2019 • Olga Slizovskaia, Emilia Gómez, Gloria Haro
We also propose a technique for measuring the similarity between activation maps and audio features which are typically presented in the form of a matrix, such as chromagrams or spectrograms.
no code implementations • 5 Jun 2019 • Adrià Arbués-Sangüesa, Coloma Ballester, Gloria Haro
Tracking sports players is a challenging scenario, especially in single-feed videos recorded in tight courts, where cluttering and occlusions cannot be avoided.
no code implementations • 5 Nov 2018 • Olga Slizovskaia, Leo Kim, Gloria Haro, Emilia Gómez
Can we perform end-to-end music source separation with a variable number of sources using a deep learning model?
1 code implementation • 29 Feb 2016 • Roberto P. Palomares, Enric Meinhardt-Llopis, Coloma Ballester, Gloria Haro
We propose a large displacement optical flow method that introduces a new strategy to compute a good local minimum of any optical flow energy functional.
no code implementations • 26 Nov 2015 • Maria Oliver, Gloria Haro, Mariella Dimiccoli, Baptiste Mazin, Coloma Ballester
This paper presents a computational model to recover the most likely interpretation of the 3D scene structure from a planar image, where some objects may occlude others.