It's Time for Artistic Correspondence in Music and Video

no code implementations CVPR 2022 Didac Suris, Carl Vondrick, Bryan Russell, Justin Salamon

In order to capture the high-level concepts that are required to solve the task, we propose modeling the long-term temporal context of both the video and the music signals, using Transformer networks for each modality.


Revealing Occlusions with 4D Neural Fields

no code implementations CVPR 2022 Basile Van Hoorick, Purva Tendulkar, Didac Suris, Dennis Park, Simon Stent, Carl Vondrick

For computer vision systems to operate in dynamic situations, they need to be able to represent and reason about object permanence.

Learning Words by Drawing Images

no code implementations CVPR 2019 Didac Suris, Adria Recasens, David Bau, David Harwath, James Glass, Antonio Torralba

Our goal is to learn the correspondence between spoken words and abstract visual attributes, from a dataset of spoken descriptions of images.


