1 code implementation • CVPR 2021 • Michael Dorkenwald, Timo Milbich, Andreas Blattmann, Robin Rombach, Konstantinos G. Derpanis, Björn Ommer
Video understanding calls for a model to learn the characteristic interplay between static scene content and its dynamics: given an image, the model must be able to predict a future progression of the portrayed scene; conversely, a video should be explained in terms of its static image content and all the remaining characteristics not present in the initial frame.
1 code implementation • CVPR 2021 • Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer
Given a static image of an object and a local poking of a pixel, the approach predicts how the object would deform over time.
2 code implementations • ICCV 2021 • Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer
There will be distinctive movement, despite evident variations caused by the stochastic nature of our world.
1 code implementation • CVPR 2021 • Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer
Using this representation, we are able to change the behavior of a person depicted in an arbitrary posture, or to even directly transfer behavior observed in a given video sequence.
no code implementations • CVPR 2020 • Michael Dorkenwald, Uta Büchler, Björn Ommer
We present an approach to unsupervised magnification of posture differences across individuals despite large deviations in appearance.
no code implementations • 16 Dec 2020 • Biagio Brattoli, Uta Büchler, Michael Dorkenwald, Philipp Reiser, Linard Filli, Fritjof Helmchen, Anna-Sophia Wahl, Björn Ommer
A central aspect is unsupervised learning of posture and behaviour representations to enable an objective comparison of movement.
no code implementations • 24 May 2022 • Michael Dorkenwald, Fanyi Xiao, Biagio Brattoli, Joseph Tighe, Davide Modolo
We propose SCVRL, a novel contrastive framework for self-supervised video representation learning.
no code implementations • 13 Feb 2024 • Michael Dorkenwald, Nimrod Barazani, Cees G. M. Snoek, Yuki M. Asano
Vision-Language Models (VLMs), such as Flamingo and GPT-4V, have shown immense potential by integrating large language models with vision systems.