Finally, we propose few-shot DML as an efficient way to consistently improve generalization under the unknown test-time distribution shifts presented in ooDML.
Given a static image of an object and a local poke at a single pixel, the approach predicts how the object would deform over time.
Video understanding calls for a model to learn the characteristic interplay between static scene content and its dynamics: given an image, the model must be able to predict a plausible future progression of the portrayed scene; conversely, a video should be explained in terms of its static image content and the remaining dynamic characteristics not present in the initial frame.
Using this representation, we are able to change the behavior of a person depicted in an arbitrary posture, or even to directly transfer behavior observed in a given video sequence.
Deep Metric Learning (DML) provides a crucial tool for visual similarity and zero-shot applications by learning embedding spaces that generalize; however, recent work in DML has shown strong performance saturation across training objectives.
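To make the zero-shot use-case concrete, here is a minimal NumPy sketch (with made-up toy embeddings, not data from any of the papers above) of how a learned embedding space supports retrieval: test images are ranked purely by nearest-neighbor similarity, so classes never seen in training can still be matched.

```python
import numpy as np

def retrieve(query, gallery, top_k=2):
    """Rank gallery embeddings by cosine similarity to the query.
    No per-class classifier is needed, so unseen classes work too."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))[:top_k]

# Toy gallery of three embeddings; the first two lie close to the query.
gallery = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
query = np.array([1.0, 0.05])
print(retrieve(query, gallery))  # indices of the nearest neighbors first
```

The quality of such retrieval depends entirely on how well the embedding space generalizes, which is exactly the property the saturation results above call into question.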
Visual similarity plays an important role in many computer vision applications.
The common paradigm is discriminative metric learning, which seeks an embedding that separates different training classes.
Learning visual similarity requires learning relations, typically between triplets of images.
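As an illustration of such triplet relations, here is a minimal sketch of a standard triplet margin loss (NumPy, toy embeddings; the margin value is an arbitrary choice for the example, not one taken from the text):

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss on Euclidean distances: pull the positive toward the
    anchor, push the negative at least `margin` farther away."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings: the positive is much closer to the anchor than the negative.
a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])
n = np.array([-1.0, 0.0])
print(triplet_margin_loss(a, p, n))  # satisfied triplet -> 0.0
```

Minimizing this loss over many sampled triplets is what shapes the embedding so that distances reflect visual similarity.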
Deep Metric Learning (DML) is arguably one of the most influential lines of research for learning visual similarities, with many new approaches proposed every year.
To nevertheless find relations that can be reliably utilized for learning, we follow a divide-and-conquer strategy: we find reliable similarities by extracting compact groups of images and reliable dissimilarities by partitioning these groups into subsets, converting the complicated overall problem into a few reliable local subproblems.
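One way to picture this strategy is the following NumPy sketch. It is purely illustrative, not the exact procedure from the text: the plain k-means grouping and the alternating-index split are placeholder choices standing in for "extract compact groups" and "partition groups into subsets".

```python
import numpy as np

def compact_groups(x, k, iters=10):
    """Plain k-means with farthest-point initialization; returns, for
    each point, the index of its compact group."""
    centers = [x[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.stack(centers)
    for _ in range(iters):
        assign = np.linalg.norm(x[:, None] - centers[None], axis=-1).argmin(1)
        centers = np.stack([x[assign == j].mean(0) if (assign == j).any()
                            else centers[j] for j in range(k)])
    return assign

def local_subproblems(assign, k):
    """Split each group into two subsets: pairs within a subset serve as
    reliable similarities, pairs across the two subsets of one group as
    reliable dissimilarities. (The split rule here is a placeholder.)"""
    subproblems = []
    for j in range(k):
        idx = np.where(assign == j)[0]
        subproblems.append((idx[::2], idx[1::2]))
    return subproblems

# Two tight toy clusters of three points each.
x = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
groups = compact_groups(x, k=2)
subs = local_subproblems(groups, k=2)
```

Each pair of subsets then defines one small, reliable learning problem in place of the noisy global one.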
Without any manual annotation, the model learns a structured representation of postures and their temporal development.