The goal of Metric Learning is to learn a representation function that maps objects into an embedded space. The distance in the embedded space should preserve the objects’ similarity — similar objects get close and dissimilar objects get far away. Various loss functions have been developed for Metric Learning. For example, the contrastive loss guides the objects from the same class to be mapped to the same point and those from different classes to be mapped to different points whose distances are larger than a margin. Triplet loss is also popular, which requires the distance between the anchor sample and the positive sample to be smaller than the distance between the anchor sample and the negative sample.
|TREND||DATASET||BEST METHOD||PAPER TITLE||PAPER||CODE||COMPARE|
Weakly supervised segmentation requires assigning a label to every pixel based on training instances with partial annotations such as image-level tags, object bounding boxes, labeled points and scribbles.
We also identify dominating modality problem when training a multimodal descriptor.
Then, each mini batch is composed of a randomly selected class and its nearest neighboring classes so as to provide informative and challenging examples for learning.
The existence of noisy labels in real-world data negatively impacts the performance of deep learning models.
One of the main purposes of deep metric learning is to construct an embedding space that has well-generalized embeddings on both seen (training) classes and unseen (test) classes.
Metric learning has been proposed to capture user-item interactions from implicit feedback, but existing methods only represent users and items in a single metric space, ignoring the fact that users can have multiple preferences and items can have multiple properties, which leads to potential conflicts limiting their performance in recommendation.
%We argue that such flexibility is also important for deep metric learning, because different visual concepts indeed correspond to different semantic scales.
We also design experiments on a variety of synthetic datasets and new graph-augmented datasets of proteins and scientific abstracts.
In this paper, we go Back to the Feature: we argue that deep networks should focus on learning robust and invariant visual features, while the geometric estimation should be left to principled algorithms.