In addition, we show that not only can our model recognize unseen compositions robustly in an open-world setting, it can also generalize to compositions where objects themselves were unseen during training.
Ranked #5 on Image Retrieval with Multi-Modal Query on MIT-States
A natural way to model the compositional nature of these general concepts is to learn them through transformations such as coupling and decoupling.
Ranked #1 on Compositional Zero-Shot Learning on MIT-States
This leads to consistent misclassification of samples from a new distribution, like new combinations of known components.
In compositional zero-shot learning, the goal is to recognize unseen compositions (e.g., old dog) of visual primitives observed in the training set: states (e.g., old, cute) and objects (e.g., car, dog).
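As a rough illustration of this setup, the sketch below enumerates the composition space from the example states and objects and separates seen from unseen pairs. The `seen` training pairs are hypothetical, chosen only so that "old dog" ends up unseen; they are not taken from any dataset split.

```python
from itertools import product

# Primitives taken from the examples in the text.
states = ["old", "cute"]
objects = ["car", "dog"]

# Hypothetical training compositions (an assumption for illustration).
seen = {("old", "car"), ("cute", "dog")}

# Every state-object pair forms the full (open-world) composition space.
all_pairs = set(product(states, objects))

# Unseen compositions combine known primitives in pairs absent from training.
unseen = sorted(all_pairs - seen)

for state, obj in unseen:
    print(f"unseen composition: {state} {obj}")
```

Each primitive appears in at least one seen pair, so the unseen pairs differ from training only in how known states and objects are combined.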
After estimating the feasibility score of each composition, we use these scores to either directly mask the output space or as a margin for the cosine similarity between visual features and compositional embeddings during training.
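The two uses of feasibility scores described above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the function names, the `threshold` and `alpha` parameters, the sign and scale of the margin, and the toy embeddings and feasibility values are all hypothetical.

```python
import numpy as np

def cosine_scores(visual, comp_emb):
    """Cosine similarity between visual features and compositional embeddings."""
    v = visual / np.linalg.norm(visual, axis=-1, keepdims=True)
    c = comp_emb / np.linalg.norm(comp_emb, axis=-1, keepdims=True)
    return v @ c.T

def mask_infeasible(scores, feasibility, threshold):
    """Option 1: remove compositions whose feasibility falls below a threshold."""
    return np.where(feasibility >= threshold, scores, -np.inf)

def apply_margin(scores, feasibility, alpha=0.1):
    """Option 2: shift each composition's similarity by a feasibility-scaled margin.

    The additive form and the value of alpha are assumptions made for illustration.
    """
    return scores + alpha * feasibility

# Toy data: one image feature and three candidate compositions (all assumed).
visual = np.array([[1.0, 0.0]])
comp_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
feasibility = np.array([0.9, 0.1, 0.5])

scores = cosine_scores(visual, comp_emb)
masked = mask_infeasible(scores, feasibility, threshold=0.3)
adjusted = apply_margin(scores, feasibility, alpha=0.1)
prediction = int(np.argmax(masked, axis=-1)[0])
```

Masking is a hard decision applied to the output space, whereas the margin variant keeps every composition in play and only biases the similarities used during training.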