28 papers with code • 1 benchmark • 1 dataset
Measures the ability of models to uncover an underlying concept that unites several ostensibly disparate entities, which hopefully would not co-occur frequently. This provides a limited test of a model's ability to creatively construct the necessary abstraction to make sense of a situation that it cannot have memorized in training.
In this context, the goal of our work is to devise a few-shot visual learning system that, at test time, can efficiently learn novel categories from only a few training examples without forgetting the initial categories on which it was trained (here called base categories).
We address the problem of class incremental learning, which is a core step towards achieving adaptive vision intelligence.
In particular, we propose a transposed weight sharing scheme, which not only improves performance on image captioning, but also makes the model more suitable for the novel concept learning task.
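One way to picture the weight sharing mentioned in the excerpt above is as tying the word-embedding matrix to the output (decoding) layer through its transpose. The sketch below is only an illustration under that assumption and is not the paper's implementation; the sizes and the names `embedding`, `projection`, `embed`, and `word_logits` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
vocab_size, embed_dim, hidden_dim = 1000, 64, 128

# One shared matrix: row i is the embedding of word i.
embedding = rng.standard_normal((vocab_size, embed_dim)) * 0.01
# Small projection from the hidden state down to the embedding size.
projection = rng.standard_normal((hidden_dim, embed_dim)) * 0.01


def embed(word_ids):
    """Input side: look up rows of the shared matrix."""
    return embedding[word_ids]


def word_logits(hidden):
    """Output side: reuse the same matrix, transposed, to score every word."""
    return (hidden @ projection) @ embedding.T


hidden = rng.standard_normal((2, hidden_dim))
logits = word_logits(hidden)

# Parameter counts with and without tying the output layer to the embedding.
tied_params = embedding.size + projection.size
untied_params = embedding.size + hidden_dim * vocab_size
```

Under this assumption, adding a new word means adding a single row to `embedding`, which serves both the input lookup and the output scoring. That is one plausible reading of why such a scheme suits novel concept learning.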
Current deep caption models can only describe objects contained in paired image-sentence corpora, despite the fact that they are pre-trained with large object recognition datasets, namely ImageNet.
We hypothesize that this setting is ill-suited for real-world applications where unseen objects appear only as a part of a complex scene, warranting both the 'recognition' and 'localization' of an unseen category.
In semantic space, we search for related concepts, which are then projected back into the image feature spaces by the decoder portion of the TriNet.
It is known that the Langevin dynamics used in MCMC is the gradient flow of the KL divergence on the Wasserstein space, which helps convergence analysis and inspires recent particle-based variational inference methods (ParVIs).
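The Langevin dynamics referred to in the last excerpt can be made concrete with a small simulation. The sketch below is not taken from the cited paper. It assumes a one-dimensional standard normal target and uses the unadjusted Langevin algorithm, that is, the Euler-Maruyama discretization of dX = ∇log π(X) dt + √2 dW. The names `grad_log_target` and `langevin_samples` are hypothetical.

```python
import numpy as np


def grad_log_target(x):
    # Assumed target: standard normal, log pi(x) = -x^2 / 2 + const,
    # so the gradient of the log density is -x.
    return -x


def langevin_samples(n_particles=5000, n_steps=2000, step=0.01, seed=0):
    """Run independent unadjusted Langevin chains and return their final states."""
    rng = np.random.default_rng(seed)
    # Deliberately poor initialization, far from the target distribution.
    x = rng.uniform(-5.0, 5.0, size=n_particles)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_particles)
        # Drift along the gradient of log pi, plus Gaussian noise scaled by sqrt(2 * step).
        x = x + step * grad_log_target(x) + np.sqrt(2.0 * step) * noise
    return x


samples = langevin_samples()
```

With a small step size, the empirical distribution of `samples` should be close to the standard normal target, consistent with the view that the law of the particles descends the KL divergence to the target. The discretization introduces a small bias that shrinks as `step` decreases.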