no code implementations • 8 Apr 2024 • Sai Bhargav Rongali, Sarthak Mehrotra, Ankit Jha, Mohamad Hassan N C, Shirsha Bose, Tanisha Gupta, Mainak Singha, Biplab Banerjee
In Generalized Category Discovery (GCD), we cluster unlabeled samples of both known and novel classes, leveraging a labeled training dataset of the known classes.
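The excerpt states only the GCD setting, not the paper's method. As an illustration, the sketch below shows the standard semi-supervised k-means baseline for GCD (an assumption on my part): labeled known-class samples stay pinned to their class centroid, and only unlabeled samples are re-assigned across known and novel clusters.

```python
# Minimal sketch of the GCD setting via semi-supervised k-means
# (an illustrative baseline, not this paper's method).
import numpy as np

def ss_kmeans(x_lab, y_lab, x_unlab, num_known, num_total, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    # Known-class centroids from labeled means; novel centroids seeded
    # from randomly chosen unlabeled points.
    cents = np.stack([x_lab[y_lab == k].mean(0) for k in range(num_known)])
    extra = x_unlab[rng.choice(len(x_unlab), num_total - num_known, replace=False)]
    cents = np.concatenate([cents, extra])
    for _ in range(iters):
        d = ((x_unlab[:, None] - cents[None]) ** 2).sum(-1)
        y_un = d.argmin(1)                      # re-assign unlabeled samples only
        all_x = np.concatenate([x_lab, x_unlab])
        all_y = np.concatenate([y_lab, y_un])   # labeled assignments never change
        for k in range(num_total):
            if (all_y == k).any():
                cents[k] = all_x[all_y == k].mean(0)
    return y_un, cents
```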
no code implementations • 31 Mar 2024 • Mainak Singha, Ankit Jha, Shirsha Bose, Ashwin Nair, Moloud Abdar, Biplab Banerjee
Central to our approach is modeling a dedicated prompt tailored to detecting unknown-class samples; to train it, we employ a readily accessible Stable Diffusion model to generate proxy images for the open class.
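As a hedged sketch of the proxy-image step, the snippet below generates open-class stand-ins with an off-the-shelf Stable Diffusion pipeline; the checkpoint name and the class-agnostic prompt texts are illustrative assumptions, not the paper's exact choices.

```python
# Generating proxy "unknown class" images with an off-the-shelf
# Stable Diffusion pipeline; checkpoint and prompts are assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Generic, class-agnostic prompts stand in for the open (unknown) class.
proxy_prompts = ["a photo of an object", "a photo of a thing"]
proxy_images = [pipe(p, num_inference_steps=30).images[0] for p in proxy_prompts]
```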
no code implementations • 27 Nov 2023 • Avigyan Bhattacharya, Mainak Singha, Ankit Jha, Biplab Banerjee
To this end, we introduce C-SAW, a method that complements CLIP with a self-supervised loss in the visual space and a novel prompt learning technique that emphasizes both visual domain and content-specific features.
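The excerpt does not name the pretext task, so the sketch below uses rotation prediction as an assumed stand-in for a self-supervised loss in the visual space, computed on features from a frozen CLIP image encoder.

```python
# Illustrative SSL loss on top of a frozen CLIP image encoder.
# Rotation prediction is an assumption; the abstract leaves the
# pretext task unspecified.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RotationSSLHead(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.head = nn.Linear(feat_dim, 4)  # predict 0/90/180/270 degrees

    def forward(self, clip_visual, images):
        # Rotate each image by a random multiple of 90 degrees.
        rots = torch.randint(0, 4, (images.size(0),), device=images.device)
        rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                               for img, r in zip(images, rots)])
        with torch.no_grad():               # CLIP backbone stays frozen
            feats = clip_visual(rotated)
        return F.cross_entropy(self.head(feats.float()), rots)
```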
no code implementations • 23 Sep 2023 • Ankit Jha, Debabrata Pal, Mainak Singha, Naman Agarwal, Biplab Banerjee
Even though joint training of audio and visual modalities improves classification performance in low-data regimes, it has yet to be thoroughly investigated in the remote sensing (RS) domain.
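A minimal sketch of joint audio-visual training, assuming simple late fusion of per-modality embeddings (the encoders, dimensions, and fusion scheme here are illustrative, since the excerpt gives no architectural detail):

```python
# Late-fusion audio-visual classifier; all dimensions are assumptions.
import torch
import torch.nn as nn

class AudioVisualClassifier(nn.Module):
    def __init__(self, a_dim=128, v_dim=512, num_classes=10):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(a_dim + v_dim, 256), nn.ReLU(),
            nn.Linear(256, num_classes))

    def forward(self, audio_feat, visual_feat):
        # Concatenate modality embeddings, then classify jointly.
        return self.fuse(torch.cat([audio_feat, visual_feat], dim=-1))
```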
1 code implementation • 22 Aug 2023 • Mainak Singha, Ankit Jha, Biplab Banerjee
GOPro is trained end-to-end on all three loss objectives, combining the strengths of CLIP and self-supervised learning (SSL) in a principled manner.
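The three objectives are not spelled out in this excerpt; the sketch below assumes a CLIP-style contrastive loss, an SSL loss, and a supervised classification loss, combined into one scalar so everything trains end-to-end. The weights are illustrative.

```python
# Hedged sketch of a three-term end-to-end objective; the specific
# losses and weights are assumptions for illustration only.
def total_loss(clip_contrastive, ssl_loss, cls_loss, w1=1.0, w2=0.5, w3=0.5):
    # A single scalar objective lets all components be optimized jointly.
    return w1 * clip_contrastive + w2 * ssl_loss + w3 * cls_loss
```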
1 code implementation • 10 Aug 2023 • Mainak Singha, Harsh Pal, Ankit Jha, Biplab Banerjee
We leverage the frozen vision backbone of CLIP to extract both image style (domain) and content information, which we then use to learn prompt tokens.
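One common way to realize this is to treat feature statistics as style and a pooled embedding as content, then project both into soft prompt tokens. The sketch below follows that reading; the layer choice and projection are assumptions, as the excerpt states only the overall idea.

```python
# Deriving style (feature statistics) and content (CLS embedding) from
# frozen CLIP patch tokens, projected into learnable prompt tokens.
import torch
import torch.nn as nn

class StyleContentPrompter(nn.Module):
    def __init__(self, feat_dim=768, prompt_dim=512, n_tokens=4):
        super().__init__()
        self.proj = nn.Linear(3 * feat_dim, n_tokens * prompt_dim)
        self.n_tokens, self.prompt_dim = n_tokens, prompt_dim

    def forward(self, patch_tokens):            # (B, N, feat_dim) from frozen CLIP
        style = torch.cat([patch_tokens.mean(1), patch_tokens.std(1)], dim=-1)
        content = patch_tokens[:, 0]             # CLS token as content summary
        p = self.proj(torch.cat([style, content], dim=-1))
        return p.view(-1, self.n_tokens, self.prompt_dim)
```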
1 code implementation • 12 Apr 2023 • Mainak Singha, Ankit Jha, Bhupendra Solanki, Shirsha Bose, Biplab Banerjee
APPLeNet emphasizes the importance of multi-scale feature learning in RS scene classification and disentangles visual style and content primitives for domain generalization tasks.
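In the spirit of the multi-scale feature learning the excerpt highlights, the sketch below pools features at several depths of a frozen ViT-style backbone and concatenates them; the tapped layer indices are assumptions.

```python
# Multi-scale feature extraction from intermediate transformer blocks;
# the tap depths (3, 7, 11) are illustrative assumptions.
import torch

def multi_scale_features(vit_blocks, tokens, taps=(3, 7, 11)):
    """Run tokens through the blocks, pooling patch features at several
    depths and concatenating them into one multi-scale vector."""
    feats = []
    for i, blk in enumerate(vit_blocks):
        tokens = blk(tokens)
        if i in taps:
            feats.append(tokens.mean(dim=1))   # pool over patch tokens
    return torch.cat(feats, dim=-1)
```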
no code implementations • 18 Feb 2023 • Shirsha Bose, Ankit Jha, Enrico Fini, Mainak Singha, Elisa Ricci, Biplab Banerjee
Our method adopts a domain-agnostic prompt learning strategy that disentangles the visual style and content information embedded in CLIP's pre-trained vision encoder, enabling effortless adaptation to novel domains at inference.
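A hedged sketch of what inference with such learned prompts could look like, CoOp-style: the shared soft prompt is prepended to each class-name embedding, encoded, and scored against the image feature. Here `encode_with_prompts` is a hypothetical helper standing in for a CLIP text encoder that accepts soft prompt tokens.

```python
# Illustrative inference with learned prompt tokens; encode_with_prompts
# is a hypothetical helper, not a real CLIP API.
import torch
import torch.nn.functional as F

@torch.no_grad()
def classify(image_feat, prompt_tokens, class_token_embs, encode_with_prompts):
    # Prepend the shared soft prompt to each class-name embedding,
    # encode, then score by cosine similarity with the image feature.
    text_feats = torch.stack([
        encode_with_prompts(torch.cat([prompt_tokens, cls_emb], dim=0))
        for cls_emb in class_token_embs])
    sims = F.cosine_similarity(image_feat.unsqueeze(0), text_feats, dim=-1)
    return sims.argmax().item()
```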