no code implementations • 14 Mar 2024 • Hmrishav Bandyopadhyay, Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Tao Xiang, Timothy Hospedales, Yi-Zhe Song
(ii) SketchINR's auto-decoder provides a much higher-fidelity representation than other learned vector sketch representations, and is uniquely able to scale to complex vector sketches such as FS-COCO.
no code implementations • 14 Mar 2024 • Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Aneeshan Sain, Tao Xiang, Yi-Zhe Song
In this paper, we explore the unique modality of sketch for explainability, emphasising the profound impact of human strokes compared to conventional pixel-oriented studies.
1 code implementation • 12 Mar 2024 • Subhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
This paper unravels the potential of sketches for diffusion models, addressing the deceptive promise of direct sketch control in generative AI.
no code implementations • 12 Mar 2024 • Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
This paper, for the first time, explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR).
no code implementations • 12 Mar 2024 • Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
Two primary input modalities prevail in image retrieval: sketch and text.
no code implementations • 11 Mar 2024 • Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
@q loss to inject that understanding into the system.
no code implementations • 7 Dec 2023 • Dar-Yen Chen, Ayan Kumar Bhunia, Subhadeep Koley, Aneeshan Sain, Pinaki Nath Chowdhury, Yi-Zhe Song
In this paper, we democratise caricature generation, empowering individuals to effortlessly craft personalised caricatures with just a photo and a conceptual sketch.
no code implementations • 7 Dec 2023 • Hmrishav Bandyopadhyay, Subhadeep Koley, Ayan Das, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Ayan Kumar Bhunia, Yi-Zhe Song
In this paper, we democratise 3D content creation, enabling precise generation of 3D shapes from abstract sketches while overcoming limitations tied to drawing skills.
no code implementations • CVPR 2023 • Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Aneeshan Sain, Subhadeep Koley, Tao Xiang, Yi-Zhe Song
In particular, we first perform independent prompting on both sketch and photo branches of an SBIR model to build highly generalisable sketch and photo encoders on the back of the generalisation ability of CLIP.
no code implementations • CVPR 2023 • Aneeshan Sain, Ayan Kumar Bhunia, Subhadeep Koley, Pinaki Nath Chowdhury, Soumitri Chattopadhyay, Tao Xiang, Yi-Zhe Song
This paper advances the fine-grained sketch-based image retrieval (FG-SBIR) literature by putting forward a strong baseline that surpasses the prior state of the art by ~11%.
no code implementations • CVPR 2023 • Aneeshan Sain, Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Subhadeep Koley, Tao Xiang, Yi-Zhe Song
At the very core of our solution is a prompt learning setup.
no code implementations • CVPR 2023 • Ayan Kumar Bhunia, Subhadeep Koley, Amandeep Kumar, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
Human sketch has already proved its worth in various visual understanding tasks (e.g., retrieval, segmentation, image-captioning, etc.).
no code implementations • CVPR 2023 • Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
We further introduce specific designs to tackle the abstract nature of human sketches, including a fine-grained discriminative loss on the back of a trained sketch-photo retrieval model, and a partial-aware sketch augmentation strategy.
1 code implementation • CVPR 2023 • Abhra Chaudhuri, Ayan Kumar Bhunia, Yi-Zhe Song, Anjan Dutta
For the first time, we identify that for data-scarce tasks like Sketch-Based Image Retrieval (SBIR), where the difficulty in acquiring paired photos and hand-drawn sketches limits data-dependent cross-modal learning algorithms, DFL can prove to be a much more practical paradigm.
no code implementations • ICCV 2023 • Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Aneeshan Sain, Subhadeep Koley, Tao Xiang, Yi-Zhe Song
We perform pivoting on two existing datasets, each from a distant research domain to the other: 2D sketch and photo pairs from the sketch-based image retrieval field (SBIR), and 3D shapes from ShapeNet.
no code implementations • 27 Oct 2022 • Ayan Kumar Bhunia
Sketches have been used to conceptualise and depict visual objects from pre-historic times.
1 code implementation • 4 Jul 2022 • Ayan Kumar Bhunia, Aneeshan Sain, Parth Shah, Animesh Gupta, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
To solve this new problem, we introduce a novel model-agnostic meta-learning (MAML) based framework with several key modifications: (1) As this is a retrieval task with a margin-based contrastive loss, we simplify the MAML training in the inner loop to make it more stable and tractable.
no code implementations • CVPR 2023 • Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Aneeshan Sain, Subhadeep Koley, Tao Xiang, Yi-Zhe Song
In this paper, we extend scene understanding to include that of human sketch.
no code implementations • CVPR 2022 • Ayan Kumar Bhunia, Viswanatha Reddy Gajjala, Subhadeep Koley, Rohit Kundu, Aneeshan Sain, Tao Xiang, Yi-Zhe Song
In this paper, we push the boundary further for FSCIL by addressing two key questions that bottleneck its ubiquitous application: (i) can the model learn from diverse modalities other than just photos (as humans do), and (ii) what if photos are not readily accessible (due to ethical and privacy constraints)?
no code implementations • CVPR 2022 • Aneeshan Sain, Ayan Kumar Bhunia, Vaishnav Potlapalli, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
In this paper, we argue that this setup is by definition not compatible with the inherent abstract and subjective nature of sketches, i.e., the model might transfer well to new categories, but will not understand sketches from a different test-time distribution as a result.
no code implementations • CVPR 2022 • Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Viswanatha Reddy Gajjala, Aneeshan Sain, Tao Xiang, Yi-Zhe Song
We scrutinise an important observation plaguing scene-level sketch research -- that a significant portion of scene sketches are "partial".
1 code implementation • CVPR 2022 • Ayan Kumar Bhunia, Subhadeep Koley, Abdullah Faiz Ur Rahman Khilji, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
We first conducted a pilot study that revealed that the secret lies in the existence of noisy strokes, and not so much in the "I can't sketch".
1 code implementation • 4 Mar 2022 • Pinaki Nath Chowdhury, Aneeshan Sain, Ayan Kumar Bhunia, Tao Xiang, Yulia Gryaditskaya, Yi-Zhe Song
We advance sketch research to scenes with the first dataset of freehand scene sketches, FS-COCO.
no code implementations • ICCV 2021 • Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Yi-Zhe Song
In this paper, for the first time, we argue for their unification -- we aim for a single model that can compete favourably with two separate state-of-the-art STR and HTR models.
no code implementations • ICCV 2021 • Ayan Kumar Bhunia, Aneeshan Sain, Amandeep Kumar, Shuvozit Ghose, Pinaki Nath Chowdhury, Yi-Zhe Song
In this paper, we argue that semantic information plays a complementary role alongside visual information.
no code implementations • ICCV 2021 • Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Yi-Zhe Song
Our framework is iterative in nature, in that it utilises predicted knowledge of character sequences from a previous iteration, to augment the main network in improving the next prediction.
1 code implementation • CVPR 2021 • Ayan Kumar Bhunia, Shuvozit Ghose, Amandeep Kumar, Pinaki Nath Chowdhury, Aneeshan Sain, Yi-Zhe Song
In this paper, we take a completely different perspective -- we work on the assumption that there is always a new style that is drastically different, and that we will only have very limited data during testing to perform adaptation.
no code implementations • CVPR 2021 • Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song
With this meta-learning framework, our model can not only disentangle the cross-modal shared semantic content for SBIR, but can adapt the disentanglement to any unseen user style as well, making the SBIR model truly style-agnostic.
1 code implementation • CVPR 2021 • Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song
This data is uniquely characterised by its existence in dual modalities of rasterized images and vector coordinate sequences.
1 code implementation • CVPR 2021 • Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Yongxin Yang, Tao Xiang, Yi-Zhe Song
A fundamental challenge faced by existing Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) models is data scarcity -- model performance is largely bottlenecked by the lack of sketch-photo pairs.
1 code implementation • 29 Jul 2020 • Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song
In this paper, we study a further trait of sketches that has been overlooked to date, that is, they are hierarchical in terms of the levels of detail -- a person typically sketches up to various extents of detail to depict an object.
1 code implementation • CVPR 2020 • Ayan Kumar Bhunia, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song
Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo instance given a user's query sketch.
Cross-Modal Retrieval • On-the-Fly Sketch Based Image Retrieval
5 code implementations • ECCV 2020 • Ruoyi Du, Dongliang Chang, Ayan Kumar Bhunia, Jiyang Xie, Zhanyu Ma, Yi-Zhe Song, Jun Guo
In this work, we propose a novel framework for fine-grained visual classification to tackle these problems.
Ranked #17 on Fine-Grained Image Classification on Stanford Cars
1 code implementation • 24 Feb 2020 • Ayan Kumar Bhunia, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song
Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo instance given a user's query sketch.
Cross-Modal Retrieval • On-the-Fly Sketch Based Image Retrieval
3 code implementations • 11 Feb 2020 • Dongliang Chang, Yifeng Ding, Jiyang Xie, Ayan Kumar Bhunia, Xiaoxu Li, Zhanyu Ma, Ming Wu, Jun Guo, Yi-Zhe Song
The proposed loss function, termed as mutual-channel loss (MC-Loss), consists of two channel-specific components: a discriminality component and a diversity component.
Ranked #29 on Fine-Grained Image Classification on FGVC Aircraft
no code implementations • 9 Feb 2019 • Sauradip Nag, Ayan Kumar Bhunia, Aishik Konwer, Partha Pratim Roy
Facial micro-expressions are sudden involuntary minute muscle movements which reveal true emotions that people try to conceal.
no code implementations • 4 Nov 2018 • Ayan Kumar Bhunia, Perla Sai Raj Kishore, Pranay Mukherjee, Abhirup Das, Partha Pratim Roy
In the next stage, a second network gathers the multi-scale feature representations from the TSN's intermediate layers using channel-wise attention and combines them progressively into a dense continuous representation, which is finally converted into a binary hash code with the help of individual and pairwise label information.
2 code implementations • 4 Nov 2018 • Ayan Kumar Bhunia, Ankan Kumar Bhunia, Shuvozit Ghose, Abhirup Das, Partha Pratim Roy, Umapada Pal
Logo detection in real-world scene images is an important problem with applications in advertisement and marketing.
no code implementations • CVPR 2019 • Ayan Kumar Bhunia, Abhirup Das, Ankan Kumar Bhunia, Perla Sai Raj Kishore, Partha Pratim Roy
Handwritten Word Recognition and Spotting is a challenging field dealing with handwritten text possessing irregular and complex shapes.
1 code implementation • 1 Nov 2018 • Pranay Mukherjee, Abhirup Das, Ayan Kumar Bhunia, Partha Pratim Roy
Can we ask computers to recognize what we see from brain signals alone?
2 code implementations • 31 Oct 2018 • Perla Sai Raj Kishore, Ayan Kumar Bhunia, Shuvozit Ghose, Partha Pratim Roy
We use Global Context Aggregation (GCA) and a modified Region Proposal Network (RPN) with adaptive convolutions to generate thumbnails in real time.
1 code implementation • 25 Oct 2018 • Ankan Kumar Bhunia, Ayan Kumar Bhunia, Aneeshan Sain, Partha Pratim Roy
By jointly training the two networks we can increase the adversarial robustness of our system.
no code implementations • 23 Feb 2018 • Ayan Kumar Bhunia, Subham Mukherjee, Aneeshan Sain, Ankan Kumar Bhunia, Partha Pratim Roy, Umapada Pal
In this paper, we propose a novel approach to word-level Indic script identification using only character-level data in the training stage.
no code implementations • 22 Jan 2018 • Ankan Kumar Bhunia, Ayan Kumar Bhunia, Prithaj Banerjee, Aishik Konwer, Abir Bhowmick, Partha Pratim Roy, Umapada Pal
We employ a novel convolutional recurrent model architecture in the Generator that efficiently deals with word images of arbitrary width.
no code implementations • 22 Jan 2018 • Aishik Konwer, Ayan Kumar Bhunia, Abir Bhowmick, Ankan Kumar Bhunia, Prithaj Banerjee, Partha Pratim Roy, Umapada Pal
Staff line removal is a crucial pre-processing step in Optical Music Recognition.
no code implementations • 22 Jan 2018 • Ayan Kumar Bhunia, Abir Bhowmick, Ankan Kumar Bhunia, Aishik Konwer, Prithaj Banerjee, Partha Pratim Roy, Umapada Pal
Our encoder module consists of a Convolutional LSTM network, which takes an offline character image as input and encodes the feature sequence into a hidden representation.
no code implementations • 3 Jan 2018 • Ayan Kumar Bhunia, Avirup Bhattacharyya, Prithaj Banerjee, Partha Pratim Roy, Subrahmanyam Murala
In this paper, we propose a novel feature descriptor that combines color and texture information.
1 code implementation • 1 Jan 2018 • Ankan Kumar Bhunia, Aishik Konwer, Ayan Kumar Bhunia, Abir Bhowmick, Partha P. Roy, Umapada Pal
In this paper, we propose a novel method that extracts local and global features using a CNN-LSTM framework and weights them dynamically for script identification.
no code implementations • 30 Dec 2017 • Shuvozit Ghose, Abhirup Das, Ayan Kumar Bhunia, Partha Pratim Roy
The performance of the method has been tested for image retrieval on four popular databases.
no code implementations • 19 Dec 2017 • Ayan Kumar Bhunia, Partha Pratim Roy, Akash Mohta, Umapada Pal
This paper presents a novel cross-language platform for handwritten word recognition and spotting for low-resource scripts, where training is performed with a sufficiently large dataset of an available script (the source script) and testing is done on other scripts (the target scripts).
no code implementations • 5 Dec 2017 • Ayan Kumar Bhunia, Partha Pratim Roy, Umapada Pal
Also, we propose a novel feature combining foreground and background information of text line images for keyword-spotting by character filler models.
no code implementations • 7 Sep 2017 • Prithaj Banerjee, Ayan Kumar Bhunia, Avirup Bhattacharyya, Partha Pratim Roy, Subrahmanyam Murala
The proposed method is based on the concept that neighbors of a particular pixel hold a significant amount of texture information that can be considered for efficient texture representation for CBIR.
no code implementations • 18 Aug 2017 • Partha Pratim Roy, Ayan Kumar Bhunia, Avirup Bhattacharyya, Umapada Pal
To evaluate the proposed system for keyword search in natural scene images and video frames, we have considered two popular Indic scripts, Bangla (Bengali) and Devanagari, along with English.
no code implementations • 1 Aug 2017 • Partha Pratim Roy, Ayan Kumar Bhunia, Ayan Das, Prasenjit Dey, Umapada Pal
To avoid character segmentation in such scripts, HMM-based sequence modeling has been used earlier in a holistic way.
no code implementations • 22 Jul 2017 • Aneeshan Sain, Ayan Kumar Bhunia, Partha Pratim Roy, Umapada Pal
Until now only a few methods have been proposed that look into curved text detection in video frames, wherein lies our novelty.
no code implementations • 21 Jul 2017 • Partha Pratim Roy, Ayan Kumar Bhunia, Umapada Pal
We propose a line-based date spotting approach using a Hidden Markov Model (HMM), which is used to detect the date information in a given text.
no code implementations • 21 Jul 2017 • Partha Pratim Roy, Ayan Kumar Bhunia, Umapada Pal
A novel Factor Analysis based feature selection technique is applied to sliding-window features to reduce the noise arising from staff lines, which proves effective for writer identification. In our framework, we also propose a novel score line detection approach for musical sheets using HMM.
no code implementations • 21 Jul 2017 • Ayan Kumar Bhunia, Gautam Kumar, Partha Pratim Roy, R. Balasubramanian, Umapada Pal
In this paper, we present a novel approach based on color channel selection for text recognition from scene images and video frames.