1 code implementation • 29 Jul 2024 • Gagan Jain, Nidhi Hegde, Aditya Kusupati, Arsha Nagrani, Shyamal Buch, Prateek Jain, Anurag Arnab, Sujoy Paul
We present Mixture of Nested Experts (MoNE), which utilizes a nested structure for experts, wherein individual experts fall on an increasing compute-accuracy curve.
no code implementations • 18 Jul 2024 • Zahidul Islam, Sujoy Paul, Mrigank Rochan
Subsequently, we combine audio and visual pseudo-highlights to create the audio-visual pseudo ground-truth highlight of each video for training an audio-visual highlight detection network.
no code implementations • 17 Jul 2024 • Rajat Koner, Gagan Jain, Prateek Jain, Volker Tresp, Sujoy Paul
We show LookupViT's effectiveness on multiple domains: (a) image classification (ImageNet-1K and ImageNet-21K), (b) video classification (Kinetics400 and Something-Something V2), and (c) image captioning (COCO-Captions) with a frozen encoder.
no code implementations • 25 Sep 2023 • Nidhi Hegde, Sujoy Paul, Gagan Madan, Gaurav Aggarwal
Recent document question answering models consist of two key components: the vision encoder, which captures layout and visual elements in images, and a Large Language Model (LLM) that helps contextualize questions to the image and supplements them with external world knowledge to generate accurate answers.
no code implementations • 29 Aug 2023 • Debapriya Tula, Sujoy Paul, Gagan Madan, Peter Garst, Reeve Ingle, Gaurav Aggarwal
While text line recognition models are generally trained on large corpora of real and synthetic data, such models can still make frequent mistakes if the handwriting is inscrutable or the image acquisition process adds corruptions, such as noise, blur, compression, etc.
no code implementations • 12 Jun 2023 • Sujoy Paul, Gagan Madan, Akankshya Mishra, Narayan Hegde, Pradeep Kumar, Gaurav Aggarwal
In this work, we focus on the complex problem of extracting medicine names from handwritten prescriptions using only weakly labeled data.
no code implementations • 21 Jul 2022 • K J Joseph, Sujoy Paul, Gaurav Aggarwal, Soma Biswas, Piyush Rai, Kai Han, Vineeth N Balasubramanian
Inspired by this, we identify and formulate a new, pragmatic problem setting of NCDwF: Novel Class Discovery without Forgetting, which tasks a machine learning model with incrementally discovering novel categories of instances from unlabeled data while maintaining its performance on the previously seen categories.
1 code implementation • 22 Apr 2022 • K J Joseph, Sujoy Paul, Gaurav Aggarwal, Soma Biswas, Piyush Rai, Kai Han, Vineeth N Balasubramanian
Novel Class Discovery (NCD) is a learning paradigm in which a machine learning model is tasked with semantically grouping instances from unlabeled data by utilizing labeled instances from a disjoint set of classes.
1 code implementation • 21 Mar 2022 • Mihir Prabhudesai, Anirudh Goyal, Sujoy Paul, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gaurav Aggarwal, Thomas Kipf, Deepak Pathak, Katerina Fragkiadaki
In our work, we find evidence that these losses are insufficient for the task of scene decomposition, without also considering architectural inductive biases.
no code implementations • 4 Dec 2021 • Sujoy Paul, Ansh Khurana, Gaurav Aggarwal
Unsupervised domain adaptation aims to adapt a model learned on labeled source data to a new unlabeled target dataset.
no code implementations • 4 Dec 2021 • Ansh Khurana, Sujoy Paul, Piyush Rai, Soma Biswas, Gaurav Aggarwal
In Test-time Adaptation (TTA), given a source model, the goal is to adapt it to make better predictions for test instances from a different distribution than the source.
no code implementations • 31 Jul 2021 • Sayak Nag, Dripta S. Raychaudhuri, Sujoy Paul, Amit K. Roy-Chowdhury
However, it is a critical task in many applications like environmental monitoring, where the number of labeled examples for each class is limited.
no code implementations • 20 May 2021 • Dripta S. Raychaudhuri, Sujoy Paul, Jeroen van Baar, Amit K. Roy-Chowdhury
Once this correspondence is found, we can directly transfer the demonstrations on one domain to the other and use it for imitation.
1 code implementation • CVPR 2021 • Sk Miraj Ahmed, Dripta S. Raychaudhuri, Sujoy Paul, Samet Oymak, Amit K. Roy-Chowdhury
A recent line of work addressed this problem and proposed an algorithm that transfers knowledge to the unlabeled target domain from a single source model without requiring access to the source data.
1 code implementation • 13 Aug 2020 • Akash Gupta, Rameswar Panda, Sujoy Paul, Jianming Zhang, Amit K. Roy-Chowdhury
While machine learning approaches to visual recognition offer great promise, most of the existing methods rely heavily on the availability of large quantities of labeled training data.
no code implementations • ECCV 2020 • Sujoy Paul, Yi-Hsuan Tsai, Samuel Schulter, Amit K. Roy-Chowdhury, Manmohan Chandraker
In this work, we propose a novel framework for domain adaptation in semantic segmentation with image-level weak labels in the target domain.
no code implementations • 21 Jul 2020 • Xueping Wang, Sujoy Paul, Dripta S. Raychaudhuri, Min Liu, Yaonan Wang, Amit K. Roy-Chowdhury
In order to cope with this issue, we introduce the problem of learning person re-identification models from videos with weak supervision.
Multiple Instance Learning
Video-Based Person Re-Identification
1 code implementation • NeurIPS 2019 • Sujoy Paul, Jeroen van Baar, Amit K. Roy-Chowdhury
Learning to solve complex goal-oriented tasks with sparse terminal-only rewards often requires an enormous number of samples.
no code implementations • 9 Apr 2019 • Mahmudul Hasan, Sujoy Paul, Anastasios I. Mourikis, Amit K. Roy-Chowdhury
We formulate a conditional random field model that encodes the context and devise an information-theoretic approach that utilizes entropy and mutual information of the nodes to compute the set of most informative queries, which are labeled by a human.
1 code implementation • CVPR 2019 • Niluthpol Chowdhury Mithun, Sujoy Paul, Amit K. Roy-Chowdhury
The supervision is weak because, during training, we only have access to the video-text pairs rather than the temporal extent of the video to which each text description relates.
no code implementations • 28 Nov 2018 • Sujoy Paul, Jeroen van Baar
We show that in spite of not using human-generated trajectories and just using the simulator as a model to generate a limited number of trajectories, we can get a speed-up of about 2-3x in the learning process.
no code implementations • 6 Aug 2018 • Sujoy Paul, Sourya Roy, Amit K. Roy-Chowdhury
This necessitates learning of visual features from videos in an unsupervised setting.
1 code implementation • ECCV 2018 • Sujoy Paul, Sourya Roy, Amit K. Roy-Chowdhury
Most activity localization methods in the literature are burdened by the need for frame-wise annotation.
Ranked #1 on Action Classification on ActivityNet-1.2
1 code implementation • 2 Jul 2018 • Shasha Li, Ajaya Neupane, Sujoy Paul, Chengyu Song, Srikanth V. Krishnamurthy, Amit K. Roy Chowdhury, Ananthram Swami
We exploit recent advances in generative adversarial network (GAN) architectures to account for temporal correlations and generate adversarial samples that can cause misclassification rates of over 80% for targeted activities.
no code implementations • CVPR 2018 • Sourya Roy, Sujoy Paul, Neal E. Young, Amit K. Roy-Chowdhury
Minimizing labeling effort for person re-identification in camera networks is an important problem, as most existing popular methods are supervised and require large amounts of manual annotation, which is tedious to acquire.
no code implementations • CVPR 2017 • Jawadul H. Bappy, Sujoy Paul, Ertem Tuncel, Amit K. Roy-Chowdhury
In computer vision, selection of the most informative samples from a huge pool of training data in order to learn a good recognition model is an active research problem.
1 code implementation • CVPR 2017 • Sujoy Paul, Jawadul H. Bappy, Amit Roy-Chowdhury
We construct a graph from the unlabeled data to represent the underlying structure, such that each node represents a data point, and edges represent the inter-relationships between them.