1 code implementation • 27 May 2025 • Peter Robicheaux, Matvei Popov, Anish Madan, Isaac Robinson, Joseph Nelson, Deva Ramanan, Neehar Peri
Vision-language models (VLMs) trained on internet-scale data achieve remarkable zero-shot detection performance on common objects like car, truck, and pedestrian.
no code implementations • 19 Jun 2024 • Nathaniel Chodosh, Anish Madan, Simon Lucey, Deva Ramanan
We take a holistic perspective and optimize a compositional model of a dynamic scene that decomposes the world into rigidly-moving objects and the background.
no code implementations • 9 May 2024 • Yash Khandelwal, Mayur Arvind, Sriram Kumar, Ashish Gupta, Sachin Kumar Danisetty, Piyush Bagad, Anish Madan, Mayank Lunayach, Aditya Annavajjala, Abhishek Maiti, Sansiddh Jain, Aman Dalmia, Namrata Deka, Jerome White, Jigar Doshi, Angjoo Kanazawa, Rahul Panicker, Alpan Raval, Srinivas Rana, Makarand Tapaswi
Our goal is to equip health workers and public health systems with a solution for contactless newborn anthropometry in the community.
1 code implementation • 22 Dec 2023 • Anish Madan, Neehar Peri, Shu Kong, Deva Ramanan
Concretely, we propose Foundational FSOD, a new benchmark protocol that evaluates detectors pre-trained on any external data and fine-tuned on multi-modal (text and visual) K-shot examples per target class.
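As a rough sketch of the K-shot protocol above, the snippet below samples at most K annotated examples per target class to form a fine-tuning split; the `Annotation` record and the `build_kshot_split` helper are illustrative assumptions, not part of the released benchmark code.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Annotation:
    image_id: str       # hypothetical annotation record
    class_name: str
    bbox: tuple         # (x, y, w, h)

def build_kshot_split(annotations, k, seed=0):
    """Sample at most K annotated examples per target class for fine-tuning."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for ann in annotations:
        by_class[ann.class_name].append(ann)
    split = []
    for cls, anns in sorted(by_class.items()):
        rng.shuffle(anns)
        split.extend(anns[:k])   # keep up to K shots for this class
    return split
```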
1 code implementation • 1 Jan 2021 • Anish Madan, Ranjitha Prasad
We demonstrate the performance of B-MAML on classification and regression tasks, and show that training a sparsifying BNN with MAML reduces the parameter footprint of the model while performing on par with, or even outperforming, the standard MAML approach.
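As a hedged illustration of the meta-learning loop underlying MAML (not the paper's B-MAML with a sparsifying Bayesian prior), the sketch below performs one meta-update over a batch of tasks; it assumes PyTorch ≥ 2.0 for `torch.func.functional_call`, and the model, loss, and task objects are placeholders.

```python
import torch

# One MAML-style meta-update: adapt per task on the support set (inner loop),
# then backpropagate the query-set loss through the adaptation (outer loop).
def maml_step(model, loss_fn, tasks, meta_opt, inner_lr=0.01):
    meta_opt.zero_grad()
    for (x_s, y_s), (x_q, y_q) in tasks:          # each task: (support set, query set)
        # Inner loop: adapt a copy of the parameters on the support set.
        fast_weights = {n: p.clone() for n, p in model.named_parameters()}
        loss = loss_fn(torch.func.functional_call(model, fast_weights, (x_s,)), y_s)
        grads = torch.autograd.grad(loss, list(fast_weights.values()), create_graph=True)
        fast_weights = {n: p - inner_lr * g
                        for (n, p), g in zip(fast_weights.items(), grads)}
        # Outer loop: evaluate the adapted weights on the query set.
        meta_loss = loss_fn(torch.func.functional_call(model, fast_weights, (x_q,)), y_q)
        meta_loss.backward()                      # accumulates meta-gradients in model params
    meta_opt.step()
```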
1 code implementation • 18 Jun 2020 • Lokender Tiwari, Anish Madan, Saket Anand, Subhashis Banerjee
Specifically, we devise an ensemble of these generative classifiers that rank-aggregates their predictions via a Borda count-based consensus.
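A minimal sketch of Borda-count rank aggregation over an ensemble's per-class scores; the `borda_aggregate` helper and the toy score matrix are illustrative, not the paper's released code.

```python
import numpy as np

def borda_aggregate(scores):
    """Each classifier ranks the classes; a class earns more points the higher it is
    ranked, and the aggregated prediction is the class with the most total points."""
    scores = np.asarray(scores)                 # shape: (num_classifiers, num_classes)
    # argsort of argsort gives each class's rank per classifier (0 = lowest score).
    ranks = scores.argsort(axis=1).argsort(axis=1)
    borda_points = ranks.sum(axis=0)            # higher rank -> more Borda points
    return int(borda_points.argmax())

# Example: three classifiers scoring four classes; class 1 wins the consensus.
print(borda_aggregate([[0.1, 0.7, 0.1, 0.1],
                       [0.2, 0.5, 0.2, 0.1],
                       [0.6, 0.3, 0.05, 0.05]]))   # -> 1
```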
1 code implementation • 15 May 2020 • Shagun Uppal, Anish Madan, Sarthak Bhagat, Yi Yu, Rajiv Ratn Shah
In this paper, we exploit the different visual cues and concepts in an image to generate questions using a variational autoencoder (VAE), without requiring ground-truth answers.
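A minimal conditional-VAE sketch in the spirit of the approach above (the actual model, which reasons over cues and concepts and decodes word sequences, is more elaborate); the `QuestionVAE` class and the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionVAE(nn.Module):
    """Image features condition a latent code that is decoded into a question embedding."""
    def __init__(self, img_dim=512, q_dim=300, z_dim=32):
        super().__init__()
        self.z_dim = z_dim
        self.enc = nn.Linear(img_dim + q_dim, 2 * z_dim)   # outputs mean and log-variance
        self.dec = nn.Linear(img_dim + z_dim, q_dim)        # reconstructs question embedding

    def forward(self, img_feat, q_emb):
        mu, logvar = self.enc(torch.cat([img_feat, q_emb], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()      # reparameterisation trick
        recon = self.dec(torch.cat([img_feat, z], dim=-1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
        return F.mse_loss(recon, q_emb) + kl                       # negative ELBO to minimise

    def generate(self, img_feat):
        z = torch.randn(img_feat.size(0), self.z_dim, device=img_feat.device)  # sample prior
        return self.dec(torch.cat([img_feat, z], dim=-1))
```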