1 code implementation • CVPR 2024 • Jiayun Luo, Siddhesh Khandelwal, Leonid Sigal, Boyang Li
From image-text pairs, large-scale vision-language models (VLMs) learn to implicitly associate image regions with words, which proves effective for tasks like visual question answering.
no code implementations • 14 Feb 2023 • Siddhesh Khandelwal, Anirudth Nambirajan, Behjat Siddiquie, Jayan Eledath, Leonid Sigal
Methods for object detection and segmentation often require abundant instance-level annotations for training, which are time-consuming and expensive to collect.
1 code implementation • Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 2021 • Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, Deva Ramanan, Peter Carr, James Hays
Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category.
no code implementations • 27 Jul 2022 • Siddhesh Khandelwal, Leonid Sigal
In this work, we propose a novel framework for scene graph generation that addresses this limitation and introduces dynamic conditioning on the image, using message passing in a Markov Random Field.
no code implementations • ICCV 2021 • Siddhesh Khandelwal, Mohammed Suhail, Leonid Sigal
Our framework is agnostic to the underlying scene graph generation method and addresses the lack of segmentation annotations in target scene graph datasets (e.g., Visual Genome) through transfer and multi-task learning from, and with, an auxiliary dataset (e.g., MS COCO).
1 code implementation • 24 Aug 2020 • Siddhesh Khandelwal, William Qi, Jagjeet Singh, Andrew Hartnett, Deva Ramanan
Forecasting the long-term future motion of road actors is a core challenge to the deployment of safe autonomous vehicles (AVs).
no code implementations • CVPR 2021 • Siddhesh Khandelwal, Raghav Goyal, Leonid Sigal
Weakly-supervised approaches draw on image-level labels to build detectors/segmentors, while zero/few-shot methods assume abundant instance-level data for a set of base classes, and none to a few examples for novel classes.
no code implementations • ICCV 2019 • Siddhesh Khandelwal, Leonid Sigal
Visual attention mechanisms have proven to be important components of many modern deep neural architectures.
5 code implementations • 19 Apr 2018 • Sharmistha Jat, Siddhesh Khandelwal, Partha Talukdar
Relation extraction is the problem of classifying the relationship between two entities in a given sentence.
1 code implementation • 17 Jan 2017 • Siddhesh Khandelwal, Amit Awekar
We propose a fast heuristic to overcome this bottleneck with only a marginal increase in MSE.