2 code implementations • ICCV 2021 • Kamal Gupta, Justin Lazarow, Alessandro Achille, Larry Davis, Vijay Mahadevan, Abhinav Shrivastava
Generating a new layout or extending an existing layout requires understanding the relationships between these primitives.
1 code implementation • CVPR 2023 • Jiang Liu, Hui Ding, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha
In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation masks.
Ranked #1 on Referring Expression Segmentation on ReferIt (using extra training data)
no code implementations • NeurIPS 2012 • Vijay Mahadevan, Nuno Vasconcelos
A model connecting visual tracking and saliency has recently been proposed.
no code implementations • NeurIPS 2011 • Vijay Mahadevan, Chi W. Wong, Jose C. Pereira, Tom Liu, Nuno Vasconcelos, Lawrence K. Saul
To perform this visualization, we augment MCU with an additional step for metric learning in the high dimensional voxel space.
no code implementations • NeurIPS 2007 • Dashan Gao, Vijay Mahadevan, Nuno Vasconcelos
The classical hypothesis, that bottom-up saliency is a center-surround process, is combined with a more recent hypothesis that all saliency decisions are optimal in a decision-theoretic sense.
no code implementations • CVPR 2014 • Song Lu, Vijay Mahadevan, Nuno Vasconcelos
The propagation of the resulting saliency seeds, using a diffusion process, is finally shown to outperform the state of the art on a number of salient object detection datasets.
no code implementations • CVPR 2016 • Yingwei Li, Weixin Li, Vijay Mahadevan, Nuno Vasconcelos
To account for long-range inhomogeneous dynamics, a VLAD descriptor is derived for the LDS and pooled over the whole video, to arrive at the final VLAD^3 representation.
no code implementations • 2 Aug 2019 • Cuong V. Nguyen, Alessandro Achille, Michael Lam, Tal Hassner, Vijay Mahadevan, Stefano Soatto
As an application, we apply our procedure to study two properties of a task sequence: (1) total complexity and (2) sequential heterogeneity.
no code implementations • 1 Jan 2021 • Kamal Gupta, Vijay Mahadevan, Alessandro Achille, Justin Lazarow, Larry S. Davis, Abhinav Shrivastava
We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents and 3D objects.
no code implementations • ICCV 2021 • Qi Dong, Zhuowen Tu, Haofu Liao, Yuting Zhang, Vijay Mahadevan, Stefano Soatto
Computer vision applications such as visual relationship detection and human object interaction can be formulated as a composite (structured) set detection problem in which both the parts (subject, object, and predicate) and the sum (triplet as a whole) are to be detected in a hierarchical fashion.
no code implementations • ICCV 2021 • Sagnik Das, Kunwar Yashraj Singh, Jon Wu, Erhan Bas, Vijay Mahadevan, Rahul Bhotika, Dimitris Samaras
Document unwarping attempts to undo the physical deformation of the paper and recover a 'flatbed' scanned document-image for downstream tasks such as OCR.
no code implementations • 6 Jan 2022 • Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan, Stefano Soatto
We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model.
no code implementations • 23 Mar 2022 • David Lenz, Raine Yeh, Vijay Mahadevan, Iulian Grindeanu, Tom Peterka
B-spline models are a powerful way to represent scientific data sets with a functional approximation.
no code implementations • 30 Mar 2022 • Simone Bombari, Alessandro Achille, Zijian Wang, Yu-Xiang Wang, Yusheng Xie, Kunwar Yashraj Singh, Srikar Appalaraju, Vijay Mahadevan, Stefano Soatto
While bounding general memorization can have detrimental effects on the performance of a trained model, bounding RM does not prevent effective learning.
no code implementations • 17 May 2022 • Thomas Delteil, Edouard Belval, Lei Chen, Luis Goncalves, Vijay Mahadevan
In these, text semantics and visual information supplement each other to provide a global understanding of the document.
no code implementations • ICCV 2023 • Haofu Liao, Aruni RoyChowdhury, Weijian Li, Ankan Bansal, Yuting Zhang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan
We present a new formulation for structured information extraction (SIE) from visually rich documents.
Ranked #2 on Entity Linking on FUNSD
no code implementations • 15 Nov 2023 • Peng Tang, Pengkai Zhu, Tian Li, Srikar Appalaraju, Vijay Mahadevan, R. Manmatha
Based on the multi-exit model, we perform step-level dynamic early exit during inference, where the model may decide to use fewer decoder layers based on its confidence of the current layer at each individual decoding step.
no code implementations • 15 Nov 2023 • Peng Tang, Srikar Appalaraju, R. Manmatha, Yusheng Xie, Vijay Mahadevan
We present Multiple-Question Multiple-Answer (MQMA), a novel approach to do text-VQA in encoder-decoder transformer models.
no code implementations • 5 Mar 2024 • Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Vijay Mahadevan, Zhuowen Tu, Stefano Soatto
We propose Strongly Supervised pre-training with ScreenShots (S4) - a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering.