Search Results for author: Vijay Mahadevan

Found 19 papers, 2 papers with code

Enhancing Vision-Language Pre-training with Rich Supervisions

no code implementations 5 Mar 2024 Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Vijay Mahadevan, Zhuowen Tu, Stefano Soatto

We propose Strongly Supervised pre-training with ScreenShots (S4) - a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering.

Table Detection

Multiple-Question Multiple-Answer Text-VQA

no code implementations 15 Nov 2023 Peng Tang, Srikar Appalaraju, R. Manmatha, Yusheng Xie, Vijay Mahadevan

We present Multiple-Question Multiple-Answer (MQMA), a novel approach to do text-VQA in encoder-decoder transformer models.

Denoising, Optical Character Recognition (OCR), +1

DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

no code implementations 15 Nov 2023 Peng Tang, Pengkai Zhu, Tian Li, Srikar Appalaraju, Vijay Mahadevan, R. Manmatha

Based on the multi-exit model, we perform step-level dynamic early exit during inference, where the model may decide to use fewer decoder layers based on its confidence of the current layer at each individual decoding step.

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

1 code implementation CVPR 2023 Jiang Liu, Hui Ding, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha

In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation masks.

Ranked #1 on Referring Expression Segmentation on ReferIt (using extra training data)

Image Segmentation, Quantization, +6

MATrIX -- Modality-Aware Transformer for Information eXtraction

no code implementations 17 May 2022 Thomas Delteil, Edouard Belval, Lei Chen, Luis Goncalves, Vijay Mahadevan

In these, text semantics and visual information supplement each other to provide a global understanding of the document.

document understanding

Towards Differential Relational Privacy and its use in Question Answering

no code implementations 30 Mar 2022 Simone Bombari, Alessandro Achille, Zijian Wang, Yu-Xiang Wang, Yusheng Xie, Kunwar Yashraj Singh, Srikar Appalaraju, Vijay Mahadevan, Stefano Soatto

While bounding general memorization can have detrimental effects on the performance of a trained model, bounding RM does not prevent effective learning.

Memorization, Question Answering

Adaptive Regularization of B-Spline Models for Scientific Data

no code implementations 23 Mar 2022 David Lenz, Raine Yeh, Vijay Mahadevan, Iulian Grindeanu, Tom Peterka

B-spline models are a powerful way to represent scientific data sets with a functional approximation.

Contrastive Neighborhood Alignment

no code implementations 6 Jan 2022 Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan, Stefano Soatto

We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model.

Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries

no code implementations ICCV 2021 Qi Dong, Zhuowen Tu, Haofu Liao, Yuting Zhang, Vijay Mahadevan, Stefano Soatto

Computer vision applications such as visual relationship detection and human object interaction can be formulated as a composite (structured) set detection problem in which both the parts (subject, object, and predicate) and the sum (triplet as a whole) are to be detected in a hierarchical fashion.

Human-Object Interaction Detection, Object, +2

End-to-End Piece-Wise Unwarping of Document Images

no code implementations ICCV 2021 Sagnik Das, Kunwar Yashraj Singh, Jon Wu, Erhan Bas, Vijay Mahadevan, Rahul Bhotika, Dimitris Samaras

Document unwarping attempts to undo the physical deformation of the paper and recover a 'flatbed' scanned document-image for downstream tasks such as OCR.

MS-SSIM, Optical Character Recognition (OCR), +1

Multimodal Attention for Layout Synthesis in Diverse Domains

no code implementations 1 Jan 2021 Kamal Gupta, Vijay Mahadevan, Alessandro Achille, Justin Lazarow, Larry S. Davis, Abhinav Shrivastava

We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents and 3D objects.

LayoutTransformer: Layout Generation and Completion with Self-attention

2 code implementations ICCV 2021 Kamal Gupta, Justin Lazarow, Alessandro Achille, Larry Davis, Vijay Mahadevan, Abhinav Shrivastava

Generating a new layout or extending an existing layout requires understanding the relationships between these primitives.

Toward Understanding Catastrophic Forgetting in Continual Learning

no code implementations 2 Aug 2019 Cuong V. Nguyen, Alessandro Achille, Michael Lam, Tal Hassner, Vijay Mahadevan, Stefano Soatto

As an application, we apply our procedure to study two properties of a task sequence: (1) total complexity and (2) sequential heterogeneity.

Continual Learning

VLAD3: Encoding Dynamics of Deep Features for Action Recognition

no code implementations CVPR 2016 Yingwei Li, Weixin Li, Vijay Mahadevan, Nuno Vasconcelos

To account for long-range inhomogeneous dynamics, a VLAD descriptor is derived for the LDS and pooled over the whole video, to arrive at the final VLAD^3 representation.

Action Recognition, Temporal Action Localization

Learning Optimal Seeds for Diffusion-based Salient Object Detection

no code implementations CVPR 2014 Song Lu, Vijay Mahadevan, Nuno Vasconcelos

The propagation of the resulting saliency seeds, using a diffusion process, is finally shown to outperform the state of the art on a number of salient object detection datasets.

Object, object-detection, +4

The discriminant center-surround hypothesis for bottom-up saliency

no code implementations NeurIPS 2007 Dashan Gao, Vijay Mahadevan, Nuno Vasconcelos

The classical hypothesis, that bottom-up saliency is a center-surround process, is combined with a more recent hypothesis that all saliency decisions are optimal in a decision-theoretic sense.
