Search Results for author: Vijay Mahadevan

Found 19 papers, 2 papers with code

Enhancing Vision-Language Pre-training with Rich Supervisions

no code implementations • 5 Mar 2024 • Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Vijay Mahadevan, Zhuowen Tu, Stefano Soatto

We propose Strongly Supervised pre-training with ScreenShots (S4) - a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering.

Table Detection

Multiple-Question Multiple-Answer Text-VQA

no code implementations • 15 Nov 2023 • Peng Tang, Srikar Appalaraju, R. Manmatha, Yusheng Xie, Vijay Mahadevan

We present Multiple-Question Multiple-Answer (MQMA), a novel approach to do text-VQA in encoder-decoder transformer models.

Denoising Optical Character Recognition (OCR) +1

DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

no code implementations • 15 Nov 2023 • Peng Tang, Pengkai Zhu, Tian Li, Srikar Appalaraju, Vijay Mahadevan, R. Manmatha

Based on the multi-exit model, we perform step-level dynamic early exit during inference, where the model may decide to use fewer decoder layers based on its confidence of the current layer at each individual decoding step.

DocTr: Document Transformer for Structured Information Extraction in Documents

no code implementations • ICCV 2023 • Haofu Liao, Aruni RoyChowdhury, Weijian Li, Ankan Bansal, Yuting Zhang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan

We present a new formulation for structured information extraction (SIE) from visually rich documents.

Ranked #2 on Entity Linking on FUNSD

Entity Linking Semantic entity labeling

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

1 code implementation • CVPR 2023 • Jiang Liu, Hui Ding, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha

In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation masks.

Ranked #1 on Referring Expression Segmentation on ReferIt (using extra training data)

Image Segmentation Quantization +6

108

MATrIX -- Modality-Aware Transformer for Information eXtraction

no code implementations • 17 May 2022 • Thomas Delteil, Edouard Belval, Lei Chen, Luis Goncalves, Vijay Mahadevan

In these, text semantics and visual information supplement each other to provide a global understanding of the document.

document understanding

Towards Differential Relational Privacy and its use in Question Answering

no code implementations • 30 Mar 2022 • Simone Bombari, Alessandro Achille, Zijian Wang, Yu-Xiang Wang, Yusheng Xie, Kunwar Yashraj Singh, Srikar Appalaraju, Vijay Mahadevan, Stefano Soatto

While bounding general memorization can have detrimental effects on the performance of a trained model, bounding RM does not prevent effective learning.

Memorization Question Answering

Adaptive Regularization of B-Spline Models for Scientific Data

no code implementations • 23 Mar 2022 • David Lenz, Raine Yeh, Vijay Mahadevan, Iulian Grindeanu, Tom Peterka

B-spline models are a powerful way to represent scientific data sets with a functional approximation.

Contrastive Neighborhood Alignment

no code implementations • 6 Jan 2022 • Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan, Stefano Soatto

We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model.

Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries

no code implementations • ICCV 2021 • Qi Dong, Zhuowen Tu, Haofu Liao, Yuting Zhang, Vijay Mahadevan, Stefano Soatto

Computer vision applications such as visual relationship detection and human object interaction can be formulated as a composite (structured) set detection problem in which both the parts (subject, object, and predicate) and the sum (triplet as a whole) are to be detected in a hierarchical fashion.

Human-Object Interaction Detection Object +2

End-to-End Piece-Wise Unwarping of Document Images

no code implementations • ICCV 2021 • Sagnik Das, Kunwar Yashraj Singh, Jon Wu, Erhan Bas, Vijay Mahadevan, Rahul Bhotika, Dimitris Samaras

Document unwarping attempts to undo the physical deformation of the paper and recover a 'flatbed' scanned document-image for downstream tasks such as OCR.

MS-SSIM Optical Character Recognition (OCR) +1

Multimodal Attention for Layout Synthesis in Diverse Domains

no code implementations • 1 Jan 2021 • Kamal Gupta, Vijay Mahadevan, Alessandro Achille, Justin Lazarow, Larry S. Davis, Abhinav Shrivastava

We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents and 3D objects.

LayoutTransformer: Layout Generation and Completion with Self-attention

2 code implementations • ICCV 2021 • Kamal Gupta, Justin Lazarow, Alessandro Achille, Larry Davis, Vijay Mahadevan, Abhinav Shrivastava

Generating a new layout or extending an existing layout requires understanding the relationships between these primitives.

140

Toward Understanding Catastrophic Forgetting in Continual Learning

no code implementations • 2 Aug 2019 • Cuong V. Nguyen, Alessandro Achille, Michael Lam, Tal Hassner, Vijay Mahadevan, Stefano Soatto

As an application, we apply our procedure to study two properties of a task sequence: (1) total complexity and (2) sequential heterogeneity.

Continual Learning

VLAD3: Encoding Dynamics of Deep Features for Action Recognition

no code implementations • CVPR 2016 • Yingwei Li, Weixin Li, Vijay Mahadevan, Nuno Vasconcelos

To account for long-range inhomogeneous dynamics, a VLAD descriptor is derived for the LDS and pooled over the whole video, to arrive at the final VLAD^3 representation.

Action Recognition Temporal Action Localization

Learning Optimal Seeds for Diffusion-based Salient Object Detection

no code implementations • CVPR 2014 • Song Lu, Vijay Mahadevan, Nuno Vasconcelos

The propagation of the resulting saliency seeds, using a diffusion process, is finally shown to outperform the state of the art on a number of salient object detection datasets.

Object object-detection +4

On the connections between saliency and tracking

no code implementations • NeurIPS 2012 • Vijay Mahadevan, Nuno Vasconcelos

A model connecting visual tracking and saliency has recently been proposed.

Visual Tracking

Maximum Covariance Unfolding : Manifold Learning for Bimodal Data

no code implementations • NeurIPS 2011 • Vijay Mahadevan, Chi W. Wong, Jose C. Pereira, Tom Liu, Nuno Vasconcelos, Lawrence K. Saul

To perform this visualization, we augment MCU with an additional step for metric learning in the high dimensional voxel space.

Cross-Modal Retrieval Dimensionality Reduction +3

The discriminant center-surround hypothesis for bottom-up saliency

no code implementations • NeurIPS 2007 • Dashan Gao, Vijay Mahadevan, Nuno Vasconcelos

The classical hypothesis, that bottom-up saliency is a center-surround process, is combined with a more recent hypothesis that all saliency decisions are optimal in a decision-theoretic sense.
