Search Results for author: Srikar Appalaraju

Found 18 papers, 7 papers with code

Enhancing Vision-Language Pre-training with Rich Supervisions

no code implementations • 5 Mar 2024 • Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Vijay Mahadevan, Zhuowen Tu, Stefano Soatto

We propose Strongly Supervised pre-training with ScreenShots (S4), a novel pre-training paradigm for Vision-Language Models that uses data from large-scale web screenshot rendering.

Table Detection

Multiple-Question Multiple-Answer Text-VQA

no code implementations • 15 Nov 2023 • Peng Tang, Srikar Appalaraju, R. Manmatha, Yusheng Xie, Vijay Mahadevan

We present Multiple-Question Multiple-Answer (MQMA), a novel approach to text-VQA in encoder-decoder transformer models.

Denoising, Optical Character Recognition (OCR), +1 more

DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

no code implementations • 15 Nov 2023 • Peng Tang, Pengkai Zhu, Tian Li, Srikar Appalaraju, Vijay Mahadevan, R. Manmatha

Based on the multi-exit model, we perform step-level dynamic early exit during inference, where, at each individual decoding step, the model may decide to use fewer decoder layers based on its confidence at the current layer.
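
The step-level early exit described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the DEED implementation: the decoder layers are hypothetical tanh-linear maps with no self- or cross-attention, every layer has its own exit head (make_layer, exit_heads and threshold are illustrative names), and "confidence" is taken to be the maximum softmax probability of that layer's prediction. It also ignores how a real decoder would supply deeper-layer states for earlier tokens that exited early.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()


def decode_with_early_exit(layers, exit_heads, embed, start_token, max_steps, threshold):
    """Greedy decoding in which every step may stop at an early decoder layer.

    layers:     hypothetical decoder layers, each mapping a hidden vector to a hidden vector
    exit_heads: one (vocab, hidden) matrix per layer, turning that layer's state into logits
    embed:      (vocab, hidden) token-embedding matrix
    threshold:  assumed cut-off on the maximum softmax probability
    """
    token = start_token
    tokens, layers_used = [], []
    for _ in range(max_steps):
        h = embed[token]
        for depth, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
            h = layer(h)
            probs = softmax(head @ h)  # prediction from this layer's exit head
            # Stop climbing the stack once this exit is confident enough,
            # or when the last layer has been reached.
            if probs.max() >= threshold or depth == len(layers):
                break
        token = int(probs.argmax())
        tokens.append(token)
        layers_used.append(depth)
    return tokens, layers_used


def make_layer(weight):
    """Toy stand-in for a decoder layer: a tanh-activated linear map."""
    return lambda h: np.tanh(weight @ h)


rng = np.random.default_rng(0)
vocab, hidden, n_layers = 10, 8, 4
embed = rng.normal(size=(vocab, hidden))
layers = [make_layer(rng.normal(size=(hidden, hidden))) for _ in range(n_layers)]
exit_heads = [rng.normal(size=(vocab, hidden)) for _ in range(n_layers)]

tokens, layers_used = decode_with_early_exit(
    layers, exit_heads, embed, start_token=0, max_steps=5, threshold=0.5
)
print(tokens, layers_used)
```

Raising threshold above 1 forces every step through the full stack, while a threshold of 0 always exits at the first layer; values in between trade accuracy for decoder compute per step.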

A Multi-Modal Multilingual Benchmark for Document Image Classification

no code implementations • 25 Oct 2023 • Yoshinari Fujinuma, Siddharth Varia, Nishant Sankaran, Srikar Appalaraju, Bonan Min, Yogarshi Vyas

Document image classification differs from plain-text document classification: it involves classifying a document by understanding both its content and its structure, as with forms, emails, and similar documents.

Classification, Document Classification, +4 more

DocFormerv2: Local Features for Document Understanding

1 code implementation • 2 Jun 2023 • Srikar Appalaraju, Peng Tang, Qi Dong, Nishant Sankaran, Yichu Zhou, R. Manmatha

We propose DocFormerv2, a multi-modal transformer for Visual Document Understanding (VDU).

Ranked #9 on Visual Question Answering (VQA) on DocVQA test (using extra training data)

Document Understanding, Optical Character Recognition (OCR), +1 more

SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation

no code implementations • 7 Feb 2023 • Yash Patel, Yusheng Xie, Yi Zhu, Srikar Appalaraju, R. Manmatha

Instead of purely relying on the alignment from the noisy data, this paper proposes a novel loss function termed SimCon, which accounts for intra-modal similarities to determine the appropriate set of positive samples to align.
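
A rough illustration of letting intra-modal similarity decide which samples to align is a multi-positive image-text contrastive loss. This is not the SimCon loss as defined in the paper: the positive-set rule (a fixed cosine cut-off, sim_threshold, applied within either modality), the temperature tau, and the symmetric averaging are all assumptions made for this sketch.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def row_log_softmax(logits):
    """Row-wise log-softmax, computed stably."""
    m = logits.max(axis=1, keepdims=True)
    return logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))


def multi_positive_contrastive_loss(img, txt, tau=0.07, sim_threshold=0.9):
    """Image-text contrastive loss whose positives come from intra-modal similarity.

    img, txt:      (batch, dim) embeddings; row i of each comes from the same pair.
    sim_threshold: hypothetical cosine cut-off for treating two samples as alike.
    """
    img, txt = l2_normalize(img), l2_normalize(txt)
    # Intra-modal similarities: how alike samples are within each modality.
    img_sim = img @ img.T
    txt_sim = txt @ txt.T
    # Assumed rule: sample j is a positive for sample i if it is the paired
    # item, or if it is highly similar to i in either modality.
    positives = (img_sim >= sim_threshold) | (txt_sim >= sim_threshold)
    np.fill_diagonal(positives, True)
    pos = positives.astype(float)

    # Cross-modal logits: image i scored against every text j.
    logits = img @ txt.T / tau

    def direction_loss(lg, p):
        # Average negative log-likelihood over each row's positive set.
        return (-(row_log_softmax(lg) * p).sum(axis=1) / p.sum(axis=1)).mean()

    # Symmetric: image-to-text and text-to-image.
    return 0.5 * (direction_loss(logits, pos) + direction_loss(logits.T, pos.T))


rng = np.random.default_rng(0)
img = rng.normal(size=(6, 16))
txt = rng.normal(size=(6, 16))
print(multi_positive_contrastive_loss(img, txt))
```

With sim_threshold above 1 the positive set shrinks to the diagonal and the function reduces to the standard symmetric InfoNCE (CLIP-style) objective, which makes the contrast with a single-positive loss easy to inspect.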

Semantic Segmentation

YORO -- Lightweight End to End Visual Grounding

1 code implementation • 15 Nov 2022 • Chih-Hui Ho, Srikar Appalaraju, Bhavan Jasani, R. Manmatha, Nuno Vasconcelos

We present YORO, a multi-modal, encoder-only transformer architecture for the Visual Grounding (VG) task.

Natural Language Queries, Visual Grounding

MixGen: A New Multi-Modal Data Augmentation

1 code implementation • 16 Jun 2022 • Xiaoshuai Hao, Yi Zhu, Srikar Appalaraju, Aston Zhang, Wanqian Zhang, Bo Li, Mu Li

Data augmentation is a necessity to enhance data efficiency in deep learning.

Data Augmentation, Question Answering, +7 more

Towards Differential Relational Privacy and its use in Question Answering

no code implementations • 30 Mar 2022 • Simone Bombari, Alessandro Achille, Zijian Wang, Yu-Xiang Wang, Yusheng Xie, Kunwar Yashraj Singh, Srikar Appalaraju, Vijay Mahadevan, Stefano Soatto

While bounding general memorization can have detrimental effects on the performance of a trained model, bounding relational memorization (RM) does not prevent effective learning.

Memorization, Question Answering

LaTr: Layout-Aware Transformer for Scene-Text VQA

1 code implementation • CVPR 2022 • Ali Furkan Biten, Ron Litman, Yusheng Xie, Srikar Appalaraju, R. Manmatha

We propose a single-objective pre-training scheme that requires only text and spatial cues.

Optical Character Recognition (OCR), Question Answering, +1 more

DocFormer: End-to-End Transformer for Document Understanding

1 code implementation • ICCV 2021 • Srikar Appalaraju, Bhavan Jasani, Bhargava Urala Kota, Yusheng Xie, R. Manmatha

DocFormer uses text, vision and spatial features and combines them using a novel multi-modal self-attention layer.

Ranked #3 on Document Image Classification on RVL-CDIP

Document Image Classification, Document Understanding

Towards Good Practices in Self-supervised Representation Learning

no code implementations • 1 Dec 2020 • Srikar Appalaraju, Yi Zhu, Yusheng Xie, István Fehérvári

Self-supervised representation learning has seen remarkable progress in the last few years.

Representation Learning

Saliency Driven Perceptual Image Compression

no code implementations • 12 Feb 2020 • Yash Patel, Srikar Appalaraju, R. Manmatha

The proposed compression model incorporates the salient regions and is optimized for the proposed perceptual similarity metric.

Image Compression, MS-SSIM, +3 more

Unbiased Evaluation of Deep Metric Learning Algorithms

1 code implementation • 28 Nov 2019 • Istvan Fehervari, Avinash Ravichandran, Srikar Appalaraju

Deep metric learning (DML) is a popular approach for image retrieval, solving verification (same or not) problems, and addressing open-set classification.

Attribute, Metric Learning, +2 more

Human Perceptual Evaluations for Image Compression

no code implementations • 9 Aug 2019 • Yash Patel, Srikar Appalaraju, R. Manmatha

Recently, there has been much interest in deep learning techniques for image compression, and several of them have been claimed to produce better results than engineered compression schemes (such as JPEG, JPEG2000 or BPG).

Image Compression, MS-SSIM, +1 more

Deep Perceptual Compression

no code implementations • 18 Jul 2019 • Yash Patel, Srikar Appalaraju, R. Manmatha

In several cases, the MS-SSIM of deep-learned techniques is higher than that of a conventional, non-deep-learned codec such as JPEG-2000 or BPG.

Image Compression, MS-SSIM, +3 more

Scalable Logo Recognition using Proxies

no code implementations • 19 Nov 2018 • Istvan Fehervari, Srikar Appalaraju

Logo recognition is a challenging problem: there is no clear definition of a logo, logos and brands vary enormously, and re-training to cover every variation is impractical.

Few-Shot Object Detection, Logo Recognition, +1 more

Image similarity using Deep CNN and Curriculum Learning

1 code implementation • 26 Sep 2017 • Srikar Appalaraju, Vineet Chaoji

Image similarity involves fetching similar-looking images given a reference image.
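
At query time, fetching similar-looking images for a reference image is commonly a nearest-neighbour search over learned embeddings. The sketch below assumes the embeddings were already produced by some CNN (random vectors stand in for them, and top_k_similar is an illustrative name) and ranks the gallery by cosine similarity with a brute-force scan; it does not reproduce the paper's network or its curriculum-learning training.

```python
import numpy as np


def top_k_similar(query, gallery, k=3):
    """Indices and cosine scores of the k gallery embeddings closest to the query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q                   # cosine similarity to every gallery image
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]


rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 128))              # stand-in for precomputed CNN embeddings
query = gallery[42] + 0.01 * rng.normal(size=128)  # a near-duplicate of gallery image 42
idx, sims = top_k_similar(query, gallery, k=5)
print(idx, sims)
```

For large galleries the brute-force scan would typically be replaced by an approximate nearest-neighbour index, but the ranking criterion stays the same.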
