Search Results for author: Srikar Appalaraju

Found 18 papers, 7 papers with code

Enhancing Vision-Language Pre-training with Rich Supervisions

no code implementations • 5 Mar 2024 • Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Vijay Mahadevan, Zhuowen Tu, Stefano Soatto

We propose Strongly Supervised pre-training with ScreenShots (S4), a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering.

Table Detection

Multiple-Question Multiple-Answer Text-VQA

no code implementations • 15 Nov 2023 • Peng Tang, Srikar Appalaraju, R. Manmatha, Yusheng Xie, Vijay Mahadevan

We present Multiple-Question Multiple-Answer (MQMA), a novel approach to text-VQA in encoder-decoder transformer models.

Denoising, Optical Character Recognition (OCR), +1

DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

no code implementations • 15 Nov 2023 • Peng Tang, Pengkai Zhu, Tian Li, Srikar Appalaraju, Vijay Mahadevan, R. Manmatha

Based on the multi-exit model, we perform step-level dynamic early exit during inference, where the model may decide to use fewer decoder layers based on its confidence in the current layer at each individual decoding step.

A Multi-Modal Multilingual Benchmark for Document Image Classification

no code implementations • 25 Oct 2023 • Yoshinari Fujinuma, Siddharth Varia, Nishant Sankaran, Srikar Appalaraju, Bonan Min, Yogarshi Vyas

Document image classification differs from plain-text document classification: it consists of classifying a document by understanding the content and structure of documents such as forms and emails.

Classification, Document Classification, +4

SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation

no code implementations • 7 Feb 2023 • Yash Patel, Yusheng Xie, Yi Zhu, Srikar Appalaraju, R. Manmatha

Instead of relying purely on the alignment from the noisy data, this paper proposes a novel loss function termed SimCon, which accounts for intra-modal similarities to determine the appropriate set of positive samples to align.

Semantic Segmentation

YORO -- Lightweight End to End Visual Grounding

1 code implementation • 15 Nov 2022 • Chih-Hui Ho, Srikar Appalaraju, Bhavan Jasani, R. Manmatha, Nuno Vasconcelos

We present YORO, a multi-modal transformer encoder-only architecture for the Visual Grounding (VG) task.

Natural Language Queries, Visual Grounding

Towards Differential Relational Privacy and its use in Question Answering

no code implementations • 30 Mar 2022 • Simone Bombari, Alessandro Achille, Zijian Wang, Yu-Xiang Wang, Yusheng Xie, Kunwar Yashraj Singh, Srikar Appalaraju, Vijay Mahadevan, Stefano Soatto

While bounding general memorization can have detrimental effects on the performance of a trained model, bounding RM does not prevent effective learning.

Memorization, Question Answering

Saliency Driven Perceptual Image Compression

no code implementations • 12 Feb 2020 • Yash Patel, Srikar Appalaraju, R. Manmatha

The proposed compression model incorporates the salient regions and optimizes on the proposed perceptual similarity metric.

Image Compression, MS-SSIM, +3

Unbiased Evaluation of Deep Metric Learning Algorithms

1 code implementation • 28 Nov 2019 • Istvan Fehervari, Avinash Ravichandran, Srikar Appalaraju

Deep metric learning (DML) is a popular approach for image retrieval, solving verification (same or not) problems, and addressing open-set classification.

Attribute, Metric Learning, +2

Human Perceptual Evaluations for Image Compression

no code implementations • 9 Aug 2019 • Yash Patel, Srikar Appalaraju, R. Manmatha

Recently, there has been much interest in deep learning techniques for image compression, and there have been claims that several of these produce better results than engineered compression schemes (such as JPEG, JPEG2000 or BPG).

Image Compression, MS-SSIM, +1

Deep Perceptual Compression

no code implementations • 18 Jul 2019 • Yash Patel, Srikar Appalaraju, R. Manmatha

In several cases, the MS-SSIM for deep-learned techniques is higher than that of a conventional, non-deep-learned codec such as JPEG-2000 or BPG.

Image Compression, MS-SSIM, +3

Scalable Logo Recognition using Proxies

no code implementations • 19 Nov 2018 • Istvan Fehervari, Srikar Appalaraju

Logo recognition is a challenging problem: there is no clear definition of a logo, logos and brands vary enormously, and re-training to cover every variation is impractical.

Few-Shot Object Detection, Logo Recognition, +1

Image similarity using Deep CNN and Curriculum Learning

1 code implementation • 26 Sep 2017 • Srikar Appalaraju, Vineet Chaoji

Image similarity involves fetching similar-looking images given a reference image.
