Search Results for author: R. Manmatha

Found 27 papers, 12 papers with code

Mixed-Query Transformer: A Unified Image Segmentation Architecture

no code implementations 6 Apr 2024 Pei Wang, Zhaowei Cai, Hao Yang, Ashwin Swaminathan, R. Manmatha, Stefano Soatto

Existing unified image segmentation models either employ a unified architecture across multiple tasks but use separate weights tailored to each dataset, or apply a single set of weights to multiple datasets but are limited to a single task.

Data Augmentation Image Segmentation +2

DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

no code implementations 15 Nov 2023 Peng Tang, Pengkai Zhu, Tian Li, Srikar Appalaraju, Vijay Mahadevan, R. Manmatha

Based on the multi-exit model, we perform step-level dynamic early exit during inference, where the model may decide to use fewer decoder layers based on its confidence of the current layer at each individual decoding step.
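The step-level early exit described in the DEED entry above can be illustrated with a toy sketch. This is not the authors' implementation: the function name `decode_step_with_early_exit`, the max-probability confidence measure, the threshold value, and the toy layers are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_step_with_early_exit(
    hidden: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    exit_heads: Sequence[Callable[[Vector], Vector]],
    threshold: float = 0.9,  # hypothetical confidence threshold
) -> Tuple[int, int]:
    """Run decoder layers in order for ONE decoding step.

    After each layer, that layer's exit head yields token probabilities.
    Decoding stops at the first layer whose top probability reaches the
    threshold, otherwise at the last layer.
    Returns (predicted token id, number of layers used).
    """
    if not layers or len(layers) != len(exit_heads):
        raise ValueError("need one exit head per decoder layer")
    probs: Vector = []
    depth = 0
    for depth, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
        hidden = layer(hidden)
        probs = head(hidden)
        if max(probs) >= threshold:  # confident enough: skip remaining layers
            break
    return probs.index(max(probs)), depth


# Toy stand-ins for real decoder layers: each one doubles the hidden state,
# so the prediction sharpens with depth.
toy_layers = [lambda h: [2.0 * v for v in h]] * 4
toy_heads = [softmax] * 4

token, used = decode_step_with_early_exit([1.0, 0.0], toy_layers, toy_heads)
print(token, used)
```

In a real decoder this check would run once per generated token, so different decoding steps may exit at different depths.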

Multiple-Question Multiple-Answer Text-VQA

no code implementations 15 Nov 2023 Peng Tang, Srikar Appalaraju, R. Manmatha, Yusheng Xie, Vijay Mahadevan

We present Multiple-Question Multiple-Answer (MQMA), a novel approach to do text-VQA in encoder-decoder transformer models.

Denoising Optical Character Recognition (OCR) +1

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

1 code implementation CVPR 2023 Jiang Liu, Hui Ding, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha

In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation masks.

Ranked #1 on Referring Expression Segmentation on ReferIt (using extra training data)

Image Segmentation Quantization +6
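The PolyFormer entry above notes that predicted polygons can later be converted into segmentation masks. A minimal sketch of one way to do that conversion, assuming a single simple polygon given as (x, y) vertices and an even-odd ray-casting test at pixel centers. The helper names `point_in_polygon` and `polygon_to_mask` are hypothetical and are not taken from the paper's code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd rule: count polygon edges crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(polygon: Sequence[Point], height: int, width: int) -> List[List[int]]:
    """Rasterize a polygon into a binary mask by testing each pixel center."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
         for col in range(width)]
        for row in range(height)
    ]


# A 2x2 square inside a 4x4 image.
square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
for mask_row in polygon_to_mask(square, 4, 4):
    print(mask_row)
```

Production code would more likely call a library rasterizer. The pure-Python version is used here only to keep the sketch self-contained.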

SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation

no code implementations 7 Feb 2023 Yash Patel, Yusheng Xie, Yi Zhu, Srikar Appalaraju, R. Manmatha

Instead of purely relying on the alignment from the noisy data, this paper proposes a novel loss function termed SimCon, which accounts for intra-modal similarities to determine the appropriate set of positive samples to align.

Semantic Segmentation

YORO -- Lightweight End to End Visual Grounding

1 code implementation 15 Nov 2022 Chih-Hui Ho, Srikar Appalaraju, Bhavan Jasani, R. Manmatha, Nuno Vasconcelos

We present YORO - a multi-modal transformer encoder-only architecture for the Visual Grounding (VG) task.

Natural Language Queries Visual Grounding

GLASS: Global to Local Attention for Scene-Text Spotting

2 code implementations 5 Aug 2022 Roi Ronen, Shahar Tsiper, Oron Anschel, Inbal Lavi, Amir Markovitz, R. Manmatha

In recent years, the dominant paradigm for text spotting is to combine the tasks of text detection and recognition into a single end-to-end framework.

Text Detection Text Spotting

Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer

no code implementations CVPR 2022 Yair Kittenplon, Inbal Lavi, Sharon Fogel, Yarin Bar, R. Manmatha, Pietro Perona

Text spotting end-to-end methods have recently gained attention in the literature due to the benefits of jointly optimizing the text detection and recognition components.

Text Detection Text Spotting

On Calibration of Scene-Text Recognition Models

no code implementations 23 Dec 2020 Ron Slossberg, Oron Anschel, Amir Markovitz, Ron Litman, Aviad Aberdam, Shahar Tsiper, Shai Mazor, Jon Wu, R. Manmatha

Although the topic of confidence calibration has been an active research area for the last several decades, the case of structured and sequence prediction calibration has been scarcely explored.

Scene Text Recognition

Document Visual Question Answering Challenge 2020

no code implementations 20 Aug 2020 Minesh Mathew, Ruben Tito, Dimosthenis Karatzas, R. Manmatha, C. V. Jawahar

For Task 1, a new dataset is introduced comprising 50,000 question-answer pairs defined over 12,767 document images.

Question Answering Retrieval +2

Improving Semantic Segmentation via Self-Training

no code implementations 30 Apr 2020 Yi Zhu, Zhongyue Zhang, Chongruo Wu, Zhi Zhang, Tong He, Hang Zhang, R. Manmatha, Mu Li, Alexander Smola

In the case of semantic segmentation, this means that large amounts of pixelwise annotations are required to learn accurate models.

Domain Generalization Segmentation +1

SCATTER: Selective Context Attentional Scene Text Recognizer

2 code implementations CVPR 2020 Ron Litman, Oron Anschel, Shahar Tsiper, Roee Litman, Shai Mazor, R. Manmatha

The first attention step re-weights visual features from a CNN backbone together with contextual features computed by a BiLSTM layer.

Irregular Text Recognition Scene Text Recognition

Saliency Driven Perceptual Image Compression

no code implementations 12 Feb 2020 Yash Patel, Srikar Appalaraju, R. Manmatha

The proposed compression model incorporates the salient regions and optimizes on the proposed perceptual similarity metric.

Image Compression MS-SSIM +3

Human Perceptual Evaluations for Image Compression

no code implementations 9 Aug 2019 Yash Patel, Srikar Appalaraju, R. Manmatha

Recently, there has been much interest in deep learning techniques to do image compression and there have been claims that several of these produce better results than engineered compression schemes (such as JPEG, JPEG2000 or BPG).

Image Compression MS-SSIM +1

Deep Perceptual Compression

no code implementations 18 Jul 2019 Yash Patel, Srikar Appalaraju, R. Manmatha

In several cases, the MS-SSIM for deep learned techniques is higher than that of a conventional, non-deep-learned codec such as JPEG-2000 or BPG.

Image Compression MS-SSIM +3

Searching for Apparel Products from Images in the Wild

no code implementations 4 Jul 2019 Son Tran, Ming Du, Sampath Chanda, R. Manmatha, CJ Taylor

In particular, Instagram and Twitter influencers often provide images of themselves wearing different outfits and their followers are often inspired to buy similar clothes. We propose a system to automatically find the closest visually similar clothes in the online Catalog (street-to-shop searching).

Descriptive

Deep Decision Network for Multi-Class Image Classification

no code implementations CVPR 2016 Venkatesh N. Murthy, Vivek Singh, Terrence Chen, R. Manmatha, Dorin Comaniciu

During the learning phase, starting from the root network node, DDN automatically builds a network that splits the data into disjoint clusters of classes which would be handled by the subsequent expert networks.

Classification General Classification +1
