Search Results for author: Ramprasaath R. Selvaraju

Found 14 papers, 9 papers with code

TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation

1 code implementation • 3 Aug 2022 • Jun Wang, Mingfei Gao, Yuqian Hu, Ramprasaath R. Selvaraju, Chetan Ramaiah, Ran Xu, Joseph F. JaJa, Larry S. Davis

To address this deficiency, we develop a new method to generate high-quality and diverse QA pairs by explicitly utilizing the existing rich text available in the scene context of each image.

Answer Generation • Question-Answer-Generation +3
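A minimal sketch of the idea in the abstract above: treat scene text returned by an OCR system as ready-made answers and synthesize questions around it. The paper trains a generation model for this; the template below is a deliberately simple stand-in, and the `ocr_results` format is an assumption.

```python
import random

def generate_qa_pairs(ocr_results, max_pairs=3):
    """ocr_results: [{'text': str, 'region': str}, ...] for one image."""
    sampled = random.sample(ocr_results, min(max_pairs, len(ocr_results)))
    pairs = []
    for item in sampled:
        # Each OCR token becomes the answer; the question is built around
        # where the text appears (hypothetical region description).
        question = f"What is the text written on the {item['region']}?"
        pairs.append({"question": question, "answer": item["text"]})
    return pairs
```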

Can domain adaptation make object recognition work for everyone?

no code implementations • 23 Apr 2022 • Viraj Prabhu, Ramprasaath R. Selvaraju, Judy Hoffman, Nikhil Naik

Despite the rapid progress in deep visual recognition, modern computer vision datasets significantly overrepresent the developed world, and models trained on such datasets underperform on images from unseen geographies.

Object • Object Recognition +1

CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision

1 code implementation • 14 Dec 2021 • Aman Shrivastava, Ramprasaath R. Selvaraju, Nikhil Naik, Vicente Ordonez

We propose CLIP-Lite, an information efficient method for visual representation learning by feature alignment with textual annotations.

Contrastive Learning • Representation Learning +5
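The information efficiency claimed above comes from an objective that needs only one negative pair per positive. A generic sketch under that assumption, with a hypothetical `critic` network (a small MLP) scoring concatenated image-text feature pairs; this is not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def one_negative_loss(img_emb, txt_emb, neg_txt_emb, critic):
    """img_emb, txt_emb, neg_txt_emb: (B, D) projected features;
    critic: a small MLP mapping a concatenated pair to a scalar score."""
    pos = critic(torch.cat([img_emb, txt_emb], dim=-1))      # true pairs
    neg = critic(torch.cat([img_emb, neg_txt_emb], dim=-1))  # mismatched
    # Jensen-Shannon-style binary objective: score true pairs high and
    # the single sampled negative low.
    return F.softplus(-pos).mean() + F.softplus(neg).mean()
```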

PreViTS: Contrastive Pretraining with Video Tracking Supervision

no code implementations • 1 Dec 2021 • Brian Chen, Ramprasaath R. Selvaraju, Shih-Fu Chang, Juan Carlos Niebles, Nikhil Naik

In this work, we propose PreViTS, an SSL framework that utilizes an unsupervised tracking signal for selecting clips containing the same object, which helps better utilize temporal transformations of objects.

Action Classification • Self-Supervised Learning +1
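A hedged sketch of the clip-selection step the abstract describes: given per-frame object masks from an unsupervised tracker, sample two clips in which the tracked object is visible, so the two contrastive views are guaranteed to share an object. The mask format and the 10% visibility threshold are assumptions.

```python
import random

def sample_positive_clips(tracker_masks, clip_len=16, min_area=0.1):
    """tracker_masks: list of (H, W) binary numpy masks, one per frame."""
    # Frames where the tracked object covers enough of the image and a
    # full clip still fits.
    visible = [i for i, m in enumerate(tracker_masks)
               if m.mean() > min_area and i + clip_len <= len(tracker_masks)]
    if len(visible) < 2:
        return None  # object not visible long enough for two clips
    start_a, start_b = random.sample(visible, 2)
    return (start_a, start_a + clip_len), (start_b, start_b + clip_len)
```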

SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency

1 code implementation • NAACL 2021 • Sameer Dharur, Purva Tendulkar, Dhruv Batra, Devi Parikh, Ramprasaath R. Selvaraju

Recent research in Visual Question Answering (VQA) has revealed state-of-the-art models to be inconsistent in their understanding of the world: they correctly answer seemingly difficult questions that require reasoning, yet get simpler associated sub-questions wrong.

Question Answering • Visual Grounding +1
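One way to read "contrastive gradient learning" is as a triplet-style alignment of visual explanations: the Grad-CAM map for a reasoning question should look more like the map for its sub-question than like the map for an unrelated question. A hedged sketch of such a term (the flattened `cam_*` inputs and the margin value are assumptions):

```python
import torch.nn.functional as F

def contrastive_gradient_loss(cam_reason, cam_sub, cam_irrelevant, margin=0.5):
    """cam_*: (B, HW) flattened Grad-CAM maps for each question type."""
    pos = F.cosine_similarity(cam_reason, cam_sub, dim=-1)
    neg = F.cosine_similarity(cam_reason, cam_irrelevant, dim=-1)
    # Hinge: the reasoning question's explanation should be closer to its
    # sub-question's explanation than to an unrelated question's.
    return F.relu(margin - pos + neg).mean()
```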

SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions

no code implementations • CVPR 2020 • Ramprasaath R. Selvaraju, Purva Tendulkar, Devi Parikh, Eric Horvitz, Marco Ribeiro, Besmira Nushi, Ece Kamar

We quantify the extent to which this phenomenon occurs by creating a new Reasoning split of the VQA dataset and collecting VQA-introspect, a new dataset of 238K perception questions that serve as sub-questions corresponding to the set of perceptual tasks needed to effectively answer the complex reasoning questions in the Reasoning split.

Visual Question Answering (VQA)
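One concrete statistic such a split enables, sketched under an assumed record format: among reasoning questions a model answers correctly, how often does it also answer all of their perception sub-questions correctly?

```python
def consistency(records):
    """records: [{'main_correct': bool, 'sub_correct': [bool, ...]}, ...]"""
    # Only reasoning questions the model got right are eligible.
    eligible = [r for r in records if r["main_correct"]]
    if not eligible:
        return 0.0
    consistent = sum(all(r["sub_correct"]) for r in eligible)
    return consistent / len(eligible)
```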

Trick or TReAT: Thematic Reinforcement for Artistic Typography

1 code implementation • 19 Mar 2019 • Purva Tendulkar, Kalpesh Krishna, Ramprasaath R. Selvaraju, Devi Parikh

An approach to making text visually appealing and memorable is semantic reinforcement: the use of visual cues alluding to the context or theme in which the word is being used to reinforce the message (e.g., Google Doodles).

Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded

no code implementations • ICCV 2019 • Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, Devi Parikh

Many vision and language models suffer from poor visual grounding: often falling back on easy-to-learn language priors rather than basing their decisions on visual concepts in the image.

Image Captioning • Question Answering +2
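The grounding fix alluded to above can be sketched as a ranking-alignment penalty: wherever humans attend to one image region more than another, the network's gradient-based importance should order those regions the same way. The shapes and the pairwise hinge form below are assumptions.

```python
import torch.nn.functional as F

def ranking_alignment_loss(importance, human_attention):
    """importance: (N,) gradient-based sensitivity per image region;
    human_attention: (N,) human attention score per region."""
    # diff[i, j] = score[j] - score[i], for all region pairs.
    diff_net = importance.unsqueeze(0) - importance.unsqueeze(1)
    diff_hum = human_attention.unsqueeze(0) - human_attention.unsqueeze(1)
    # Penalize pairs the network orders opposite to human attention.
    return ((diff_hum > 0).float() * F.relu(-diff_net)).mean()
```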

Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance

1 code implementation • ECCV 2018 • Ramprasaath R. Selvaraju, Prithvijit Chattopadhyay, Mohamed Elhoseiny, Tilak Sharma, Dhruv Batra, Devi Parikh, Stefan Lee

Our approach, which we call Neuron Importance-Aware Weight Transfer (NIWT), learns to map domain knowledge about novel "unseen" classes onto this dictionary of learned concepts and then optimizes for network parameters that can effectively combine these concepts, essentially learning classifiers by discovering and composing learned semantic concepts in deep networks.

Generalized Zero-Shot Learning
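A heavily simplified sketch of the mapping at the heart of this entry: fit a linear map from per-neuron importances (measured on seen classes) to class embeddings, then invert it to predict which neurons should matter for an unseen class. The closed-form ridge regression and all shapes here are assumptions, not the paper's procedure.

```python
import torch

def learn_importance_map(importances, embeddings, reg=1e-3):
    """importances: (S, C) neuron importances for S seen classes;
    embeddings: (S, D) domain-knowledge vectors (e.g. attributes)."""
    # Ridge regression W mapping importances -> embeddings.
    A = importances.T @ importances + reg * torch.eye(importances.size(1))
    return torch.linalg.solve(A, importances.T @ embeddings)  # (C, D)

def predict_importance(W, unseen_embedding, reg=1e-3):
    # Least-squares inversion: importances whose mapped embedding best
    # matches the unseen class's embedding.
    A = W @ W.T + reg * torch.eye(W.size(0))
    return torch.linalg.solve(A, W @ unseen_embedding)  # (C,)
```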

Grad-CAM: Why did you say that?

2 code implementations • 22 Nov 2016 • Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra

We propose a technique for making Convolutional Neural Network (CNN)-based models more transparent by visualizing input regions that are 'important' for predictions, i.e. visual explanations.

Image Captioning • Visual Question Answering
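Grad-CAM itself is compact enough to sketch faithfully: take the gradient of the class score with respect to the last convolutional feature maps, global-average-pool it into per-channel weights, form the weighted sum of the maps, and keep only the positive evidence. The ResNet-50 backbone and hook placement below are illustrative choices.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V2").eval()
feats, grads = {}, {}

# Capture activations and gradients of the last conv block.
model.layer4.register_forward_hook(lambda m, i, o: feats.update(a=o))
model.layer4.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

def grad_cam(image, class_idx=None):
    """image: (1, 3, H, W) normalized tensor."""
    logits = model(image)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()
    a, g = feats["a"], grads["a"]                # (1, C, h, w)
    weights = g.mean(dim=(2, 3), keepdim=True)   # GAP the gradients
    cam = F.relu((weights * a).sum(dim=1))       # positive evidence only
    return cam / (cam.max() + 1e-8), class_idx   # normalized (1, h, w) map
```

Upsampling the returned map to the input resolution (e.g. with F.interpolate) gives the familiar heatmap overlay.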

Counting Everyday Objects in Everyday Scenes

1 code implementation • CVPR 2017 • Prithvijit Chattopadhyay, Ramakrishna Vedantam, Ramprasaath R. Selvaraju, Dhruv Batra, Devi Parikh

In this work, we build dedicated models for counting designed to tackle the large variance in counts, appearances, and scales of objects found in natural scenes.

Object • Object Counting +4
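A hedged sketch of a "glance"-style counting baseline in the spirit of this entry: regress a per-category count vector straight from pooled CNN features. The backbone choice, 80 COCO-style categories, and hidden width are assumptions.

```python
import torch.nn as nn
from torchvision import models

class GlanceCounter(nn.Module):
    def __init__(self, num_categories=80, hidden=512):
        super().__init__()
        backbone = models.resnet18(weights="IMAGENET1K_V1")
        # Drop the classifier; keep the conv stack + global average pool.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(backbone.fc.in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_categories),  # one count per category
        )

    def forward(self, x):
        return self.head(self.features(x))  # (B, num_categories)
```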
