Search Results for author: Anand Mishra

Found 15 papers, 5 papers with code

Sketch-guided Image Inpainting with Partial Discrete Diffusion Process

1 code implementation • 18 Apr 2024 • Nakul Sharma, Aditay Tripathi, Anirban Chakraborty, Anand Mishra

In this work, we study the task of sketch-guided image inpainting.

Towards Scene-Text to Scene-Text Translation

no code implementations • 6 Aug 2023 • Onkar Susladkar, Prajwal Gatti, Anand Mishra

In this work, we study the task of "visually" translating scene text from a source language (e.g., English) to a target language (e.g., Chinese).

Scene Text Editing Translation

Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering

no code implementations • 29 Jun 2023 • Abhirama Subramanyam Penamakuri, Manish Gupta, Mithun Das Gupta, Anand Mishra

We study visual question answering in a setting where the answer has to be mined from a pool of relevant and irrelevant images given as a context.

Answer Generation Question Answering +2

Query-guided Attention in Vision Transformers for Localizing Objects Using a Single Sketch

no code implementations • 15 Mar 2023 • Aditay Tripathi, Anand Mishra, Anirban Chakraborty

and Sketchy datasets, respectively, and a 12.2% improvement in AP@50 for large objects that are 'unseen' during training.

Object object-detection +2

Few-Shot Referring Relationships in Videos

1 code implementation • CVPR 2023 • Yogesh Kumar, Anand Mishra

Given a query visual relationship as <subject, predicate, object> and a test video, our objective is to localize the subject and object that are connected via the predicate.

Object Relation Network +1

Multimodal Query-guided Object Localization

no code implementations • 1 Dec 2022 • Aditay Tripathi, Rajath R Dani, Anand Mishra, Anirban Chakraborty

In such a scenario, a hand-drawn sketch of the object could be a choice for a query.

Object Object Localization +1

Contrastive Multi-View Textual-Visual Encoding: Towards One Hundred Thousand-Scale One-Shot Logo Identification

no code implementations • 23 Nov 2022 • Nakul Sharma, Abhirama S. Penamakuri, Anand Mishra

To fill this gap in the literature, we introduce Wikidata Reference Logo Dataset (WiRLD), containing logos for 100K business brands harvested from Wikidata.

Logo Recognition

Look, Read and Ask: Learning to Ask Questions by Reading Text in Images

no code implementations • 23 Nov 2022 • Soumya Jahagirdar, Shankar Gangisetty, Anand Mishra

However, it is challenging as it requires an in-depth understanding of the scene and the ability to semantically bridge the visual content with the text present in the image.

Optical Character Recognition (OCR) Question Answering +4

Grounding Scene Graphs on Natural Images via Visio-Lingual Message Passing

no code implementations • 3 Nov 2022 • Aditay Tripathi, Anand Mishra, Anirban Chakraborty

In VL-MPAG Net, we first construct a directed graph with object proposals as nodes and an edge between a pair of nodes representing a plausible relation between them.

Object Object Localization

COFAR: Commonsense and Factual Reasoning in Image Search

no code implementations • 16 Oct 2022 • Prajwal Gatti, Abhirama Subramanyam Penamakuri, Revant Teotia, Anand Mishra, Shubhashis Sengupta, Roshni Ramnani

To enable both commonsense and factual reasoning in the image search, we present a unified framework, namely Knowledge Retrieval-Augmented Multimodal Transformer (KRAMT), that treats the named visual entities in an image as a gateway to encyclopedic knowledge and leverages them along with natural language query to ground relevant knowledge.

Image Retrieval Retrieval +1

Few-shot Visual Relationship Co-localization

1 code implementation • ICCV 2021 • Revant Teotia, Vaibhav Mishra, Mayank Maheshwari, Anand Mishra

In this paper, given a small bag of images, each containing a common but latent predicate, we are interested in localizing visual subject-object pairs connected via the common predicate in each of the images.

Meta-Learning Object