no code implementations • 4 Sep 2023 • Soumya Jahagirdar, Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar
Researchers have extensively studied the field of vision and language, discovering that both visual and textual content are crucial for understanding scenes effectively.
no code implementations • 1 Aug 2023 • Shamanthak Hegde, Soumya Jahagirdar, Shankar Gangisetty
Such a simple yet effective approach strengthens the correlation between image features and the text present in the image, which helps answer questions more accurately.
Optical Character Recognition (OCR) • Visual Question Answering (VQA)
no code implementations • 11 Jun 2023 • Charani Alampalle, Shamanthak Hegde, Soumya Jahagirdar, Shankar Gangisetty
To address this issue, we propose a weakly-supervised visual question answer generation method that generates relevant question-answer pairs for a given input image and its associated caption.
no code implementations • 23 Nov 2022 • Soumya Jahagirdar, Shankar Gangisetty, Anand Mishra
However, it is challenging as it requires an in-depth understanding of the scene and the ability to semantically bridge the visual content with the text present in the image.
no code implementations • 10 Nov 2022 • Soumya Jahagirdar, Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar
We demonstrate the limitations of current Scene Text VQA and VideoQA methods and propose ways to incorporate scene text information into VideoQA methods.