no code implementations • 4 Sep 2023 • Soumya Jahagirdar, Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar
Researchers have extensively studied the field of vision and language, discovering that both visual and textual content is crucial for understanding scenes effectively.
no code implementations • 8 Jul 2023 • George Tom, Minesh Mathew, Sergi Garcia, Dimosthenis Karatzas, C. V. Jawahar
Text and signs around roads provide crucial information for drivers, vital for safe navigation and situational awareness.
no code implementations • 10 Nov 2022 • Soumya Jahagirdar, Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar
We demonstrate the limitations of current Scene Text VQA and VideoQA methods and propose ways to incorporate scene text information into VideoQA methods.
no code implementations • 13 May 2022 • Minesh Mathew, CV Jawahar
Recognition of text on word or line images, without the need for sub-word segmentation, has become the mainstream of research and development of text recognition for Indian languages.
no code implementations • 10 Nov 2021 • Rubèn Tito, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas
In this report we present results of the ICDAR 2021 edition of the Document Visual Question Challenges.
no code implementations • 2 Oct 2021 • Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas, CV Jawahar
This work addresses the problem of Question Answering (QA) on handwritten document collections.
no code implementations • 26 Apr 2021 • Minesh Mathew, Viraj Bagal, Rubèn Pérez Tito, Dimosthenis Karatzas, Ernest Valveny, C. V. Jawahar
Infographics are documents designed to effectively communicate information using a combination of textual, graphical and visual elements.
no code implementations • 9 Apr 2021 • Minesh Mathew, Mohit Jain, CV Jawahar
The performance is benchmarked on a new IIIT-ILST dataset comprising hundreds of real scene images containing text in the above-mentioned scripts.
1 code implementation • 3 Apr 2021 • Yash Khare, Viraj Bagal, Minesh Mathew, Adithi Devi, U Deva Priyakumar, CV Jawahar
Images in the medical domain are fundamentally different from the general domain images.
no code implementations • 20 Aug 2020 • Minesh Mathew, Ruben Tito, Dimosthenis Karatzas, R. Manmatha, C. V. Jawahar
For Task 1, a new dataset is introduced comprising 50,000 question-answer(s) pairs defined over 12,767 document images.
3 code implementations • 1 Jul 2020 • Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar
The dataset consists of 50,000 questions defined on 12,000+ document images.
Ranked #1 on Visual Question Answering (VQA) on DocVQA val
no code implementations • 19 May 2020 • Sangeeth Reddy, Minesh Mathew, Lluis Gomez, Marcal Rusinol, Dimosthenis Karatzas, C. V. Jawahar
State of the art methods for text detection, recognition and tracking are evaluated on the new dataset and the results signify the challenges in unconstrained driving videos compared to existing datasets.
no code implementations • 30 Jun 2019 • Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas
ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system to date, namely the incorporation of scene text to answer questions asked about an image.
no code implementations • 28 May 2019 • Deepayan Das, Jerin Philip, Minesh Mathew, C. V. Jawahar
The word error rate of an OCR system is often higher than its character error rate.
no code implementations • 7 Nov 2017 • Mohit Jain, Minesh Mathew, C. V. Jawahar
For scripts like Arabic, a major challenge in developing robust recognizers is the lack of a large quantity of annotated data.