Search Results for author: Minesh Mathew

Found 15 papers, 2 papers with code

Understanding Video Scenes through Text: Insights from Text-based Video Question Answering

no code implementations • 4 Sep 2023 • Soumya Jahagirdar, Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar

Researchers have extensively studied the field of vision and language, discovering that both visual and textual content is crucial for understanding scenes effectively.

Domain Adaptation Question Answering +1

Reading Between the Lanes: Text VideoQA on the Road

no code implementations • 8 Jul 2023 • George Tom, Minesh Mathew, Sergi Garcia, Dimosthenis Karatzas, C. V. Jawahar

Text and signs around roads provide crucial information for drivers, vital for safe navigation and situational awareness.

Question Answering Scene Text Recognition +1

Watching the News: Towards VideoQA Models that can Read

no code implementations • 10 Nov 2022 • Soumya Jahagirdar, Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar

We demonstrate the limitations of current Scene Text VQA and VideoQA methods and propose ways to incorporate scene text information into VideoQA methods.

Question Answering Video Question Answering +1

An empirical study of CTC based models for OCR of Indian languages

no code implementations • 13 May 2022 • Minesh Mathew, CV Jawahar

Recognition of text on word or line images, without the need for sub-word segmentation, has become the mainstream of research and development of text recognition for Indian languages.

Optical Character Recognition (OCR) Segmentation +1

ICDAR 2021 Competition on Document Visual Question Answering

no code implementations • 10 Nov 2021 • Rubèn Tito, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

In this report we present results of the ICDAR 2021 edition of the Document Visual Question Challenges.

Visual Question Answering (VQA)

Asking questions on handwritten document collections

no code implementations • 2 Oct 2021 • Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas, CV Jawahar

This work addresses the problem of Question Answering (QA) on handwritten document collections.

Optical Character Recognition (OCR) Question Answering +2

InfographicVQA

no code implementations • 26 Apr 2021 • Minesh Mathew, Viraj Bagal, Rubèn Pérez Tito, Dimosthenis Karatzas, Ernest Valveny, C. V. Jawahar

Infographics are documents designed to effectively communicate information using a combination of textual, graphical and visual elements.

Question Answering Visual Question Answering

Benchmarking Scene Text Recognition in Devanagari, Telugu and Malayalam

no code implementations • 9 Apr 2021 • Minesh Mathew, Mohit Jain, CV Jawahar

Performance is benchmarked on a new IIIT-ILST dataset comprising hundreds of real scene images containing text in the above-mentioned scripts.

Benchmarking Scene Text Recognition

MMBERT: Multimodal BERT Pretraining for Improved Medical VQA

1 code implementation • 3 Apr 2021 • Yash Khare, Viraj Bagal, Minesh Mathew, Adithi Devi, U Deva Priyakumar, CV Jawahar

Images in the medical domain are fundamentally different from general-domain images.

Language Modelling Masked Language Modeling +2

Document Visual Question Answering Challenge 2020

no code implementations • 20 Aug 2020 • Minesh Mathew, Ruben Tito, Dimosthenis Karatzas, R. Manmatha, C. V. Jawahar

For Task 1, a new dataset is introduced comprising 50,000 question-answer(s) pairs defined over 12,767 document images.

Question Answering Retrieval +2

DocVQA: A Dataset for VQA on Document Images

3 code implementations • 1 Jul 2020 • Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar

The dataset consists of 50,000 questions defined on 12,000+ document images.

Ranked #1 on Visual Question Answering (VQA) on DocVQA val

Question Answering Reading Comprehension +1

RoadText-1K: Text Detection & Recognition Dataset for Driving Videos

no code implementations • 19 May 2020 • Sangeeth Reddy, Minesh Mathew, Lluis Gomez, Marcal Rusinol, Dimosthenis Karatzas, C. V. Jawahar

State-of-the-art methods for text detection, recognition and tracking are evaluated on the new dataset, and the results highlight the challenges of unconstrained driving videos compared to existing datasets.

Text Detection

ICDAR 2019 Competition on Scene Text Visual Question Answering

no code implementations • 30 Jun 2019 • Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system to date, namely the incorporation of scene text to answer questions asked about an image.

Question Answering Visual Question Answering

A Cost Efficient Approach to Correct OCR Errors in Large Document Collections

no code implementations • 28 May 2019 • Deepayan Das, Jerin Philip, Minesh Mathew, C. V. Jawahar

The word error rate of an OCR system is often higher than its character error rate.

Clustering Language Modelling +1

Unconstrained Scene Text and Video Text Recognition for Arabic Script

no code implementations • 7 Nov 2017 • Mohit Jain, Minesh Mathew, C. V. Jawahar

For scripts like Arabic, a major challenge in developing robust recognizers is the lack of a large quantity of annotated data.

Scene Text Recognition
