Search Results for author: Rubèn Tito

Found 7 papers, 2 papers with code

OCR-IDL: OCR Annotations for Industry Document Library Dataset

1 code implementation • 25 Feb 2022 • Ali Furkan Biten, Rubèn Tito, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas

It is our hope that OCR-IDL can be a starting point for future works on Document Intelligence.

Hierarchical multimodal transformers for Multi-Page DocVQA

1 code implementation • 7 Dec 2022 • Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny

The proposed method is based on a hierarchical transformer architecture in which the encoder summarizes the most relevant information of every page, and the decoder then takes this summarized information to generate the final answer.

Question Answering • Visual Question Answering

ICDAR 2019 Competition on Scene Text Visual Question Answering

no code implementations • 30 Jun 2019 • Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

ST-VQA introduces an important aspect that no Visual Question Answering system has addressed to date, namely the incorporation of scene text to answer questions asked about an image.

Question Answering • Visual Question Answering

Multimodal grid features and cell pointers for Scene Text Visual Question Answering

no code implementations • 1 Jun 2020 • Lluís Gómez, Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Marçal Rusiñol, Ernest Valveny, Dimosthenis Karatzas

This paper presents a new model for the task of scene text visual question answering, in which questions about a given image can only be answered by reading and understanding scene text that is present in it.

Question Answering • Visual Question Answering

Document Collection Visual Question Answering

no code implementations • 27 Apr 2021 • Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny

Current tasks and methods in Document Understanding aim to process documents as single elements.

Document Understanding • Question Answering (+1 more)

ICDAR 2021 Competition on Document Visual Question Answering

no code implementations • 10 Nov 2021 • Rubèn Tito, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

In this report, we present the results of the ICDAR 2021 edition of the Document Visual Question Challenges.

Visual Question Answering (VQA)

Privacy-Aware Document Visual Question Answering

no code implementations • 15 Dec 2023 • Rubèn Tito, Khanh Nguyen, Marlon Tobaben, Raouf Kerkouche, Mohamed Ali Souibgui, Kangsoo Jung, Lei Kang, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas

We employ a federated learning scheme that reflects the real-life distribution of documents in different businesses, and we explore the use case where the ID of the invoice issuer is the sensitive information to be protected.

Document Understanding • Federated Learning (+3 more)
