Search Results for author: Rubèn Tito

Found 7 papers, 2 papers with code

Hierarchical multimodal transformers for Multi-Page DocVQA

1 code implementation7 Dec 2022 Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny

The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and then, the decoder takes this summarized information to generate the final answer.

Question Answering Visual Question Answering

ICDAR 2019 Competition on Scene Text Visual Question Answering

no code implementations30 Jun 2019 Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system up to date, namely the incorporation of scene text to answer questions asked about an image.

Question Answering Visual Question Answering

Multimodal grid features and cell pointers for Scene Text Visual Question Answering

no code implementations1 Jun 2020 Lluís Gómez, Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Marçal Rusiñol, Ernest Valveny, Dimosthenis Karatzas

This paper presents a new model for the task of scene text visual question answering, in which questions about a given image can only be answered by reading and understanding scene text that is present in it.

Question Answering Visual Question Answering

Document Collection Visual Question Answering

no code implementations27 Apr 2021 Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny

Current tasks and methods in Document Understanding aims to process documents as single elements.

document understanding Question Answering +1

Privacy-Aware Document Visual Question Answering

no code implementations15 Dec 2023 Rubèn Tito, Khanh Nguyen, Marlon Tobaben, Raouf Kerkouche, Mohamed Ali Souibgui, Kangsoo Jung, Lei Kang, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas

We employ a federated learning scheme, that reflects the real-life distribution of documents in different businesses, and we explore the use case where the ID of the invoice issuer is the sensitive information to be protected.

document understanding Federated Learning +3

Cannot find the paper you are looking for? You can Submit a new open access paper.