Search Results for author: Ernest Valveny

Found 17 papers, 4 papers with code

Privacy-Aware Document Visual Question Answering

no code implementations • 15 Dec 2023 • Rubèn Tito, Khanh Nguyen, Marlon Tobaben, Raouf Kerkouche, Mohamed Ali Souibgui, Kangsoo Jung, Lei Kang, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas

We employ a federated learning scheme that reflects the real-life distribution of documents in different businesses, and we explore the use case where the ID of the invoice issuer is the sensitive information to be protected.

document understanding, Federated Learning, +3

Hierarchical multimodal transformers for Multi-Page DocVQA

1 code implementation • 7 Dec 2022 • Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny

The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and the decoder then takes this summarized information to generate the final answer.

Question Answering, Visual Question Answering

EKTVQA: Generalized use of External Knowledge to empower Scene Text in Text-VQA

no code implementations • 22 Aug 2021 • Arka Ujjal Dey, Ernest Valveny, Gaurav Harit

The open-ended question answering task of Text-VQA often requires reading and reasoning about rarely seen or completely unseen scene-text content of an image.

Open-Ended Question Answering, Optical Character Recognition (OCR), +1

Document Collection Visual Question Answering

no code implementations • 27 Apr 2021 • Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny

Current tasks and methods in Document Understanding aim to process documents as single elements.

document understanding, Question Answering, +1

InfographicVQA

no code implementations • 26 Apr 2021 • Minesh Mathew, Viraj Bagal, Rubèn Pérez Tito, Dimosthenis Karatzas, Ernest Valveny, C. V. Jawahar

Infographics are documents designed to effectively communicate information using a combination of textual, graphical and visual elements.

Question Answering, Visual Question Answering

Multimodal grid features and cell pointers for Scene Text Visual Question Answering

no code implementations • 1 Jun 2020 • Lluís Gómez, Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Marçal Rusiñol, Ernest Valveny, Dimosthenis Karatzas

This paper presents a new model for the task of scene text visual question answering, in which questions about a given image can only be answered by reading and understanding scene text that is present in it.

Question Answering, Visual Question Answering

ICDAR 2019 Competition on Scene Text Visual Question Answering

no code implementations • 30 Jun 2019 • Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system to date, namely the incorporation of scene text to answer questions asked about an image.

Question Answering, Visual Question Answering

Don't only Feel Read: Using Scene text to understand advertisements

no code implementations • 21 Jun 2018 • Arka Ujjal Dey, Suman K. Ghosh, Ernest Valveny

We propose a framework for automated classification of Advertisement Images, using not just Visual features but also Textual cues extracted from embedded text.

General Classification

Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch

no code implementations • 28 Apr 2018 • Sounak Dey, Anjan Dutta, Suman K. Ghosh, Ernest Valveny, Josep Lladós, Umapada Pal

In this work we introduce a cross-modal image retrieval system that allows both text and sketch as input modalities for the query.

Image Retrieval, Retrieval

R-PHOC: Segmentation-Free Word Spotting using CNN

no code implementations • 5 Jul 2017 • Suman Ghosh, Ernest Valveny

This paper proposes a region-based convolutional neural network for segmentation-free word spotting.

Segmentation

Visual attention models for scene text recognition

no code implementations • 5 Jun 2017 • Suman K. Ghosh, Ernest Valveny, Andrew D. Bagdanov

A set of feature vectors are derived from an intermediate convolutional layer corresponding to different areas of the image.

Language Modelling, Scene Text Recognition

Query by String word spotting based on character bi-gram indexing

no code implementations • 28 May 2015 • Suman K. Ghosh, Ernest Valveny

Both the documents and query strings are encoded using a recently proposed word representation that projects images and strings into a common attribute space based on a pyramidal histogram of characters (PHOC).

Attribute, Re-Ranking, +2
