Search Results for author: Ernest Valveny

Found 17 papers, 4 papers with code

Privacy-Aware Document Visual Question Answering

no code implementations • 15 Dec 2023 • Rubèn Tito, Khanh Nguyen, Marlon Tobaben, Raouf Kerkouche, Mohamed Ali Souibgui, Kangsoo Jung, Lei Kang, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas

We employ a federated learning scheme that reflects the real-life distribution of documents in different businesses, and we explore the use case where the ID of the invoice issuer is the sensitive information to be protected.

document understanding Federated Learning +3

Document Understanding Dataset and Evaluation (DUDE)

1 code implementation • ICCV 2023 • Jordy Van Landeghem, Rubén Tito, Łukasz Borchmann, Michał Pietruszka, Paweł Józiak, Rafał Powalski, Dawid Jurkiewicz, Mickaël Coustaty, Bertrand Ackaert, Ernest Valveny, Matthew Blaschko, Sien Moens, Tomasz Stanisławek

We call on the Document AI (DocAI) community to reevaluate current methodologies and embrace the challenge of creating more practically-oriented benchmarks.

document understanding

Hierarchical multimodal transformers for Multi-Page DocVQA

1 code implementation • 7 Dec 2022 • Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny

The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and then, the decoder takes this summarized information to generate the final answer.

Question Answering Visual Question Answering

OCR-IDL: OCR Annotations for Industry Document Library Dataset

1 code implementation • 25 Feb 2022 • Ali Furkan Biten, Rubèn Tito, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas

It is our hope that OCR-IDL can be a starting point for future works on Document Intelligence.

Optical Character Recognition (OCR)

ICDAR 2021 Competition on Document Visual Question Answering

no code implementations • 10 Nov 2021 • Rubèn Tito, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

In this report we present results of the ICDAR 2021 edition of the Document Visual Question Challenges.

Visual Question Answering (VQA)

EKTVQA: Generalized use of External Knowledge to empower Scene Text in Text-VQA

no code implementations • 22 Aug 2021 • Arka Ujjal Dey, Ernest Valveny, Gaurav Harit

The open-ended question answering task of Text-VQA often requires reading and reasoning about rarely seen or completely unseen scene-text content of an image.

Open-Ended Question Answering Optical Character Recognition (OCR) +1

Document Collection Visual Question Answering

no code implementations • 27 Apr 2021 • Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny

Current tasks and methods in Document Understanding aim to process documents as single elements.

document understanding Question Answering +1

InfographicVQA

no code implementations • 26 Apr 2021 • Minesh Mathew, Viraj Bagal, Rubèn Pérez Tito, Dimosthenis Karatzas, Ernest Valveny, C. V. Jawahar

Infographics are documents designed to effectively communicate information using a combination of textual, graphical and visual elements.

Question Answering Visual Question Answering

Multimodal grid features and cell pointers for Scene Text Visual Question Answering

no code implementations • 1 Jun 2020 • Lluís Gómez, Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Marçal Rusiñol, Ernest Valveny, Dimosthenis Karatzas

This paper presents a new model for the task of scene text visual question answering, in which questions about a given image can only be answered by reading and understanding scene text that is present in it.

Question Answering Visual Question Answering

ICDAR 2019 Competition on Scene Text Visual Question Answering

no code implementations • 30 Jun 2019 • Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system to date, namely the incorporation of scene text to answer questions asked about an image.

Question Answering Visual Question Answering

Scene Text Visual Question Answering

3 code implementations • ICCV 2019 • Ali Furkan Biten, Ruben Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Ernest Valveny, C. V. Jawahar, Dimosthenis Karatzas

Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image.

Question Answering Visual Question Answering

Beyond Visual Semantics: Exploring the Role of Scene Text in Image Understanding

no code implementations • 25 May 2019 • Arka Ujjal Dey, Suman Kumar Ghosh, Ernest Valveny, Gaurav Harit

Images with visual and scene text content are ubiquitous in everyday life.

Retrieval

Don't only Feel Read: Using Scene text to understand advertisements

no code implementations • 21 Jun 2018 • Arka Ujjal Dey, Suman K. Ghosh, Ernest Valveny

We propose a framework for automated classification of Advertisement Images, using not just Visual features but also Textual cues extracted from embedded text.

General Classification

Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch

no code implementations • 28 Apr 2018 • Sounak Dey, Anjan Dutta, Suman K. Ghosh, Ernest Valveny, Josep Lladós, Umapada Pal

In this work we introduce a cross modal image retrieval system that allows both text and sketch as input modalities for the query.

Image Retrieval Retrieval

R-PHOC: Segmentation-Free Word Spotting using CNN

no code implementations • 5 Jul 2017 • Suman Ghosh, Ernest Valveny

This paper proposes a region based convolutional neural network for segmentation-free word spotting.

Segmentation

Visual attention models for scene text recognition

no code implementations • 5 Jun 2017 • Suman K. Ghosh, Ernest Valveny, Andrew D. Bagdanov

A set of feature vectors is derived from an intermediate convolutional layer corresponding to different areas of the image.

Language Modelling Scene Text Recognition

Query by String word spotting based on character bi-gram indexing

no code implementations • 28 May 2015 • Suman K. Ghosh, Ernest Valveny

Both the documents and query strings are encoded using a recently proposed word representation that projects images and strings into a common attribute space based on a pyramidal histogram of characters (PHOC).

Attribute Re-Ranking +2
