Search Results for author: Ernest Valveny

Found 24 papers, 10 papers with code

Preserving Privacy Without Compromising Accuracy: Machine Unlearning for Handwritten Text Recognition

no code implementations 11 Apr 2025 Lei Kang, Xuanshuo Fu, Lluis Gomez, Alicia Fornés, Ernest Valveny, Dimosthenis Karatzas

Our proposed method utilizes a writer classification head both as an indicator and a trigger for unlearning, while maintaining the efficacy of the recognition head.

Handwritten Text Recognition HTR +1

GRIF-DM: Generation of Rich Impression Fonts using Diffusion Models

1 code implementation 14 Aug 2024 Lei Kang, Fei Yang, Kai Wang, Mohamed Ali Souibgui, Lluis Gomez, Alicia Fornés, Ernest Valveny, Dimosthenis Karatzas

In this paper, we introduce a diffusion-based method, termed GRIF-DM, to generate fonts that vividly embody specific impressions, utilizing an input consisting of a single letter and a set of descriptive impression keywords.

Descriptive Font Generation

Image-text matching for large-scale book collections

1 code implementation 29 Jul 2024 Artemis Llabrés, Arka Ujjal Dey, Dimosthenis Karatzas, Ernest Valveny

We show that both the Hungarian Matching and the proposed BERT-based model outperform a fuzzy string matching baseline, and we highlight inherent limitations of the matching algorithms as the target increases in size, and when either of the two sets (detected books or target book list) is incomplete.

Image-text matching Optical Character Recognition (OCR) +1

LayeredDoc: Domain Adaptive Document Restoration with a Layer Separation Approach

1 code implementation 12 Jun 2024 Maria Pilligua, Nil Biescas, Javier Vazquez-Corral, Josep Lladós, Ernest Valveny, Sanket Biswas

The rapid evolution of intelligent document processing systems demands robust solutions that adapt to diverse domains without extensive retraining.

Domain Adaptation Image Restoration

Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism

1 code implementation 29 Apr 2024 Lei Kang, Rubèn Tito, Ernest Valveny, Dimosthenis Karatzas

In particular, we employ a visual-only document representation, leveraging the encoder from a document understanding model, Pix2Struct.

document understanding Optical Character Recognition +3

Machine Unlearning for Document Classification

1 code implementation 29 Apr 2024 Lei Kang, Mohamed Ali Souibgui, Fei Yang, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas

In our research, we explore machine unlearning for document classification problems, representing, to the best of our knowledge, the first investigation into this area.

Classification Document Classification +2

Privacy-Aware Document Visual Question Answering

1 code implementation 15 Dec 2023 Rubèn Tito, Khanh Nguyen, Marlon Tobaben, Raouf Kerkouche, Mohamed Ali Souibgui, Kangsoo Jung, Joonas Jälkö, Vincent Poulain D'Andecy, Aurelie Joseph, Lei Kang, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas

We employ a federated learning scheme that reflects the real-life distribution of documents in different businesses, and we explore the use case where the data of the invoice provider is the sensitive information to be protected.

document understanding Federated Learning +3

Hierarchical multimodal transformers for Multi-Page DocVQA

1 code implementation 7 Dec 2022 Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny

The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page, and the decoder then takes this summarized information to generate the final answer.

Decoder Question Answering +1

EKTVQA: Generalized use of External Knowledge to empower Scene Text in Text-VQA

no code implementations 22 Aug 2021 Arka Ujjal Dey, Ernest Valveny, Gaurav Harit

The open-ended question answering task of Text-VQA often requires reading and reasoning about rarely seen or completely unseen scene-text content of an image.

Open-Ended Question Answering Optical Character Recognition (OCR) +1

Document Collection Visual Question Answering

no code implementations 27 Apr 2021 Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny

Current tasks and methods in Document Understanding aim to process documents as single elements.

document understanding Question Answering +1

InfographicVQA

no code implementations 26 Apr 2021 Minesh Mathew, Viraj Bagal, Rubèn Pérez Tito, Dimosthenis Karatzas, Ernest Valveny, C. V Jawahar

Infographics are documents designed to effectively communicate information using a combination of textual, graphical and visual elements.

Question Answering Visual Question Answering

Multimodal grid features and cell pointers for Scene Text Visual Question Answering

no code implementations 1 Jun 2020 Lluís Gómez, Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Marçal Rusiñol, Ernest Valveny, Dimosthenis Karatzas

This paper presents a new model for the task of scene text visual question answering, in which questions about a given image can only be answered by reading and understanding scene text that is present in it.

Question Answering Visual Question Answering

ICDAR 2019 Competition on Scene Text Visual Question Answering

no code implementations 30 Jun 2019 Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system to date, namely the incorporation of scene text to answer questions asked about an image.

Question Answering Visual Question Answering

Don't only Feel Read: Using Scene text to understand advertisements

no code implementations 21 Jun 2018 Arka Ujjal Dey, Suman K. Ghosh, Ernest Valveny

We propose a framework for automated classification of Advertisement Images, using not just Visual features but also Textual cues extracted from embedded text.

General Classification

Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch

no code implementations 28 Apr 2018 Sounak Dey, Anjan Dutta, Suman K. Ghosh, Ernest Valveny, Josep Lladós, Umapada Pal

In this work, we introduce a cross-modal image retrieval system that allows both text and sketch as input modalities for the query.

Image Retrieval Retrieval

R-PHOC: Segmentation-Free Word Spotting using CNN

no code implementations 5 Jul 2017 Suman Ghosh, Ernest Valveny

This paper proposes a region-based convolutional neural network for segmentation-free word spotting.

Segmentation

Visual attention models for scene text recognition

no code implementations 5 Jun 2017 Suman K. Ghosh, Ernest Valveny, Andrew D. Bagdanov

A set of feature vectors is derived from an intermediate convolutional layer corresponding to different areas of the image.

Language Modeling Language Modelling +1

Query by String word spotting based on character bi-gram indexing

no code implementations 28 May 2015 Suman K. Ghosh, Ernest Valveny

Both the documents and query strings are encoded using a recently proposed word representation that projects images and strings into a common attribute space based on a pyramidal histogram of characters (PHOC).

Attribute Re-Ranking +2
