no code implementations • 11 Apr 2025 • Lei Kang, Xuanshuo Fu, Lluis Gomez, Alicia Fornés, Ernest Valveny, Dimosthenis Karatzas
Our proposed method utilizes a writer classification head both as an indicator and a trigger for unlearning, while maintaining the efficacy of the recognition head.
no code implementations • 6 Nov 2024 • Marlon Tobaben, Mohamed Ali Souibgui, Rubèn Tito, Khanh Nguyen, Raouf Kerkouche, Kangsoo Jung, Joonas Jälkö, Lei Kang, Andrey Barsky, Vincent Poulain D'Andecy, Aurélie Joseph, Aashiq Muhamed, Kevin Kuo, Virginia Smith, Yusuke Yamasaki, Takumi Fukami, Kenta Niwa, Iifan Tyou, Hiro Ishii, Rio Yokota, Ragul N, Rintu Kutum, Josep Llados, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas
The Privacy Preserving Federated Learning Document VQA (PFL-DocVQA) competition challenged the community to develop provably private and communication-efficient solutions in a federated setting for a real-life use case: invoice processing.
1 code implementation • 14 Aug 2024 • Lei Kang, Fei Yang, Kai Wang, Mohamed Ali Souibgui, Lluis Gomez, Alicia Fornés, Ernest Valveny, Dimosthenis Karatzas
In this paper, we introduce a diffusion-based method to generate fonts that vividly embody specific impressions, using as input a single letter and a set of descriptive impression keywords.
1 code implementation • 29 Jul 2024 • Artemis Llabrés, Arka Ujjal Dey, Dimosthenis Karatzas, Ernest Valveny
We show that both the Hungarian Matching and the proposed BERT-based model outperform a fuzzy string matching baseline, and we highlight inherent limitations of the matching algorithms as the target list grows and when either of the two sets (detected books or target book list) is incomplete.
1 code implementation • 12 Jun 2024 • Maria Pilligua, Nil Biescas, Javier Vazquez-Corral, Josep Lladós, Ernest Valveny, Sanket Biswas
The rapid evolution of intelligent document processing systems demands robust solutions that adapt to diverse domains without extensive retraining.
1 code implementation • 29 Apr 2024 • Lei Kang, Rubèn Tito, Ernest Valveny, Dimosthenis Karatzas
In particular, we employ a visual-only document representation, leveraging the encoder from a document understanding model, Pix2Struct.
1 code implementation • 29 Apr 2024 • Lei Kang, Mohamed Ali Souibgui, Fei Yang, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas
In our research, we explore machine unlearning for document classification; to the best of our knowledge, this is the first investigation in this area.
1 code implementation • 15 Dec 2023 • Rubèn Tito, Khanh Nguyen, Marlon Tobaben, Raouf Kerkouche, Mohamed Ali Souibgui, Kangsoo Jung, Joonas Jälkö, Vincent Poulain D'Andecy, Aurelie Joseph, Lei Kang, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas
We employ a federated learning scheme that reflects the real-life distribution of documents across different businesses, and we explore the use case where the data of the invoice provider is the sensitive information to be protected.
1 code implementation • ICCV 2023 • Jordy Van Landeghem, Rubén Tito, Łukasz Borchmann, Michał Pietruszka, Paweł Józiak, Rafał Powalski, Dawid Jurkiewicz, Mickaël Coustaty, Bertrand Ackaert, Ernest Valveny, Matthew Blaschko, Sien Moens, Tomasz Stanisławek
We call on the Document AI (DocAI) community to reevaluate current methodologies and embrace the challenge of creating more practically oriented benchmarks.
1 code implementation • 7 Dec 2022 • Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny
The proposed method is based on a hierarchical transformer architecture in which the encoder summarizes the most relevant information from every page, and the decoder then uses this summarized information to generate the final answer.
1 code implementation • 25 Feb 2022 • Ali Furkan Biten, Rubèn Tito, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas
It is our hope that OCR-IDL can be a starting point for future works on Document Intelligence.
no code implementations • 10 Nov 2021 • Rubèn Tito, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas
In this report, we present the results of the ICDAR 2021 edition of the Document Visual Question Answering Challenges.
no code implementations • 22 Aug 2021 • Arka Ujjal Dey, Ernest Valveny, Gaurav Harit
The open-ended question answering task of Text-VQA often requires reading and reasoning about rarely seen or completely unseen scene-text content of an image.
no code implementations • 27 Apr 2021 • Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny
Current tasks and methods in Document Understanding aim to process documents as single elements.
no code implementations • 26 Apr 2021 • Minesh Mathew, Viraj Bagal, Rubèn Pérez Tito, Dimosthenis Karatzas, Ernest Valveny, C. V. Jawahar
Infographics are documents designed to effectively communicate information using a combination of textual, graphical and visual elements.
no code implementations • 1 Jun 2020 • Lluís Gómez, Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Marçal Rusiñol, Ernest Valveny, Dimosthenis Karatzas
This paper presents a new model for the task of scene text visual question answering, in which questions about a given image can only be answered by reading and understanding scene text that is present in it.
no code implementations • 30 Jun 2019 • Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas
ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system to date, namely the incorporation of scene text to answer questions asked about an image.
4 code implementations • ICCV 2019 • Ali Furkan Biten, Ruben Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Ernest Valveny, C. V. Jawahar, Dimosthenis Karatzas
Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image.
no code implementations • 25 May 2019 • Arka Ujjal Dey, Suman Kumar Ghosh, Ernest Valveny, Gaurav Harit
Images with visual and scene text content are ubiquitous in everyday life.
no code implementations • 21 Jun 2018 • Arka Ujjal Dey, Suman K. Ghosh, Ernest Valveny
We propose a framework for the automated classification of advertisement images, using not just visual features but also textual cues extracted from embedded text.
no code implementations • 28 Apr 2018 • Sounak Dey, Anjan Dutta, Suman K. Ghosh, Ernest Valveny, Josep Lladós, Umapada Pal
In this work, we introduce a cross-modal image retrieval system that allows both text and sketch as input modalities for the query.
no code implementations • 5 Jul 2017 • Suman Ghosh, Ernest Valveny
This paper proposes a region-based convolutional neural network for segmentation-free word spotting.
no code implementations • 5 Jun 2017 • Suman K. Ghosh, Ernest Valveny, Andrew D. Bagdanov
A set of feature vectors is derived from an intermediate convolutional layer, each corresponding to a different area of the image.
no code implementations • 28 May 2015 • Suman K. Ghosh, Ernest Valveny
Both the documents and the query strings are encoded using a recently proposed word representation that projects images and strings into a common attribute space based on a pyramidal histogram of characters (PHOC).
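For readers unfamiliar with PHOC, the string side of the encoding can be sketched as follows. This is a minimal, unigram-only illustration, not the authors' implementation: the alphabet (lowercase letters and digits) and the pyramid levels (2 to 5) are assumed defaults, and the bigram levels sometimes added to PHOC are omitted. At each level the word is split into equal regions, and a character counts toward a region when at least half of its extent lies inside it. Integer arithmetic avoids floating-point ties at region boundaries. Mapping word images into the same space requires a learned model, which is not shown here.

```python
from typing import List, Sequence

# Assumed defaults for illustration only: lowercase letters plus digits,
# and pyramid levels 2..5.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
LEVELS = (2, 3, 4, 5)


def phoc(word: str, alphabet: str = ALPHABET, levels: Sequence[int] = LEVELS) -> List[int]:
    """Binary unigram pyramidal histogram of characters for a string (sketch)."""
    word = word.lower()
    n = len(word)
    index = {c: i for i, c in enumerate(alphabet)}
    vector: List[int] = []
    for level in levels:
        for region in range(level):
            hist = [0] * len(alphabet)
            # Work in units of 1/(n * level) so all boundaries are integers:
            # the region spans [region * n, (region + 1) * n].
            r_start, r_end = region * n, (region + 1) * n
            for pos, ch in enumerate(word):
                if ch not in index:
                    continue  # characters outside the alphabet are ignored
                # Character `pos` spans [pos * level, (pos + 1) * level].
                c_start, c_end = pos * level, (pos + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                # Assign the character if at least half of it lies in the region.
                if 2 * overlap >= level:
                    hist[index[ch]] = 1
            vector.extend(hist)
    return vector


# With 36 symbols and levels 2..5 (14 regions), the vector has 504 dimensions.
print(len(phoc("word")))
```

Note that in an odd-length word the middle character is assigned to both halves at level 2, which follows from the 50% overlap rule. Query strings and document words encoded this way can then be compared with any vector similarity.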