1 code implementation • 3 Sep 2024 • Soumitri Chattopadhyay, Sanket Biswas, Emanuele Vivoli, Josep Lladós
Specifically, we propose two novel methods: Generative Class Prompt Learning (GCPL) and Contrastive Multi-class Prompt Learning (CoMPLe).
no code implementations • 27 Aug 2024 • Alloy Das, Sanket Biswas, Umapada Pal, Josep Lladós, Saumik Bhattacharya
The proliferation of scene text in both structured and unstructured environments presents significant challenges in optical character recognition (OCR), necessitating more efficient and robust text spotting solutions.
no code implementations • 12 Jun 2024 • Jordy Van Landeghem, Subhajit Maity, Ayan Banerjee, Matthew Blaschko, Marie-Francine Moens, Josep Lladós, Sanket Biswas
This work explores knowledge distillation (KD) for visually-rich document (VRD) applications such as document layout analysis (DLA) and document image classification (DIC).
no code implementations • 12 Jun 2024 • Sanket Biswas, Rajiv Jain, Vlad I. Morariu, Jiuxiang Gu, Puneet Mathur, Curtis Wigington, Tong Sun, Josep Lladós
While the generation of document layouts has been extensively explored, comprehensive document generation encompassing both layout and content presents a more complex challenge.
1 code implementation • 12 Jun 2024 • Maria Pilligua, Nil Biescas, Javier Vazquez-Corral, Josep Lladós, Ernest Valveny, Sanket Biswas
The rapid evolution of intelligent document processing systems demands robust solutions that adapt to diverse domains without extensive retraining.
no code implementations • 11 Jun 2024 • Adrià Molina, Oriol Ramos Terrades, Josep Lladós
This paper introduces Fetch-A-Set (FAS), a comprehensive benchmark tailored for legislative historical document analysis systems, addressing the challenges of large-scale document retrieval in historical contexts.
no code implementations • 6 May 2024 • Adarsh Tiwari, Sanket Biswas, Josep Lladós
We present SketchGPT, a flexible framework that employs a sequence-to-sequence autoregressive model for sketch generation and completion, along with an interpretation case study for sketch recognition.
1 code implementation • 6 May 2024 • Nil Biescas, Carlos Boned, Josep Lladós, Sanket Biswas
This paper presents GeoContrastNet, a language-agnostic framework for structured document understanding (DU) that integrates a contrastive learning objective with graph attention networks (GATs), emphasizing the significant role of geometric features.
1 code implementation • 30 Mar 2024 • Ayan Banerjee, Nityanand Mathur, Josep Lladós, Umapada Pal, Anjan Dutta
In response, this work introduces SVGCraft, a novel end-to-end framework for the creation of vector graphics depicting entire scenes from textual descriptions.
1 code implementation • 17 Feb 2024 • Ayan Banerjee, Sanket Biswas, Josep Lladós, Umapada Pal
Object detection in documents is a key step in automating the identification of structural elements in a digital or scanned document by understanding the hierarchical structure and relationships between different elements.
1 code implementation • 2 Oct 2023 • Alloy Das, Sanket Biswas, Ayan Banerjee, Josep Lladós, Umapada Pal, Saumik Bhattacharya
The ability to adapt to a wide range of domains is crucial for scene text spotting models deployed in real-world conditions.
no code implementations • 1 Oct 2023 • Alloy Das, Sanket Biswas, Umapada Pal, Josep Lladós
When used in a real-world noisy environment, the capacity to generalize to multiple domains is essential for any autonomous scene text spotting system.
no code implementations • 11 Sep 2023 • Souhail Bakkali, Sanket Biswas, Zuheng Ming, Mickaël Coustaty, Marçal Rusiñol, Oriol Ramos Terrades, Josep Lladós
Visual document understanding (VDU) has rapidly advanced with the development of powerful multi-modal language models.
Ranked #20 on Document Image Classification on RVL-CDIP
no code implementations • 5 Aug 2023 • Alloy Das, Sanket Biswas, Prasun Roy, Subhankar Ghosh, Umapada Pal, Michael Blumenstein, Josep Lladós, Saumik Bhattacharya
Scene Text Editing (STE) is a challenging research problem that primarily aims to modify existing text in an image while preserving the background and the font style of the original text.
1 code implementation • 8 May 2023 • Ayan Banerjee, Sanket Biswas, Josep Lladós, Umapada Pal
Instance-level segmentation of documents consists in assigning a class-aware and instance-aware label to each pixel of the image.
1 code implementation • 1 May 2023 • Subhajit Maity, Sanket Biswas, Siladittya Manna, Ayan Banerjee, Josep Lladós, Saumik Bhattacharya, Umapada Pal
Document layout analysis is a well-known problem in the document research community and has been extensively explored, yielding a multitude of solutions ranging from text mining and recognition to graph-based representation, visual feature extraction, etc.
no code implementations • 9 Dec 2022 • Asma Bensalah, Jialuo Chen, Alicia Fornés, Cristina Carmona-Duarte, Josep Lladós, Miguel A. Ferrer
Assessing the physical condition in rehabilitation scenarios is a challenging problem, since it involves Human Activity Recognition (HAR) and kinematic analysis methods.
no code implementations • 9 Dec 2022 • Alicia Fornés, Asma Bensalah, Cristina Carmona-Duarte, Jialuo Chen, Miguel A. Ferrer, Andreas Fischer, Josep Lladós, Cristina Martín, Eloy Opisso, Réjean Plamondon, Anna Scius-Bertrand, Josep Maria Tormos
This project explores the feasibility of remote patient monitoring based on the analysis of 3D movements captured with smartwatches.
no code implementations • 9 Dec 2022 • Asma Bensalah, Alicia Fornés, Cristina Carmona-Duarte, Josep Lladós
Assessing the quality of movements for post-stroke patients during the rehabilitation phase is vital, given that there is no standard stroke rehabilitation plan for all patients.
no code implementations • 21 Sep 2022 • Giuseppe De Gregorio, Sanket Biswas, Mohamed Ali Souibgui, Asma Bensalah, Josep Lladós, Alicia Fornés, Angelo Marcelli
Despite recent advances in automatic text recognition, the performance remains moderate when it comes to historical manuscripts.
1 code implementation • 23 Aug 2022 • Andrea Gemelli, Sanket Biswas, Enrico Civitelli, Josep Lladós, Simone Marinai
Geometric Deep Learning has recently attracted significant interest in a wide range of machine learning fields, including document analysis.
Ranked #7 on Entity Linking on FUNSD
no code implementations • 8 Apr 2022 • Adrià Molina, Lluis Gomez, Oriol Ramos Terrades, Josep Lladós
Date estimation of historical document images is a challenging problem, with several contributions in the literature that lack the ability to generalize from one dataset to others.
1 code implementation • 9 Mar 2022 • Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gomez, Dimosthenis Karatzas
In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement.
1 code implementation • 27 Jan 2022 • Sanket Biswas, Ayan Banerjee, Josep Lladós, Umapada Pal
has emerged as an interesting problem for the document analysis and understanding community.
1 code implementation • 25 Jan 2022 • Mohamed Ali Souibgui, Sanket Biswas, Sana Khamekhem Jemni, Yousri Kessentini, Alicia Fornés, Josep Lladós, Umapada Pal
Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties.
Ranked #1 on Binarization on H-DIBCO 2011
no code implementations • 9 Jul 2021 • Sanket Biswas, Pau Riba, Josep Lladós, Umapada Pal
One of the major prerequisites for any deep learning approach is the availability of large-scale training data.
1 code implementation • 6 Jul 2021 • Sanket Biswas, Pau Riba, Josep Lladós, Umapada Pal
The results highlight that our model can successfully generate realistic and diverse document images with multiple objects.
1 code implementation • 10 Jun 2021 • Adrià Molina, Pau Riba, Lluis Gomez, Oriol Ramos-Terrades, Josep Lladós
This paper presents a novel method for date estimation of historical photographs from archival sources.
1 code implementation • 9 Jun 2021 • Pau Riba, Adrià Molina, Lluis Gomez, Oriol Ramos-Terrades, Josep Lladós
In this paper, we explore and evaluate the use of ranking-based objective functions for simultaneously learning a word string encoder and a word image encoder.
no code implementations • 11 May 2021 • Mohamed Ali Souibgui, Ali Furkan Biten, Sounak Dey, Alicia Fornés, Yousri Kessentini, Lluis Gomez, Dimosthenis Karatzas, Josep Lladós
Low-resource Handwritten Text Recognition (HTR) is a hard problem due to scarce annotated data and very limited linguistic information (dictionaries and language models).
no code implementations • 17 Aug 2020 • Pau Riba, Andreas Fischer, Josep Lladós, Alicia Fornés
The emergence of geometric deep learning as a novel framework to deal with graph-based representations has displaced traditional approaches in favor of completely new methodologies.
2 code implementations • 20 Dec 2019 • Manuel Carbonell, Alicia Fornés, Mauricio Villegas, Josep Lladós
In this work we propose an end-to-end model that combines a one-stage object detection network with branches for the recognition of text and named entities, respectively, so that shared features can be learned simultaneously from the training error of each task.
1 code implementation • 8 Jul 2018 • Anjan Dutta, Pau Riba, Josep Lladós, Alicia Fornés
Graph embedding, which maps graphs to a vectorial space, has been proposed as a way to tackle these difficulties enabling the use of standard machine learning techniques.
no code implementations • 28 Apr 2018 • Sounak Dey, Anjan Dutta, Suman K. Ghosh, Ernest Valveny, Josep Lladós, Umapada Pal
In this work we introduce a cross modal image retrieval system that allows both text and sketch as input modalities for the query.
no code implementations • 16 Mar 2018 • Manuel Carbonell, Mauricio Villegas, Alicia Fornés, Josep Lladós
When extracting information from handwritten documents, text transcription and named entity recognition are usually treated as separate, sequential tasks.
no code implementations • 21 Aug 2017 • Albert Berenguel, Oriol Ramos Terrades, Josep Lladós, Cristina Cañero
This paper presents a novel application to detect counterfeit identity documents forged by a scan-printing operation.
no code implementations • 1 Feb 2017 • Anjan Dutta, Josep Lladós, Horst Bunke, Umapada Pal
Many algorithms formulate graph matching as an optimization of an objective function of pairwise quantification of nodes and edges of two graphs to be matched.