1 code implementation • 24 Sep 2024 • Emanuele Vivoli, Niccolò Biondi, Marco Bertini, Dimosthenis Karatzas
The comic domain is rapidly advancing with the development of single- and multi-page analysis and synthesis models.
1 code implementation • 14 Sep 2024 • Emanuele Vivoli, Andrey Barsky, Mohamed Ali Souibgui, Artemis LLabres, Marco Bertini, Dimosthenis Karatzas
Our contributions are fivefold: (1) we analyze the structure of the comics medium, detailing its distinctive compositional elements; (2) we survey the widely used datasets and tasks in comics research, emphasizing their role in advancing the field; (3) we introduce the Layer of Comics Understanding (LoCU) framework, a novel taxonomy that redefines vision-language tasks within comics and lays the foundation for future work; (4) we provide a detailed review and categorization of existing methods following the LoCU framework; and (5) we highlight current research challenges and propose directions for future exploration, particularly in the context of vision-language models applied to comics.
1 code implementation • 3 Sep 2024 • Soumitri Chattopadhyay, Sanket Biswas, Emanuele Vivoli, Josep Lladós
Specifically, we propose two novel methods: Generative Class Prompt Learning (GCPL) and Contrastive Multi-class Prompt Learning (CoMPLe).
1 code implementation • 4 Jul 2024 • Emanuele Vivoli, Marco Bertini, Dimosthenis Karatzas
We introduce a novel benchmark, CoMix, designed to evaluate the multi-task capabilities of models in comic analysis.
no code implementations • 3 Jul 2024 • Emanuele Vivoli, Irene Campaioli, Mariateresa Nardoni, Niccolò Biondi, Marco Bertini, Dimosthenis Karatzas
We have benchmarked a variety of detection architectures using the Comics Datasets Framework.
no code implementations • 6 Mar 2024 • Emanuele Vivoli, Joan Lafuente Baeza, Ernest Valveny Llobet, Dimosthenis Karatzas
This work explores a closure task in comics, a medium where visual and textual elements are intricately intertwined.
no code implementations • 27 Mar 2023 • Emanuele Vivoli, Luca Bossi, Marco Bertini, Pierluigi Falorni, Lorenzo Capineri
Holographic imaging is a technique that uses microwave energy to create a three-dimensional image of an object or scene.
1 code implementation • 2 Feb 2023 • Andrea Gemelli, Emanuele Vivoli, Simone Marinai
We define the task of Contextualized Table Extraction (CTE), which aims to extract and define the structure of tables considering the textual context of the document.
no code implementations • 14 Sep 2022 • Emanuele Vivoli, Ali Furkan Biten, Andres Mafla, Dimosthenis Karatzas, Lluis Gomez
In this paper, we present a framework for Multilingual Scene Text Visual Question Answering that deals with new languages in a zero-shot fashion.
1 code implementation • 23 Aug 2022 • Andrea Gemelli, Emanuele Vivoli, Simone Marinai
Tables are widely used in many types of documents because they convey important information in a structured way.