no code implementations • 28 Jul 2014 • Lluis Gomez, Dimosthenis Karatzas
Typography and layout lead to the hierarchical organisation of text into words, text lines, and paragraphs.
1 code implementation • 8 Sep 2015 • Lluis Gomez, Dimosthenis Karatzas
The use of Object Proposal techniques in the scene text understanding field is innovative.
1 code implementation • 24 Feb 2016 • Lluis Gomez, Anguelos Nicolaou, Dimosthenis Karatzas
Instead of resizing input images to a fixed aspect ratio as in the typical use of holistic CNN classifiers, we propose here a patch-based classification framework in order to preserve discriminative parts of the image that are characteristic of its class.
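The patch-based idea above can be sketched minimally: slide a fixed-size window over the image at its native aspect ratio, score each patch, and aggregate the per-patch scores. The patch classifier below is a deterministic stub standing in for the CNN, and all names and sizes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hypothetical stand-in for the CNN patch classifier: returns a normalized
# per-class score vector so the aggregation logic is runnable end to end.
def classify_patch(patch, n_classes=3):
    rng = np.random.default_rng(int(patch.sum()) % 2**32)
    scores = rng.random(n_classes)
    return scores / scores.sum()

def extract_patches(image, size=32, stride=32):
    """Slide a fixed-size window over the image (no global resize, so the
    original aspect ratio is preserved) and collect square patches."""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patches.append(image[y:y + size, x:x + size])
    return patches

def classify_image(image, n_classes=3):
    """Patch-based classification: average per-patch class scores and
    predict the class with the highest aggregate score."""
    patches = extract_patches(image)
    scores = np.mean([classify_patch(p, n_classes) for p in patches], axis=0)
    return int(np.argmax(scores)), scores

# A wide image is processed as-is instead of being squashed to a square.
image = np.random.default_rng(0).random((64, 128))
label, scores = classify_image(image)
print(label, scores.round(3))
```

Because no patch is resized, discriminative local structures (e.g. text fragments) keep their original shape regardless of the image's overall aspect ratio.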
no code implementations • 24 Feb 2016 • Lluis Gomez, Dimosthenis Karatzas
Although widely studied for document images and handwritten documents, it remains an almost unexplored territory for scene text images.
1 code implementation • 16 Feb 2017 • Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluis Gomez, Dimosthenis Karatzas, Andrew D. Bagdanov
Text Proposals have emerged as a class-dependent version of object proposals: efficient approaches to reduce the search space of possible text object locations in an image.
no code implementations • CVPR 2017 • Lluis Gomez, Yash Patel, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar
End-to-end training from scratch of current deep architectures for new computer vision problems would require ImageNet-scale datasets, and this is not always possible.
1 code implementation • 4 Jul 2018 • Yash Patel, Lluis Gomez, Raul Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar
We show that adequate visual features can be learned efficiently by training a CNN to predict the semantic textual context in which a particular image is more probable to appear as an illustration.
1 code implementation • 20 Aug 2018 • Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas
In this paper we propose to learn a multimodal image and text embedding from Web and Social Media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval.
1 code implementation • 20 Aug 2018 • Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas
We perform a language-separate treatment of the data and show that it can be extrapolated to a separate analysis of tourists and locals, and that tourism is reflected in Social Media at a neighborhood level.
1 code implementation • 7 Jan 2019 • Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas
In this work we propose to exploit this free available data to learn a multimodal image and text embedding, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval.
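Once image features have been projected into the shared text-embedding space, semantic image retrieval reduces to nearest-neighbor search. The sketch below assumes the embeddings are already computed (the vectors and dimensionality are illustrative, not from the paper) and shows ranking by cosine similarity.

```python
import numpy as np

def cosine_sim(query, matrix):
    """Cosine similarity between a query vector and each row of a matrix."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return m @ q

def retrieve(text_query_emb, image_embs, top_k=2):
    """Rank images by similarity to a text query in the shared embedding
    space, returning the indices of the top-k matches."""
    sims = cosine_sim(text_query_emb, image_embs)
    return np.argsort(-sims)[:top_k]

# Hypothetical 4-dim embeddings already projected into the joint space.
images = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
query = np.array([1.0, 0.05, 0.0, 0.0])
print(retrieve(query, images))  # indices of the nearest images first
```

The key design point is that the visual model is supervised by the text-domain embedding itself, so no manual image annotation is needed: any image with surrounding text on the Web becomes a training pair.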
no code implementations • 31 Jan 2019 • Yash Patel, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar
Cross-modal retrieval methods have been significantly improved in recent years with the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places.
1 code implementation • CVPR 2019 • Ali Furkan Biten, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas
We propose a novel captioning method that is able to leverage contextual information provided by the text of news articles associated with an image.
3 code implementations • ICCV 2019 • Ali Furkan Biten, Ruben Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Ernest Valveny, C. V. Jawahar, Dimosthenis Karatzas
Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image.
1 code implementation • 4 Jun 2019 • Raul Gomez, Ali Furkan Biten, Lluis Gomez, Jaume Gibert, Marçal Rusiñol, Dimosthenis Karatzas
This paper explores the possibilities of image style transfer applied to text while maintaining the original transcriptions.
no code implementations • 30 Jun 2019 • Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas
ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system to date, namely the incorporation of scene text to answer questions asked about an image.
1 code implementation • 9 Oct 2019 • Raul Gomez, Jaume Gibert, Lluis Gomez, Dimosthenis Karatzas
In this work we target the problem of hate speech detection in multimodal publications formed by a text and an image.
2 code implementations • 14 Jan 2020 • Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas
Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding.
Ranked #1 on Fine-Grained Image Classification on Con-Text
no code implementations • 19 May 2020 • Sangeeth Reddy, Minesh Mathew, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar
State-of-the-art methods for text detection, recognition and tracking are evaluated on the new dataset, and the results highlight the challenges of unconstrained driving videos compared to existing datasets.
no code implementations • 6 Jul 2020 • Klára Janoušková, Jiri Matas, Lluis Gomez, Dimosthenis Karatzas
We present a method for exploiting weakly annotated images to improve text extraction pipelines.
no code implementations • ECCV 2020 • Raul Gomez, Jaume Gibert, Lluis Gomez, Dimosthenis Karatzas
People from different parts of the globe describe objects and concepts in distinct manners.
1 code implementation • 21 Sep 2020 • Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas
Scene text instances found in natural images carry explicit semantic information that can provide important cues to solve a wide array of computer vision problems.
no code implementations • 11 May 2021 • Mohamed Ali Souibgui, Ali Furkan Biten, Sounak Dey, Alicia Fornés, Yousri Kessentini, Lluis Gomez, Dimosthenis Karatzas, Josep Lladós
Low-resource Handwritten Text Recognition (HTR) is a hard problem due to scarce annotated data and very limited linguistic information (dictionaries and language models).
1 code implementation • 9 Jun 2021 • Pau Riba, Adrià Molina, Lluis Gomez, Oriol Ramos-Terrades, Josep Lladós
In this paper, we explore and evaluate the use of ranking-based objective functions for learning simultaneously a word string and a word image encoder.
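A standard example of such a ranking-based objective is the triplet margin loss, sketched below for a word-string anchor and word-image embeddings (this is a generic formulation for illustration, not necessarily the exact objective evaluated in the paper; all vectors are made up).

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss: pull the matching word-image embedding
    (positive) toward the word-string anchor while pushing a
    non-matching embedding (negative) at least `margin` farther away."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical embeddings produced by the two encoders.
word_string = np.array([0.0, 1.0])   # encoded word string
image_match = np.array([0.1, 0.9])   # image of the same word
image_other = np.array([1.0, 0.0])   # image of a different word
loss = triplet_loss(word_string, image_match, image_other)
```

Training both encoders against the same ranking objective places word strings and word images in a common space, so word spotting becomes a nearest-neighbor query across modalities.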
1 code implementation • 10 Jun 2021 • Adrià Molina, Pau Riba, Lluis Gomez, Oriol Ramos-Terrades, Josep Lladós
This paper presents a novel method for date estimation of historical photographs from archival sources.
no code implementations • 2 Oct 2021 • Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas, C. V. Jawahar
This work addresses the problem of Question Answering (QA) on handwritten document collections.
1 code implementation • 4 Oct 2021 • Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas
Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning.
no code implementations • 6 Oct 2021 • Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas
In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance.
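The general idea can be illustrated with a toy graded metric: instead of counting a retrieved item as a binary hit or miss, weight it by its semantic similarity to the query. The function below is a minimal sketch of that principle only; its name and weighting scheme are assumptions, not the two metrics proposed in the paper.

```python
import numpy as np

def semantic_precision_at_k(retrieved_sims, k=3):
    """Average graded semantic similarity of the top-k retrieved items,
    replacing binary hit/miss precision (illustrative only, not the
    paper's exact metric)."""
    return float(np.mean(retrieved_sims[:k]))

# Semantic similarities of retrieved items to the query, in ranked order
# (e.g. computed from word embeddings of their captions).
sims = np.array([0.9, 0.7, 0.6, 0.1])
print(semantic_precision_at_k(sims))
```

Under a binary metric, items 2 and 3 might count as total misses even though they are semantically close to the query; a graded metric credits them proportionally.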
1 code implementation • 25 Feb 2022 • Ali Furkan Biten, Rubèn Tito, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas
It is our hope that OCR-IDL can be a starting point for future works on Document Intelligence.
1 code implementation • 9 Mar 2022 • Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gomez, Dimosthenis Karatzas
In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement.
no code implementations • 8 Apr 2022 • Adrià Molina, Lluis Gomez, Oriol Ramos Terrades, Josep Lladós
Date estimation of historical document images is a challenging problem, with several contributions in the literature that lack the ability to generalize from one dataset to others.
no code implementations • 14 Sep 2022 • Emanuele Vivoli, Ali Furkan Biten, Andres Mafla, Dimosthenis Karatzas, Lluis Gomez
In this paper, we present a framework for Multilingual Scene Text Visual Question Answering that deals with new languages in a zero-shot fashion.
no code implementations • 21 Sep 2022 • Khanh Nguyen, Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas
In particular, the same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to a specific context, allowing us to explore the limits of a model's ability to adjust captions to different contextual information.