Search Results for author: Lluis Gomez

Found 32 papers, 17 papers with code

Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia

no code implementations 21 Sep 2022 Khanh Nguyen, Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas

Particularly, a similar Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to a specific context, therefore allowing us to explore the limits of a model to adjust captions to different contextual information.

Image Captioning

MUST-VQA: MUltilingual Scene-text VQA

no code implementations 14 Sep 2022 Emanuele Vivoli, Ali Furkan Biten, Andres Mafla, Dimosthenis Karatzas, Lluis Gomez

In this paper, we present a framework for Multilingual Scene Text Visual Question Answering that deals with new languages in a zero-shot fashion.

Question Answering, Visual Question Answering, +1

A Generic Image Retrieval Method for Date Estimation of Historical Document Collections

no code implementations 8 Apr 2022 Adrià Molina, Lluis Gomez, Oriol Ramos Terrades, Josep Lladós

Date estimation of historical document images is a challenging problem, with several contributions in the literature that lack the ability to generalize from one dataset to others.

Image Retrieval

Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement

no code implementations 9 Mar 2022 Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gomez, Dimosthenis Karatzas

In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement.

Image Enhancement, Scene Text Recognition

Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

no code implementations 6 Oct 2021 Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas

In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance.

Image Captioning, Text Matching

Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning

1 code implementation 4 Oct 2021 Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning.

Image Captioning

Learning to Rank Words: Optimizing Ranking Metrics for Word Spotting

1 code implementation 9 Jun 2021 Pau Riba, Adrià Molina, Lluis Gomez, Oriol Ramos-Terrades, Josep Lladós

In this paper, we explore and evaluate the use of ranking-based objective functions for simultaneously learning a word string encoder and a word image encoder.

Learning-To-Rank

One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition

no code implementations 11 May 2021 Mohamed Ali Souibgui, Ali Furkan Biten, Sounak Dey, Alicia Fornés, Yousri Kessentini, Lluis Gomez, Dimosthenis Karatzas, Josep Lladós

Low-resource Handwritten Text Recognition (HTR) is a hard problem due to the scarcity of annotated data and the very limited linguistic information (dictionaries and language models).

Handwritten Text Recognition

Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval

1 code implementation 21 Sep 2020 Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Scene text instances found in natural images carry explicit semantic information that can provide important cues to solve a wide array of computer vision problems.

Fine-Grained Image Classification, General Classification, +1

Text Recognition -- Real World Data and Where to Find Them

no code implementations 6 Jul 2020 Klára Janoušková, Jiri Matas, Lluis Gomez, Dimosthenis Karatzas

We present a method for exploiting weakly annotated images to improve text extraction pipelines.

RoadText-1K: Text Detection & Recognition Dataset for Driving Videos

no code implementations 19 May 2020 Sangeeth Reddy, Minesh Mathew, Lluis Gomez, Marcal Rusinol, Dimosthenis Karatzas, C. V. Jawahar

State of the art methods for text detection, recognition and tracking are evaluated on the new dataset and the results signify the challenges in unconstrained driving videos compared to existing datasets.

Exploring Hate Speech Detection in Multimodal Publications

1 code implementation 9 Oct 2019 Raul Gomez, Jaume Gibert, Lluis Gomez, Dimosthenis Karatzas

In this work we target the problem of hate speech detection in multimodal publications formed by a text and an image.

Hate Speech Detection

ICDAR 2019 Competition on Scene Text Visual Question Answering

no code implementations 30 Jun 2019 Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system to date, namely the incorporation of scene text to answer questions asked about an image.

Question Answering, Visual Question Answering, +1

Selective Style Transfer for Text

1 code implementation 4 Jun 2019 Raul Gomez, Ali Furkan Biten, Lluis Gomez, Jaume Gibert, Marçal Rusiñol, Dimosthenis Karatzas

This paper explores the possibilities of image style transfer applied to text while maintaining the original transcriptions.

Data Augmentation, Scene Text Detection, +1

Good News, Everyone! Context driven entity-aware captioning for news images

1 code implementation CVPR 2019 Ali Furkan Biten, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas

We propose a novel captioning method that is able to leverage contextual information provided by the text of news articles associated with an image.

Image Captioning

Self-Supervised Visual Representations for Cross-Modal Retrieval

no code implementations 31 Jan 2019 Yash Patel, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

Cross-modal retrieval methods have been significantly improved in recent years with the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places.

Cross-Modal Retrieval, Image Classification, +2

Self-Supervised Learning from Web Data for Multimodal Retrieval

1 code implementation 7 Jan 2019 Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas

In this work we propose to exploit this freely available data to learn a multimodal image and text embedding, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval.

Image Retrieval, Self-Supervised Learning

Learning to Learn from Web Data through Deep Semantic Embeddings

1 code implementation 20 Aug 2018 Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas

In this paper we propose to learn a multimodal image and text embedding from Web and Social Media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval.

Image Retrieval

Learning from #Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods

1 code implementation 20 Aug 2018 Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas

We treat the data separately by language and show that this can be extrapolated to a separate analysis of tourists and locals, and that tourism is reflected in Social Media at a neighborhood level.

TextTopicNet - Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces

1 code implementation 4 Jul 2018 Yash Patel, Lluis Gomez, Raul Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

We show that adequate visual features can be learned efficiently by training a CNN to predict the semantic textual context in which a particular image is more likely to appear as an illustration.

Image Classification, object-detection, +2

Self-supervised learning of visual features through embedding images into text topic spaces

no code implementations CVPR 2017 Lluis Gomez, Yash Patel, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

End-to-end training from scratch of current deep architectures for new computer vision problems would require ImageNet-scale datasets, and this is not always possible.

Image Classification, object-detection, +2

Improving Text Proposals for Scene Images with Fully Convolutional Networks

1 code implementation 16 Feb 2017 Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluis Gomez, Dimosthenis Karatzas, Andrew D. Bagdanov

Text Proposals have emerged as a class-dependent version of object proposals - efficient approaches to reduce the search space of possible text object locations in an image.

Scene Text Recognition

A fine-grained approach to scene text script identification

no code implementations 24 Feb 2016 Lluis Gomez, Dimosthenis Karatzas

Although widely studied for document images and handwritten documents, it remains an almost unexplored territory for scene text images.

Scene Text Recognition

Improving patch-based scene text script identification with ensembles of conjoined networks

1 code implementation 24 Feb 2016 Lluis Gomez, Anguelos Nicolaou, Dimosthenis Karatzas

Instead of resizing input images to a fixed aspect ratio as in the typical use of holistic CNN classifiers, we propose here a patch-based classification framework in order to preserve discriminative parts of the image that are characteristic of its class.

General Classification, Optical Character Recognition

Object Proposals for Text Extraction in the Wild

1 code implementation 8 Sep 2015 Lluis Gomez, Dimosthenis Karatzas

The use of Object Proposals techniques in the scene text understanding field is innovative.

A Fast Hierarchical Method for Multi-script and Arbitrary Oriented Scene Text Extraction

no code implementations 28 Jul 2014 Lluis Gomez, Dimosthenis Karatzas

Typography and layout lead to the hierarchical organisation of text in words, text lines, and paragraphs.

Text Segmentation
