Search Results for author: Andres Mafla

Found 8 papers, 4 papers with code

Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval

1 code implementation • 21 Sep 2020 • Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Scene text instances found in natural images carry explicit semantic information that can provide important cues to solve a wide array of computer vision problems.

Fine-Grained Image Classification, General Classification, +2

Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement

1 code implementation • 9 Mar 2022 • Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gomez, Dimosthenis Karatzas

In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks: text recognition (handwritten or scene text) and document image enhancement.

Document Enhancement, Scene Text Recognition
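The description above characterises Text-DIAE as a self-supervised, degradation-invariant autoencoder. As a rough, purely illustrative sketch of that pretraining idea (not the authors' actual architecture; the toy model, the Gaussian-noise degradation, and the shapes and hyper-parameters below are all assumptions), one can train a small convolutional autoencoder to reconstruct a clean text image from a synthetically degraded copy, so the learned features become invariant to the degradation:

```python
# Minimal sketch (NOT the Text-DIAE architecture): self-supervised pretraining
# of a degradation-invariant autoencoder. A clean text-image batch is
# synthetically degraded and the network is trained to recover the clean image.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def degrade(clean: torch.Tensor) -> torch.Tensor:
    """Hypothetical degradation: additive Gaussian noise, clamped to [0, 1]."""
    return (clean + 0.3 * torch.randn_like(clean)).clamp(0.0, 1.0)

model = TinyAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

clean_batch = torch.rand(8, 1, 64, 256)    # stand-in for clean text-line crops
for _ in range(10):                        # toy training loop
    recon = model(degrade(clean_batch))
    loss = criterion(recon, clean_batch)   # target is the *clean* image
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

After such pretraining, the encoder could in principle be reused for a downstream task such as recognition or enhancement; the details of how Text-DIAE does this are in the paper itself.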

ICDAR 2019 Competition on Scene Text Visual Question Answering

no code implementations • 30 Jun 2019 • Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system to date: the incorporation of scene text to answer questions asked about an image.

Question Answering, Visual Question Answering

Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

no code implementations • 6 Oct 2021 • Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas

In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance.

Image Captioning, Image-text matching, +2
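The abstract above proposes metrics that score retrieved items by their degree of semantic relevance rather than by annotated binary relevance. As a hedged, generic sketch of that idea (the function names and the token-overlap similarity below are illustrative stand-ins, not the metrics defined in the paper), each retrieved caption can be scored by its similarity to the query image's annotated captions, so that near-paraphrases drawn from other images still receive credit:

```python
# Illustrative only: a generic "semantic score at k" for image-to-text retrieval.
# Instead of counting a retrieved caption as correct only if it is one of the
# image's annotated captions (binary relevance), each retrieved caption is
# scored by its similarity to the annotated captions. The similarity function
# is pluggable; token-overlap F1 is a toy stand-in.
from typing import Callable, List

def token_f1(a: str, b: str) -> float:
    """Toy similarity: F1 over lowercase word overlap."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    overlap = len(sa & sb)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(sa), overlap / len(sb)
    return 2 * precision * recall / (precision + recall)

def semantic_score_at_k(
    retrieved: List[str],
    ground_truth: List[str],
    k: int = 5,
    sim: Callable[[str, str], float] = token_f1,
) -> float:
    """Average, over the top-k retrieved captions, of the best similarity
    to any annotated caption of the query image."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(max(sim(r, g) for g in ground_truth) for r in top_k) / len(top_k)

# Example: the second caption is not an exact annotation but is clearly relevant,
# so it contributes a non-zero score instead of being counted as a miss.
gt = ["a man riding a horse on the beach"]
ret = ["a man riding a horse on the beach",
       "a person rides a horse along the shore",
       "a bowl of fruit on a table"]
print(semantic_score_at_k(ret, gt, k=3))
```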

MUST-VQA: MUltilingual Scene-text VQA

no code implementations • 14 Sep 2022 • Emanuele Vivoli, Ali Furkan Biten, Andres Mafla, Dimosthenis Karatzas, Lluis Gomez

In this paper, we present a framework for Multilingual Scene Text Visual Question Answering that deals with new languages in a zero-shot fashion.

Question Answering, Visual Question Answering

Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia

no code implementations • 21 Sep 2022 • Khanh Nguyen, Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas

In particular, the same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context, which lets us explore the limits of a model's ability to adjust captions to different contextual information.

Image Captioning
