Search Results for author: Marçal Rusiñol

Found 16 papers, 6 papers with code

VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification

no code implementations24 May 2022 Souhail Bakkali, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol, Oriol Ramos Terrades

Multimodal learning from document data has achieved great success lately, as it allows pre-training semantically meaningful features as a prior for a learnable downstream approach.

Document Classification

Content and Style Aware Generation of Text-line Images for Handwriting Recognition

no code implementations12 Apr 2022 Lei Kang, Pau Riba, Marçal Rusiñol, Alicia Fornés, Mauricio Villegas

Once properly trained, our method can also be adapted to new target data by only accessing unlabeled text-line images to mimic handwritten styles and produce images with any textual content.

Handwriting Recognition Handwritten Text Recognition

Multimodal grid features and cell pointers for Scene Text Visual Question Answering

no code implementations1 Jun 2020 Lluís Gómez, Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Marçal Rusiñol, Ernest Valveny, Dimosthenis Karatzas

This paper presents a new model for the task of scene text visual question answering, in which questions about a given image can only be answered by reading and understanding scene text that is present in it.

Question Answering Visual Question Answering +1

Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition

no code implementations26 May 2020 Lei Kang, Pau Riba, Marçal Rusiñol, Alicia Fornés, Mauricio Villegas

Sequential architectures are a perfect fit to model text lines, not only because of the inherent temporal aspect of text, but also to learn probability distributions over sequences of characters and words.

Few-Shot Learning Handwriting Recognition

GANwriting: Content-Conditioned Generation of Styled Handwritten Word Images

3 code implementations ECCV 2020 Lei Kang, Pau Riba, Yaxing Wang, Marçal Rusiñol, Alicia Fornés, Mauricio Villegas

We propose a novel method that is able to produce credible handwritten word images by conditioning the generative process with both calligraphic style features and textual content.

Handwritten Word Generation

Candidate Fusion: Integrating Language Modelling into a Sequence-to-Sequence Handwritten Word Recognition Architecture

no code implementations21 Dec 2019 Lei Kang, Pau Riba, Mauricio Villegas, Alicia Fornés, Marçal Rusiñol

The main challenge faced when training a language model is dealing with a language model corpus that is usually different from the one used to train the handwritten word recognition system.

Language Modelling

Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition

no code implementations18 Sep 2019 Lei Kang, Marçal Rusiñol, Alicia Fornés, Pau Riba, Mauricio Villegas

Handwritten Text Recognition (HTR) is still a challenging problem because it must deal with two important difficulties: the variability among writing styles, and the scarcity of labelled data.

Data Augmentation Handwritten Text Recognition +1

ICDAR 2019 Competition on Scene Text Visual Question Answering

no code implementations30 Jun 2019 Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system to date, namely the incorporation of scene text to answer questions asked about an image.

Question Answering Visual Question Answering +1

Selective Style Transfer for Text

1 code implementation4 Jun 2019 Raul Gomez, Ali Furkan Biten, Lluis Gomez, Jaume Gibert, Marçal Rusiñol, Dimosthenis Karatzas

This paper explores the possibilities of image style transfer applied to text while maintaining the original transcriptions.

Data Augmentation Scene Text Detection +1

Good News, Everyone! Context driven entity-aware captioning for news images

1 code implementation CVPR 2019 Ali Furkan Biten, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas

We propose a novel captioning method that is able to leverage contextual information provided by the text of news articles associated with an image.

Image Captioning

Self-Supervised Visual Representations for Cross-Modal Retrieval

no code implementations31 Jan 2019 Yash Patel, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

Cross-modal retrieval methods have improved significantly in recent years with the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places.

Cross-Modal Retrieval Image Classification +3

Single Shot Scene Text Retrieval

3 code implementations ECCV 2018 Lluís Gómez, Andrés Mafla, Marçal Rusiñol, Dimosthenis Karatzas

In this way, the text-based image retrieval task can be cast as a simple nearest-neighbor search of the query text representation over the outputs of the CNN across the entire image database.

Image Retrieval Retrieval +2
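The retrieval step described in this abstract reduces to a nearest-neighbor search in a shared embedding space. A minimal sketch of that step, using random vectors as stand-ins for the paper's actual query-text and CNN image representations (the `retrieve` helper and the 128-dimensional embeddings are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

# Random stand-ins: in the paper, images are scored against a representation
# of the query text produced by a CNN. Here we only illustrate the
# nearest-neighbor search over a precomputed embedding database.
rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 128))  # one embedding per database image
database /= np.linalg.norm(database, axis=1, keepdims=True)

def retrieve(query_vec, db, k=5):
    """Return indices of the k nearest database embeddings (cosine similarity)."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = db @ q                      # cosine similarity on unit vectors
    return np.argsort(-scores)[:k]

query = rng.normal(size=128)
top_k = retrieve(query, database)
print(top_k)  # indices of the 5 best-matching images
```

Because the database embeddings can be computed once and stored, a query at test time costs only one matrix-vector product and a partial sort.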

TextTopicNet - Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces

1 code implementation4 Jul 2018 Yash Patel, Lluis Gomez, Raul Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

We show that adequate visual features can be learned efficiently by training a CNN to predict the semantic textual context in which a particular image is most likely to appear as an illustration.

Image Classification object-detection +3
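The self-supervised objective described in this abstract amounts to regressing soft topic distributions from image features. A toy sketch of that training signal, with random vectors standing in for CNN activations and for the topic distributions of each image's surrounding text (the linear model, learning rate, and dimensions are all illustrative assumptions):

```python
import numpy as np

# Toy setup: learn to predict soft topic distributions from image features.
rng = np.random.default_rng(1)
n, d, topics = 200, 64, 10
X = rng.normal(size=(n, d))                  # stand-in image features
T = rng.dirichlet(np.ones(topics), size=n)   # stand-in text-topic targets

W = np.zeros((d, topics))
for _ in range(300):                          # plain gradient descent
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)             # softmax predictions
    W -= 0.1 * X.T @ (P - T) / n                  # cross-entropy gradient

# Cross-entropy against the soft targets; lower than the uniform baseline
# log(topics) once the model has fit some of the signal.
loss = -np.mean(np.sum(T * np.log(P + 1e-12), axis=1))
```

The key point is that the targets are soft distributions over topics rather than hard class labels, so the supervision comes for free from the text accompanying each image.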

The Robust Reading Competition Annotation and Evaluation Platform

no code implementations18 Oct 2017 Dimosthenis Karatzas, Lluis Gómez, Anguelos Nicolaou, Marçal Rusiñol

The ICDAR Robust Reading Competition (RRC), initiated in 2003 and re-established in 2011, has become a de facto evaluation standard for robust reading systems and algorithms.

Management

Self-supervised learning of visual features through embedding images into text topic spaces

no code implementations CVPR 2017 Lluis Gomez, Yash Patel, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

End-to-end training from scratch of current deep architectures for new computer vision problems would require ImageNet-scale datasets, and this is not always possible.

Image Classification object-detection +3
