no code implementations • 11 Sep 2023 • Souhail Bakkali, Sanket Biswas, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol, Oriol Ramos Terrades, Josep Lladós
The field of visual document understanding has witnessed rapid growth in emerging challenges and powerful multi-modal strategies.
Ranked #19 on Document Image Classification on RVL-CDIP
1 code implementation • 5 Sep 2023 • Sergi Garcia-Bordils, Dimosthenis Karatzas, Marçal Rusiñol
We introduce the structured scene-text spotting task, which requires a scene-text OCR system to spot text in the wild according to a query regular expression.
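To make the task concrete, a structured spotter only returns detections whose transcription matches the query expression. The snippet below sketches just that filtering step; the detection tuples and the `structured_spot` helper are hypothetical illustrations, not the paper's interface.

```python
import re

# Hypothetical spotter output: (transcription, bounding box, confidence).
detections = [
    ("AB-1234", (10, 20, 110, 45), 0.93),
    ("EXIT", (200, 15, 260, 40), 0.88),
]

def structured_spot(detections, query):
    """Keep only detections whose transcription fully matches the query regex."""
    pattern = re.compile(query)
    return [d for d in detections if pattern.fullmatch(d[0])]

# Example query: license-plate-like strings (two letters, a dash, four digits).
print(structured_spot(detections, r"[A-Z]{2}-\d{4}"))
```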
no code implementations • IJDAR 2021 • Souhail Bakkali, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol
To the best of our knowledge, this is the first work to leverage a mutual learning approach together with a self-attention-based fusion module for document image classification; a minimal sketch of such a fusion module follows below.
Ranked #1 on Document Image Classification on RVL-CDIP
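For intuition, here is a minimal PyTorch sketch of fusing an image feature vector and a text feature vector with self-attention before classification. Dimensions, depth, and the classifier head are illustrative assumptions; the paper's module, and the mutual-learning objective that accompanies it, are more elaborate.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse image and text features with multi-head self-attention (sketch)."""
    def __init__(self, dim=512, num_classes=16, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, img_feat, txt_feat):
        # Treat the two modality vectors as a length-2 token sequence.
        tokens = torch.stack([img_feat, txt_feat], dim=1)   # (B, 2, dim)
        fused, _ = self.attn(tokens, tokens, tokens)
        return self.classifier(fused.mean(dim=1))           # (B, num_classes)

# RVL-CDIP has 16 document classes.
model = AttentionFusion()
logits = model(torch.randn(4, 512), torch.randn(4, 512))
```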
no code implementations • 24 May 2022 • Souhail Bakkali, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol, Oriol Ramos Terrades
Multimodal learning from document data has achieved great success lately, as it allows pre-training semantically meaningful features that serve as a prior for learnable downstream tasks.
Ranked #18 on Document Image Classification on RVL-CDIP
no code implementations • 12 Apr 2022 • Lei Kang, Pau Riba, Marçal Rusiñol, Alicia Fornés, Mauricio Villegas
Once properly trained, our method can also be adapted to new target data by accessing only unlabeled text-line images, mimicking their handwriting styles and producing images with any textual content.
no code implementations • CVPRW 2020 • Souhail Bakkali, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol
Moreover, a joint feature learning approach that combines image features and text embeddings is introduced as a late fusion methodology (a minimal late-fusion sketch follows below).
Ranked #2 on Document Image Classification on RVL-CDIP
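A minimal sketch of the late-fusion step, assuming each branch has already produced class logits; the fixed weighting used here is an illustrative choice rather than the paper's exact scheme.

```python
import torch
import torch.nn.functional as F

def late_fusion(image_logits, text_logits, alpha=0.5):
    """Combine per-branch predictions after both classifiers have run.

    alpha weights the image branch; 0.5 is an arbitrary illustrative value.
    """
    p_img = F.softmax(image_logits, dim=-1)
    p_txt = F.softmax(text_logits, dim=-1)
    return alpha * p_img + (1.0 - alpha) * p_txt

fused = late_fusion(torch.randn(4, 16), torch.randn(4, 16))
pred = fused.argmax(dim=-1)  # predicted document class per sample
```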
no code implementations • 1 Jun 2020 • Lluís Gómez, Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Marçal Rusiñol, Ernest Valveny, Dimosthenis Karatzas
This paper presents a new model for the task of scene text visual question answering, in which questions about a given image can only be answered by reading and understanding scene text that is present in it.
no code implementations • 26 May 2020 • Lei Kang, Pau Riba, Marçal Rusiñol, Alicia Fornés, Mauricio Villegas
Sequential architectures are a perfect fit for modeling text lines: not only do they capture the inherent temporal aspect of text, they also learn probability distributions over sequences of characters and words (see the sketch below).
Ranked #8 on Handwritten Text Recognition on IAM
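As a concrete (and deliberately simplified) example of learning a distribution over character sequences, the sketch below is a small character-level model that outputs next-character probabilities at every step. Vocabulary size and layer dimensions are placeholders; the recognizer evaluated on IAM is considerably more elaborate.

```python
import torch
import torch.nn as nn

class CharSeqModel(nn.Module):
    """Character-level sequence model: a distribution over the next
    character at each time step (illustrative sizes throughout)."""
    def __init__(self, vocab_size=80, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, char_ids):                 # (B, T) integer ids
        hidden, _ = self.lstm(self.embed(char_ids))
        return self.proj(hidden)                 # (B, T, vocab_size) logits

model = CharSeqModel()
logits = model(torch.randint(0, 80, (2, 32)))
# Training would minimize cross-entropy against the next character.
```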
3 code implementations • ECCV 2020 • Lei Kang, Pau Riba, Yaxing Wang, Marçal Rusiñol, Alicia Fornés, Mauricio Villegas
We propose a novel method that is able to produce credible handwritten word images by conditioning the generative process with both calligraphic style features and textual content.
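Schematically, such a generator consumes a style representation and a content (text) representation and emits a word image. The skeleton below only illustrates that conditioning pattern; the layers, shapes, and the adversarial training loop of the actual model are omitted, and the values here are placeholders.

```python
import torch
import torch.nn as nn

class CondWordGenerator(nn.Module):
    """Generator conditioned on calligraphic style and textual content
    (a structural sketch, not the paper's architecture)."""
    def __init__(self, style_dim=128, text_dim=128, img_hw=(64, 256)):
        super().__init__()
        self.img_hw = img_hw
        self.net = nn.Sequential(
            nn.Linear(style_dim + text_dim, 1024), nn.ReLU(),
            nn.Linear(1024, img_hw[0] * img_hw[1]), nn.Tanh(),
        )

    def forward(self, style_vec, text_emb):
        z = torch.cat([style_vec, text_emb], dim=-1)
        return self.net(z).view(-1, 1, *self.img_hw)  # grayscale word image

gen = CondWordGenerator()
fake = gen(torch.randn(2, 128), torch.randn(2, 128))  # (2, 1, 64, 256)
```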
no code implementations • 21 Dec 2019 • Lei Kang, Pau Riba, Mauricio Villegas, Alicia Fornés, Marçal Rusiñol
The main challenge faced when training a language model is that the language model corpus usually differs from the one used to train the handwritten word recognition system.
no code implementations • 18 Sep 2019 • Lei Kang, Marçal Rusiñol, Alicia Fornés, Pau Riba, Mauricio Villegas
Handwritten Text Recognition (HTR) is still a challenging problem because it must deal with two important difficulties: the variability among writing styles, and the scarcity of labelled data.
no code implementations • 30 Jun 2019 • Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas
ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system to date, namely the incorporation of scene text to answer questions asked about an image.
1 code implementation • 4 Jun 2019 • Raul Gomez, Ali Furkan Biten, Lluis Gomez, Jaume Gibert, Marçal Rusiñol, Dimosthenis Karatzas
This paper explores the possibilities of image style transfer applied to text while maintaining the original transcriptions.
3 code implementations • ICCV 2019 • Ali Furkan Biten, Ruben Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Ernest Valveny, C. V. Jawahar, Dimosthenis Karatzas
Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image.
1 code implementation • CVPR 2019 • Ali Furkan Biten, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas
We propose a novel captioning method that is able to leverage contextual information provided by the text of news articles associated with an image.
no code implementations • 31 Jan 2019 • Yash Patel, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar
Cross-modal retrieval methods have improved significantly in recent years with the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places.
3 code implementations • ECCV 2018 • Lluís Gómez, Andrés Mafla, Marçal Rusiñol, Dimosthenis Karatzas
In this way, the text-based image retrieval task can be cast as a simple nearest-neighbor search of the query text representation over the CNN outputs for the entire image database.
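A minimal sketch of that retrieval step, assuming the per-image descriptors have been precomputed by the CNN; the descriptor dimensionality and similarity measure below are illustrative choices.

```python
import numpy as np

def retrieve(query_vec, image_vecs, top_k=5):
    """Rank database images by cosine similarity to the query text
    representation (image_vecs: precomputed CNN outputs)."""
    q = query_vec / np.linalg.norm(query_vec)
    db = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    sims = db @ q
    order = np.argsort(-sims)[:top_k]
    return order, sims[order]

database = np.random.randn(1000, 604)  # placeholder descriptors (illustrative size)
indices, scores = retrieve(np.random.randn(604), database)
```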
1 code implementation • 4 Jul 2018 • Yash Patel, Lluis Gomez, Raul Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar
We show that adequate visual features can be learned efficiently by training a CNN to predict the semantic textual context in which a particular image is most likely to appear as an illustration.
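One way to set up such an objective (a sketch under assumptions, not the paper's exact pipeline) is to train a CNN to regress the topic distribution of the text in which each image appears, using a soft cross-entropy loss. The backbone, topic count, and targets below are all placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_topics = 40  # illustrative; any topic-model dimensionality works
backbone = nn.Sequential(  # stand-in for a real CNN trunk
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_topics),
)

def context_loss(logits, topic_target):
    """Soft cross-entropy against the topic distribution of the text
    surrounding the image (e.g. from a topic model)."""
    return -(topic_target * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

imgs = torch.randn(4, 3, 224, 224)
target = F.softmax(torch.randn(4, n_topics), dim=-1)  # placeholder topics
loss = context_loss(backbone(imgs), target)
```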
no code implementations • 18 Oct 2017 • Dimosthenis Karatzas, Lluis Gómez, Anguelos Nicolaou, Marçal Rusiñol
The ICDAR Robust Reading Competition (RRC), initiated in 2003 and re-established in 2011, has become a de facto evaluation standard for robust reading systems and algorithms.
no code implementations • CVPR 2017 • Lluis Gomez, Yash Patel, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar
End-to-end training from scratch of current deep architectures for new computer vision problems would require ImageNet-scale datasets, and this is not always possible.