Domain generalization (DG) aims to learn a model that generalizes to an unseen target domain using only a limited set of source domains.
Ranked #1 on Domain Generalization on TerraIncognita
Furthermore, MATRN encourages the fusion of semantic features into visual features by hiding visual clues related to the character during the training phase.
Ranked #2 on Scene Text Recognition on SVTP
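The snippet above does not spell out MATRN's masking procedure; as an illustrative sketch under assumed shapes (the function name, attention-based position selection, and mask ratio are all assumptions, not the paper's exact scheme), hiding the visual positions most strongly attended by a character could look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_visual_features(feat, attn, ratio=0.33, mask_value=0.0):
    """Zero out the visual positions a character attends to most,
    so the model must lean on semantic context to recover it.

    feat: (T, D) visual feature sequence.
    attn: (T,) character-to-position attention weights.
    """
    k = max(1, int(ratio * len(attn)))
    top = np.argsort(attn)[-k:]       # positions carrying character clues
    masked = feat.copy()
    masked[top] = mask_value
    return masked

feat = rng.normal(size=(6, 4))
attn = np.array([0.05, 0.1, 0.4, 0.3, 0.1, 0.05])
out = mask_visual_features(feat, attn)
```

With this toy attention, position 2 carries the strongest character clue and is the one hidden, while the other positions pass through unchanged.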
On the other hand, this paper tackles the problem by going back to basics: the effective combination of text and layout.
Scene text editing (STE), which converts the text in a scene image into a desired text while preserving the original style, is a challenging task due to the complex interaction between text and style.
To train successful scene text recognition (STR) models, synthetic text image generators have alleviated the lack of annotated real-world text images.
Domain generalization (DG) methods aim to achieve generalizability to an unseen target domain by using only training data from the source domains.
Ranked #10 on Domain Generalization on VLCS
Knowledge distillation extracts general knowledge from a pre-trained teacher network and provides guidance to a target student network.
Ranked #19 on Knowledge Distillation on ImageNet
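The guidance from teacher to student described above is classically expressed as a loss between softened output distributions. A minimal numpy sketch of that distillation loss (the temperature T and the T² gradient-scaling convention follow Hinton et al.'s formulation; variable names are illustrative):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence from the softened teacher distribution to the
    softened student distribution, scaled by T**2 so gradient
    magnitudes stay comparable as T varies."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)
    return (T ** 2) * kl.mean()

# Toy logits: the student roughly mimics the teacher, so the loss is small
# but positive; identical logits would give exactly zero.
teacher = np.array([[5.0, 1.0, -2.0]])
student = np.array([[4.0, 1.5, -1.0]])
kd = distillation_loss(student, teacher)
```

In practice this term is combined with the ordinary cross-entropy on ground-truth labels, weighted by a mixing coefficient.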
Although recent advances in OCR enable the accurate extraction of text segments, extracting key information from documents remains challenging due to the diversity of layouts.
This paper introduces a method that efficiently reduces the computational cost and parameter count of the Transformer.
This architecture is formed by utilizing detection outputs in the recognizer and propagating the recognition loss through the detection stage.
We believe that our metrics can play a key role in developing and analyzing state-of-the-art text detection and recognition methods.
Scene text recognition (STR) is the task of recognizing character sequences in natural scenes.
Ranked #2 on Scene Text Recognition on ICDAR 2003
When compared to Transformers with a comparable number of parameters and time complexity, the proposed model shows better performance.
Parsing textual information embedded in images is important for various downstream tasks.
Analyzing user histories requires a robust sequential model that anticipates the transitions and decays of user interests.
Successfully processing sequential data, such as text and speech, requires improved generalization performance from recurrent neural networks (RNNs).
Many new proposals for scene text recognition (STR) models have been introduced in recent years.
Ranked #6 on Scene Text Recognition on ICDAR 2003
The experimental results show that 1) DirVAE achieves the best log-likelihood among the baselines; and 2) DirVAE produces more interpretable latent values, without the collapsing issue that the baseline models suffer from.
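DirVAE places a Dirichlet distribution on the latent code; the standard way to draw Dirichlet variates is to normalize independent Gamma samples, which is the composition such models build on. A minimal numpy sketch (this is a plain sampler for illustration, not the paper's differentiable reparameterization):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_dirichlet(alpha, n=1):
    """Draw n Dirichlet(alpha) samples by normalizing independent
    Gamma(alpha_k, 1) draws across components."""
    g = rng.gamma(shape=alpha, size=(n, len(alpha)))
    return g / g.sum(axis=1, keepdims=True)

# Sparse concentration (< 1) pushes mass toward the simplex corners,
# which is what makes Dirichlet latents interpretable as topic-like codes.
alpha = np.array([0.5, 0.5, 2.0])
z = sample_dirichlet(alpha, n=1000)
```

Each sample lies on the probability simplex, and the mean of component k approaches alpha_k / sum(alpha) as the number of draws grows.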
Recently, training with adversarial examples, which are generated by adding a small but worst-case perturbation to input examples, has been shown to improve the generalization performance of neural networks.
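The "small but worst-case perturbation" above is commonly instantiated as the Fast Gradient Sign Method (FGSM), which steps along the sign of the loss gradient with respect to the input. A minimal numpy sketch on a toy softmax classifier (the linear model, labels, and step size eps are illustrative assumptions):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fgsm_perturb(x, grad, eps=0.1):
    """FGSM: move each input coordinate eps in the direction that
    increases the loss."""
    return x + eps * np.sign(grad)

# Toy 2-class linear model; cross-entropy loss for true class y.
W = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
x = np.array([0.5, -0.5])
y = 0

def loss(x):
    return -np.log(softmax(W @ x)[y])

# For softmax cross-entropy, the input gradient is W^T (p - onehot(y)).
p = softmax(W @ x)
grad_x = W.T @ (p - np.eye(2)[y])
x_adv = fgsm_perturb(x, grad_x, eps=0.1)
```

Adversarial training then mixes such perturbed examples (with their clean labels) into each training batch, so the network learns to be locally robust around the data.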