In this paper, we present a comprehensive survey that offers a systematic and structured investigation of robust adversarial training in pattern recognition.
In this paper, we propose a simple yet effective approach to rectify distorted document images by estimating control points and reference points.
Moreover, through experiments we show that discrete language representation has several advantages over continuous feature representation in terms of interpretability, generalization, and robustness.
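To illustrate what a discrete representation can mean in practice, here is a minimal sketch that assumes vector quantization against a fixed codebook. The sentence above does not say how its discrete representation is obtained, so the codebook values and the function name `quantize` are hypothetical. Each continuous feature vector is replaced by the index of its nearest codeword.

```python
def quantize(features, codebook):
    """Map each continuous feature vector to the index of its nearest codeword (hypothetical helper)."""
    indices = []
    for f in features:
        # Squared Euclidean distance from f to every codeword.
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    return indices


codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
features = [[0.1, -0.2], [0.9, 1.2], [0.2, 0.8]]
print(quantize(features, codebook))  # [0, 1, 2]
```

The resulting indices form a discrete sequence that can be inspected symbol by symbol. This is one plausible reason such representations are easier to interpret than raw continuous vectors.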
Deep learning systems typically suffer from catastrophic forgetting of past knowledge when acquiring new skills continually.
Comprehensive experiments demonstrate that FSR effectively alleviates the dominance of larger eigenvalues and improves adversarial robustness across different datasets.
Despite their impressive performance on many individual tasks, deep neural networks suffer from catastrophic forgetting when learning new tasks incrementally.
To overcome the lack of character-level annotations, we propose a novel weakly-supervised character center detection module, which uses only word-level annotated real images to generate character-level labels.
As camera-based documents are increasingly used, the rectification of distorted document images has become necessary to improve recognition performance.
Despite its simplicity, extensive experiments demonstrate that the misclassification detection performance of DNNs can be significantly improved by exposing them to more generated pseudo-classes during training.
We propose an Expectation-Maximization (EM) based weakly-supervised learning framework to train an accurate arbitrary-shaped text detector using only a small amount of polygon-level annotated data combined with a large amount of weakly annotated data.
Taking scene text detection as the application, where no suitable ensemble learning strategy exists, PEL significantly improves performance compared with either individual state-of-the-art models or the fusion of multiple models by non-maximum suppression.
Scene text recognition has drawn great attention in the computer vision and artificial intelligence communities owing to its challenges and wide range of applications.
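The baseline named above fuses detections from several models with non-maximum suppression (NMS). The following is a minimal sketch of standard greedy NMS. It assumes axis-aligned boxes `(x1, y1, x2, y2)`, although real text detectors often use rotated or polygonal boxes, and it does not implement PEL itself. The names `iou` and `nms` and the 0.5 threshold are illustrative choices.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Detections pooled from two hypothetical models.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

Because NMS keeps only the single highest-scoring box in each overlapping group, it discards the localization information carried by the suppressed boxes. This is one motivation for looking beyond plain NMS when ensembling detectors.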
To improve the robustness, we propose a novel learning framework called convolutional prototype learning (CPL).
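A minimal sketch of prototype-based classification follows. It assumes the general form of convolutional prototype learning: each class owns one or more prototype vectors in feature space, and a sample is assigned to the class of its nearest prototype. The training objective assumed here combines a distance-based cross-entropy, in which each prototype's logit is `-gamma` times its squared distance, with a prototype loss that pulls the feature toward the closest prototype of its true class. The CNN feature extractor is omitted and features are given directly. The function names, `gamma`, and the weight `0.01` are hypothetical.

```python
import math


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def predict(feature, prototypes):
    """Label of the nearest prototype; `prototypes` maps each label to a list of prototype vectors."""
    return min(
        ((label, squared_distance(feature, m)) for label, protos in prototypes.items() for m in protos),
        key=lambda pair: pair[1],
    )[0]


def dce_loss(feature, label, prototypes, gamma=1.0):
    """Distance-based cross-entropy: -log of the softmax mass on the true class's prototypes."""
    logits = [(c, -gamma * squared_distance(feature, m)) for c, protos in prototypes.items() for m in protos]
    top = max(v for _, v in logits)  # subtract the max logit for numerical stability
    total = sum(math.exp(v - top) for _, v in logits)
    true_mass = sum(math.exp(v - top) for c, v in logits if c == label)
    return -math.log(true_mass / total)


def prototype_loss(feature, label, prototypes):
    """Squared distance to the closest prototype of the true class."""
    return min(squared_distance(feature, m) for m in prototypes[label])


prototypes = {"a": [[0.0, 0.0]], "b": [[2.0, 2.0], [3.0, 0.0]]}
x = [0.2, 0.1]
print(predict(x, prototypes))  # a
loss = dce_loss(x, "a", prototypes) + 0.01 * prototype_loss(x, "a", prototypes)
```

Classification depends only on distances, so a sample that lies far from every prototype can be flagged as unreliable instead of being forced into a class. This is the usual argument for why distance-based classifiers are more robust than a plain softmax layer.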
Conjugate gradient (CG) methods are an important class of methods for solving systems of linear equations and nonlinear optimization problems.
In this paper, we investigate the intrinsic characteristics of text recognition and, inspired by the cognitive mechanisms humans use when reading text, we propose a scene text recognition method with character models on convolutional feature maps.
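For the linear case, the classical CG iteration solves `A x = b` when `A` is symmetric positive-definite. The sketch below is a minimal, dependency-free version written for illustration; the helper names are hypothetical. It covers only linear CG. Nonlinear CG variants replace the closed-form step length with a line search and use a different formula for `beta`.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite matrix A given as a list of rows."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n  # exact arithmetic needs at most n steps; allow slack for rounding
    x = [0.0] * n
    r = list(b)  # residual b - A x, with x starting at zero
    p = list(r)  # first search direction is the residual itself
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)  # exact minimizing step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]  # new direction, A-conjugate to the previous ones
        rs_old = rs_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(conjugate_gradient(A, b))  # close to [1/11, 7/11]
```

Each step needs only one matrix-vector product. For this reason CG is preferred for large sparse systems, where factorizing `A` would be too expensive.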
To verify this point of view, we propose a deep direct regression based method for multi-oriented scene text detection.
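In direct regression, each pixel predicts the text geometry itself instead of an offset from a predefined anchor box. The sketch below decodes such predictions under stated assumptions. The network is assumed to output a per-pixel text score and eight values per pixel: the displacement from that pixel to each of the four vertices of a quadrilateral. The map layout, the threshold, and the name `decode_quads` are hypothetical, and the network itself is not shown.

```python
def decode_quads(score_map, offset_map, threshold=0.5):
    """Turn per-pixel predictions into scored quadrilaterals.

    score_map[y][x] is a text confidence; offset_map[y][x] holds 8 numbers
    (dx1, dy1, ..., dx4, dy4), the displacement from pixel (x, y) to each of
    the four vertices. Both layouts are assumptions made for this sketch.
    """
    quads = []
    for y, row in enumerate(score_map):
        for x, score in enumerate(row):
            if score < threshold:
                continue
            o = offset_map[y][x]
            vertices = [(x + o[2 * k], y + o[2 * k + 1]) for k in range(4)]
            quads.append((score, vertices))
    return quads


score_map = [[0.1, 0.9],
             [0.2, 0.3]]
zero = [0.0] * 8
offset_map = [[zero, [-1, -1, 2, -1, 2, 1, -1, 1]],
              [zero, zero]]
print(decode_quads(score_map, offset_map))
# [(0.9, [(0, -1), (3, -1), (3, 1), (0, 1)])]
```

The four vertices are free, so the decoded quadrilateral can take any orientation. In a full detector, many neighboring pixels produce overlapping candidates, which are then typically merged with non-maximum suppression.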
In this paper, we propose a framework that uses a recurrent neural network (RNN) both as a discriminative model for recognizing Chinese characters and as a generative model for drawing (generating) Chinese characters.
Furthermore, although directMap+convNet achieves the best results and surpasses human-level performance, we show that writer adaptation is still effective in this case.
Learned from a large-scale training dataset, CNN features are much more discriminative and accurate than hand-crafted features.