Dictionary-Guided Scene Text Recognition

CVPR 2021 · Nguyen Nguyen, Thu Nguyen, Vinh Tran, Minh-Triet Tran, Thanh Duc Ngo, Thien Huu Nguyen, Minh Hoai

Language priors play an important role in how humans perceive and recognize text in the wild. In this work, we present an approach to training and using scene text recognition models that exploits multiple cues from a language reference. Current scene text recognition methods use lexicons to improve recognition performance, but their naive approach of simply mapping the output to a dictionary word based purely on edit distance has many limitations. We introduce a novel approach that incorporates a dictionary in both the training and inference stages of a scene text recognition system. We use the dictionary to generate a list of candidate outputs and select the one that is most compatible with the visual appearance of the text. The proposed method leads to a robust scene text recognition model that is better at handling ambiguous cases encountered in the wild, and it improves the overall performance of a state-of-the-art scene text spotting framework. Our work suggests that incorporating a language prior is a promising way to advance scene text detection and recognition methods. In addition, we contribute a challenging scene text dataset for Vietnamese, in which some characters are visually ambiguous because of accent symbols. This dataset will serve as a challenging benchmark for measuring the applicability and robustness of scene text detection and recognition algorithms.
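The contrast between edit-distance matching and visually guided candidate selection can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`naive_match`, `rerank`), the per-position character probabilities, and the fixed-length scoring rule are all assumptions made for illustration. A real recognizer would score candidates with its own decoder rather than with a hand-written table.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def nearest_words(pred, dictionary, k=3):
    """The k dictionary words closest to pred (ties broken alphabetically)."""
    return sorted(dictionary, key=lambda w: (edit_distance(pred, w), w))[:k]


def naive_match(pred, dictionary):
    """Baseline: snap the prediction to the single closest dictionary word."""
    return nearest_words(pred, dictionary, k=1)[0]


def log_likelihood(word, char_probs, floor=1e-6):
    """Toy visual score (assumption): sum of per-position log-probabilities.

    char_probs is a list of {char: probability} dicts, one per position.
    Words of a different length are rejected, which a real model would not do.
    """
    if len(word) != len(char_probs):
        return -math.inf
    return sum(math.log(max(p.get(c, 0.0), floor))
               for c, p in zip(word, char_probs))


def rerank(pred, dictionary, char_probs, k=3):
    """Generate k dictionary candidates, keep the one with the best visual score."""
    return max(nearest_words(pred, dictionary, k),
               key=lambda w: log_likelihood(w, char_probs))


# Hypothetical recognizer output for an image of the word "house".
dictionary = ["house", "pause", "cause"]
pred = "hause"  # greedy decoding picked 'a' over 'o' at position 1
char_probs = [
    {"h": 0.6, "c": 0.1, "p": 0.1},
    {"a": 0.5, "o": 0.4},
    {"u": 0.9},
    {"s": 0.9},
    {"e": 0.9},
]

print(naive_match(pred, dictionary))         # cause
print(rerank(pred, dictionary, char_probs))  # house
```

All three dictionary words are one edit away from the raw prediction, so edit distance alone cannot separate them and the baseline falls back on an arbitrary tie-break. The reranking step prefers "house" because the assumed character probabilities strongly favor "h" in the first position.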
