Moreover, our proposed text recognizer exceeds previous state-of-the-art text recognition methods by an average of 5.3% on 11 benchmarks, with a similar model size.
Different from previous approaches that learn robust deep representations of scene text in a holistic manner, our method performs scene text detection based on a few representative features, which avoids disturbance from the background and reduces the computational cost.
Under the guidance of the geometrical relationship between OCR tokens, our LSTM-R capitalizes on a newly-devised relation-aware pointer network to select OCR tokens from the scene text for OCR-based image captioning.
Visual Odometry (VO), which estimates the position and orientation of a moving object by analyzing the image sequences captured by on-board cameras, has been well investigated with the rising interest in autonomous driving.
Such a task is usually realized by matching a query text to the recognized words output by an end-to-end scene text spotter.
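As a minimal sketch of this matching step (the helper names and the similarity measure are assumptions, not the paper's actual method), a query could be compared against the words output by a scene text spotter like this:

```python
from difflib import SequenceMatcher


def match_score(query: str, word: str) -> float:
    """Similarity in [0, 1] between the query text and a recognized word."""
    return SequenceMatcher(None, query.lower(), word.lower()).ratio()


def retrieve(query: str, spotted_words: list[str], threshold: float = 0.8) -> list[str]:
    """Keep the recognized words whose similarity to the query clears a threshold."""
    return [w for w in spotted_words if match_score(query, w) >= threshold]


# Hypothetical words output by an end-to-end scene text spotter;
# the fuzzy threshold tolerates recognition errors such as "S" read as "5".
words = ["STARBUCKS", "coffee", "STARBUCK5", "open"]
print(retrieve("starbucks", words))  # → ['STARBUCKS', 'STARBUCK5']
```

Fuzzy rather than exact matching matters here because spotter output is noisy; an exact string comparison would miss near-correct recognitions.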
In this work, inspired by the success of neural architecture search (NAS), which can identify better architectures than human-designed ones, we propose automated STR (AutoSTR) to search data-dependent backbones to boost text recognition performance.
no code implementations • 20 Dec 2019 • Xi Liu, Rui Zhang, Yongsheng Zhou, Qianyi Jiang, Qi Song, Nan Li, Kai Zhou, Lei Wang, Dong Wang, Minghui Liao, Mingkun Yang, Xiang Bai, Baoguang Shi, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar
In total, 21 teams submitted results for Task 1, 23 teams for Task 2, 24 teams for Task 3, and 13 teams for Task 4.
Recently, end-to-end text spotting, which aims to detect and recognize text in cluttered images simultaneously, has attracted growing interest in computer vision.
Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes.
Scene text recognition has attracted great interest from academia and industry in recent years owing to its importance in a wide range of applications.
Recently, many person re-identification (Re-ID) methods have relied on part-based feature representations to learn a discriminative pedestrian descriptor.
This report introduces RCTW, a new competition that focuses on Chinese text reading.
Then, we combine the word embedding of the recognized words and the deep visual features into a single representation, which is optimized by a convolutional neural network for fine-grained image classification.
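The fusion described above can be sketched as follows (a minimal illustration with assumed feature dimensions, not the paper's actual network): the recognized words' embeddings are pooled into one text vector and concatenated with the deep visual feature, yielding a single joint representation for the fine-grained classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed inputs: a 512-d visual feature from a CNN backbone and
# 300-d word embeddings for the words recognized in the image.
visual_feat = rng.standard_normal(512)
word_embeddings = rng.standard_normal((3, 300))  # e.g. 3 recognized words

# Pool the word embeddings into one text vector, then concatenate it
# with the visual feature into a single joint representation that a
# classification head can be trained on.
text_feat = word_embeddings.mean(axis=0)
joint = np.concatenate([visual_feat, text_feat])
print(joint.shape)  # → (812,)
```

Mean pooling is one simple, order-invariant choice for aggregating a variable number of recognized words; the concatenated vector can then be optimized end-to-end by the classification network.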