WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting

no code implementations • 28 Jul 2024 • Jingjing Wu, Zhengyao Fang, Pengyuan Lyu, Chengquan Zhang, Fanglin Chen, Guangming Lu, Wenjie Pei

In this work, we formulate this challenging problem as a Weakly Supervised Cross-modality Contrastive Learning problem, and design a simple yet effective model dubbed WeCromCL that is able to detect each transcription in a scene image in a weakly supervised manner.

Contrastive Learning, Text Spotting

Towards Unified Multi-granularity Text Detection with Interactive Attention

no code implementations • 30 May 2024 • Xingyu Wan, Chengquan Zhang, Pengyuan Lyu, Sen Fan, Zihan Ni, Kun Yao, Errui Ding, Jingdong Wang

Existing OCR engines or document image analysis systems typically rely on training separate models for text detection in varying scenarios and granularities, leading to significant computational complexity and resource demands.

Document Layout Analysis, Optical Character Recognition (OCR), +3 more

GridFormer: Towards Accurate Table Structure Recognition via Grid Prediction

no code implementations • 26 Sep 2023 • Pengyuan Lyu, Weihong Ma, Hongyi Wang, Yuechen Yu, Chengquan Zhang, Kun Yao, Yang Xue, Jingdong Wang

In this representation, the vertexes and edges of the grid store the localization and adjacency information of the table.

Towards Robust Real-Time Scene Text Detection: From Semantic to Instance Representation Learning

no code implementations • 14 Aug 2023 • Xugong Qin, Pengyuan Lyu, Chengquan Zhang, Yu Zhou, Kun Yao, Peng Zhang, Hailun Lin, Weiping Wang

Different from existing methods which integrate multiple-granularity features or multiple outputs, we resort to the perspective of representation learning in which auxiliary tasks are utilized to enable the encoder to jointly learn robust features with the main task of per-pixel classification during optimization.

Representation Learning, Scene Text Detection, +1 more

Single Shot Self-Reliant Scene Text Spotter by Decoupled yet Collaborative Detection and Recognition

1 code implementation • 15 Jul 2022 • Jingjing Wu, Pengyuan Lyu, Guangming Lu, Chengquan Zhang, Wenjie Pei

Typical text spotters follow the two-stage spotting paradigm which detects the boundary for a text instance first and then performs text recognition within the detected regions.

Text Detection, Text Spotting

MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining

no code implementations • 1 Jun 2022 • Pengyuan Lyu, Chengquan Zhang, Shanshan Liu, Meina Qiao, Yangliu Xu, Liang Wu, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang

Specifically, we transform text data into synthesized text images to unify the data modalities of vision and language, and enhance the language modeling capability of the sequence decoder using a proposed masked image-language modeling scheme.

Decoder, Language Modelling, +2 more

PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network

2 code implementations • 12 Apr 2021 • Pengfei Wang, Chengquan Zhang, Fei Qi, Shanshan Liu, Xiaoqiang Zhang, Pengyuan Lyu, Junyu Han, Jingtuo Liu, Errui Ding, Guangming Shi

With a PG-CTC decoder, we gather high-level character classification vectors from two-dimensional space and decode them into text symbols without NMS and RoI operations involved, which guarantees high efficiency.

Ranked #1 on Scene Text Detection on ICDAR 2015 (Accuracy metric)

Decoder, Optical Character Recognition (OCR), +2 more

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

1 code implementation • ECCV 2018 • Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai

Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.

Scene Text Recognition, Semantic Segmentation, +2 more

2D Attentional Irregular Scene Text Recognizer

no code implementations • 13 Jun 2019 • Pengyuan Lyu, Zhicheng Yang, Xinhang Leng, Xiao-Jun Wu, Ruiyu Li, Xiaoyong Shen

Irregular scene text, which has a complex layout in 2D space, is challenging for most previous scene text recognizers.

Scene Text Recognition from Two-Dimensional Perspective

no code implementations • 18 Sep 2018 • Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai

Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem.

Scene Text Recognition, Semantic Segmentation, +4 more

Auto-Encoder Guided GAN for Chinese Calligraphy Synthesis

no code implementations • 27 Jun 2017 • Pengyuan Lyu, Xiang Bai, Cong Yao, Zhen Zhu, Tengteng Huang, Wenyu Liu

In this paper, we investigate the Chinese calligraphy synthesis problem: synthesizing Chinese calligraphy images with a specified style from a standard font (e.g.

Image-to-Image Translation, Translation

Integrating Scene Text and Visual Appearance for Fine-Grained Image Classification

no code implementations • 15 Apr 2017 • Xiang Bai, Mingkun Yang, Pengyuan Lyu, Yongchao Xu, Jiebo Luo

Then, we combine the word embedding of the recognized words and the deep visual features into a single representation, which is optimized by a convolutional neural network for fine-grained image classification.

Classification, Fine-Grained Image Classification, +2 more

