1 code implementation • ICCV 2023 • Cheng Da, Chuwei Luo, Qi Zheng, Cong Yao
Document pre-trained models and grid-based models have proven to be very effective on various tasks in Document AI.
Ranked #1 on Document Layout Analysis on PubLayNet val
1 code implementation • ICCV 2023 • Changxu Cheng, Peng Wang, Cheng Da, Qi Zheng, Cong Yao
The diversity in length constitutes a significant characteristic of text.
1 code implementation • 25 Jul 2023 • Cheng Da, Peng Wang, Cong Yao
Specifically, MGP-STR achieves an average recognition accuracy of 94% on standard benchmarks for scene text recognition.
2 code implementations • 8 Sep 2022 • Cheng Da, Peng Wang, Cong Yao
A novel scene text recognizer based on Vision-Language Transformer (VLT) is presented.
2 code implementations • 8 Sep 2022 • Peng Wang, Cheng Da, Cong Yao
In this work, we first draw inspiration from the recent progress in Vision Transformer (ViT) to construct a conceptually simple yet powerful vision STR model, which is built upon ViT and outperforms previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods.
Ranked #1 on Scene Text Recognition on Uber-Text (using extra training data)
no code implementations • 9 Feb 2021 • Yanhao Zhang, Qiang Wang, Pan Pan, Yun Zheng, Cheng Da, Siyang Sun, Yinghui Xu
Live-stream and short-video shopping in e-commerce have grown exponentially.
no code implementations • CVPR 2017 • Cheng Da, Shibiao Xu, Kun Ding, Gaofeng Meng, Shiming Xiang, Chunhong Pan
(2) A multi-integer-embedding is employed to compress the whole database, which is modeled by a binary sparse representation with fixed sparsity.