no code implementations • 22 Jan 2024 • JianJian Cao, Beiya Dai, Yulin Li, Xiameng Qin, Jingdong Wang
Holi integrates features of the two modalities through a cross-modal attention mechanism, which suppresses irrelevant redundancy under the guidance of positioning information from RoCo.
no code implementations • 24 Jul 2023 • Beiya Dai, Xing Li, Qunyi Xie, Yulin Li, Xiameng Qin, Chengquan Zhang, Kun Yao, Junyu Han
To produce a comprehensive evaluation of MataDoc, we propose a novel benchmark ArbDoc, mainly consisting of document images with arbitrary boundaries in four typical scenarios.
no code implementations • 6 Jun 2023 • Yukun Zhai, Xiaoqiang Zhang, Xiameng Qin, Sanyuan Zhao, Xingping Dong, Jianbing Shen
End-to-end text spotting is a vital computer vision task that aims to integrate scene text detection and recognition into a unified framework.
no code implementations • 19 May 2023 • Mingliang Zhai, Yulin Li, Xiameng Qin, Chen Yi, Qunyi Xie, Chengquan Zhang, Kun Yao, Yuwei Wu, Yunde Jia
Transformers achieve promising performance in document understanding because of their high effectiveness, but they still suffer from computational complexity that is quadratic in the sequence length.
1 code implementation • 1 Mar 2023 • Yuechen Yu, Yulin Li, Chengquan Zhang, Xiaoqiang Zhang, Zengyuan Guo, Xiameng Qin, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang
Compared to the masked multi-modal modeling methods for document image understanding that rely on both the image and text modalities, StrucTexTv2 models image-only input and potentially deals with more application scenarios free from OCR pre-processing.
Ranked #1 on Table Recognition on WTW
1 code implementation • 14 Dec 2021 • JianJian Cao, Xiameng Qin, Sanyuan Zhao, Jianbing Shen
In this paper, we focus on these two problems and propose a Graph Matching Attention (GMA) network.
1 code implementation • 6 Aug 2021 • Yulin Li, Yuxi Qian, Yuchen Yu, Xiameng Qin, Chengquan Zhang, Yan Liu, Kun Yao, Junyu Han, Jingtuo Liu, Errui Ding
Due to the complexity of content and layout in VRDs, structured text understanding has been a challenging task.
1 code implementation • 20 Sep 2019 • He Guo, Xiameng Qin, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding
Extracting entities from images is a crucial part of many OCR applications, such as entity recognition of cards, invoices, and receipts.
Entity Extraction using GAN • Optical Character Recognition (OCR)