Search Results for author: Xiameng Qin

Found 8 papers, 4 papers with code

Collaborative Position Reasoning Network for Referring Image Segmentation

no code implementations · 22 Jan 2024 · JianJian Cao, Beiya Dai, Yulin Li, Xiameng Qin, Jingdong Wang

Holi integrates features of the two modalities through a cross-modal attention mechanism, which suppresses irrelevant redundancy under the guidance of positioning information from RoCo.

Image Segmentation Position +2
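The abstract does not spell out how Holi consumes the RoCo positioning information. As a rough illustration of position-guided cross-modal attention in general, the sketch below assumes that information arrives as a per-region score in [0, 1] and adds its logarithm as a bias on the attention logits, so regions with a near-zero prior are suppressed. `position_guided_cross_attention` and `position_prior` are hypothetical names, and this is not the paper's implementation.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def position_guided_cross_attention(text, visual, position_prior, eps=1e-9):
    """Text queries attend over image regions, biased by a position prior.

    text:           (T, d) word features (queries)
    visual:         (N, d) image-region features (keys and values)
    position_prior: (N,) hypothetical per-region localization scores in [0, 1]
    """
    d = text.shape[-1]
    scores = text @ visual.T / np.sqrt(d)  # (T, N) scaled dot-product scores
    # Adding log(prior) multiplies each region's unnormalized weight by its prior,
    # so regions whose prior is near 0 receive almost no attention.
    bias = np.log(np.clip(position_prior, eps, 1.0))
    attn = softmax(scores + bias[None, :], axis=-1)  # (T, N), each row sums to 1
    fused = attn @ visual  # (T, d) position-filtered visual context per word
    return fused, attn

rng = np.random.default_rng(0)
text = rng.standard_normal((3, 8))
visual = rng.standard_normal((4, 8))
fused, attn = position_guided_cross_attention(text, visual, np.array([1.0, 1.0, 0.0, 0.0]))
```

With a prior of zero on regions 2 and 3, every word's attention mass lands almost entirely on regions 0 and 1. A log-bias is only one way to inject such a prior; multiplicative gating of the fused features would be another.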

MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary

no code implementations · 24 Jul 2023 · Beiya Dai, Xing Li, Qunyi Xie, Yulin Li, Xiameng Qin, Chengquan Zhang, Kun Yao, Junyu Han

To produce a comprehensive evaluation of MataDoc, we propose a novel benchmark ArbDoc, mainly consisting of document images with arbitrary boundaries in four typical scenarios.

document understanding Optical Character Recognition (OCR)

TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision

no code implementations · 6 Jun 2023 · Yukun Zhai, Xiaoqiang Zhang, Xiameng Qin, Sanyuan Zhao, Xingping Dong, Jianbing Shen

End-to-end text spotting is a vital computer vision task that aims to integrate scene text detection and recognition into a unified framework.

Scene Text Detection Text Detection +1

Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding

no code implementations · 19 May 2023 · Mingliang Zhai, Yulin Li, Xiameng Qin, Chen Yi, Qunyi Xie, Chengquan Zhang, Kun Yao, Yuwei Wu, Yunde Jia

Transformers achieve promising performance in document understanding because of their high effectiveness, but they still suffer from computational complexity that grows quadratically with the sequence length.

document understanding
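To make the quadratic dependency concrete: self-attention over n tokens computes an n × n score matrix, so halving the sequence length cuts that matrix to a quarter of its size. The sketch below merges tokens with a fixed-stride average pool, a deliberately simple stand-in that is not the paper's modality-guided dynamic token merge; `merge_tokens`, `attention_score_entries`, and `stride` are hypothetical names.

```python
import numpy as np

def attention_score_entries(seq_len: int) -> int:
    # Self-attention scores every token against every other token: seq_len x seq_len.
    return seq_len * seq_len

def merge_tokens(tokens: np.ndarray, stride: int) -> np.ndarray:
    """Average each group of `stride` consecutive tokens into one token.

    If the length is not divisible by `stride`, the last token is repeated
    as padding so the final group is still full.
    """
    n, d = tokens.shape
    pad = (-n) % stride
    if pad:
        tokens = np.concatenate([tokens, np.repeat(tokens[-1:], pad, axis=0)], axis=0)
    return tokens.reshape(-1, stride, d).mean(axis=1)

tokens = np.zeros((512, 64))
merged = merge_tokens(tokens, 2)
print(attention_score_entries(512), attention_score_entries(merged.shape[0]))  # 262144 65536
```

A fixed stride treats every region of the page the same; a content-aware merge, as the title suggests, would instead decide which tokens to combine, but the cost saving follows the same n² arithmetic.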

StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training

1 code implementation · 1 Mar 2023 · Yuechen Yu, Yulin Li, Chengquan Zhang, Xiaoqiang Zhang, Zengyuan Guo, Xiameng Qin, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang

Compared to masked multi-modal modeling methods for document image understanding that rely on both the image and text modalities, StrucTexTv2 models image-only input and can therefore potentially serve more application scenarios, since it does not depend on OCR pre-processing.

Document Image Classification Language Modelling +3
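The generic masked-image-modeling ingredients behind this kind of image-only pre-training (cutting the page into patches, hiding a subset, and scoring reconstruction only on the hidden patches) can be sketched as follows. The uniform random masking, patch size, and mask ratio are illustrative assumptions; the abstract does not say this is how StrucTexTv2 chooses what to mask, and all function names are hypothetical.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened non-overlapping patches, row by row."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch"
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return x.reshape(-1, patch * patch * c)

def random_patch_mask(num_patches: int, mask_ratio: float, rng) -> np.ndarray:
    """Boolean mask with round(num_patches * mask_ratio) randomly chosen True entries."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask

def masked_reconstruction_loss(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    # Mean squared error over masked patches only; visible patches are ignored.
    return float(((pred[mask] - target[mask]) ** 2).mean())

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
patches = patchify(image, 8)           # 16 patches of 8 * 8 * 3 = 192 values
mask = random_patch_mask(len(patches), 0.5, rng)
```

A model would see only the unmasked patches and be trained to predict the masked ones; restricting the loss to masked positions keeps it from being rewarded for copying visible input.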

EATEN: Entity-aware Attention for Single Shot Visual Text Extraction

1 code implementation · 20 Sep 2019 · He Guo, Xiameng Qin, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding

Extracting entities from images is a crucial part of many OCR applications, such as entity recognition on cards, invoices, and receipts.

Entity Extraction using GAN Optical Character Recognition (OCR)
