no code implementations • 10 Apr 2024 • Chaohu Liu, Kun Yin, Haoyu Cao, Xinghua Jiang, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Linli Xu
In addition, we construct a document-oriented visual instruction tuning dataset and apply a multi-stage training strategy to enhance the model's document modeling capabilities.
no code implementations • 29 Feb 2024 • Xin Li, Yunfei Wu, Xinghua Jiang, Zhihao Guo, Mingming Gong, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun
This indicates that contrastive learning between the holistic visual representations and the multimodal fine-grained features of document objects can help the vision encoder acquire more effective visual cues, thereby enhancing the comprehension of text-rich documents in LVLMs.
no code implementations • ICCV 2023 • Haoyu Cao, Changcun Bao, Chaohu Liu, Huang Chen, Kun Yin, Hao Liu, Yinsong Liu, Deqiang Jiang, Xing Sun
We propose a novel end-to-end document understanding model called SeRum (SElective Region Understanding Model) for extracting meaningful information from document images, with applications including document analysis, retrieval, and office automation.
1 code implementation • 21 Aug 2023 • Wenwen Yu, Yuliang Liu, Xingkui Zhu, Haoyu Cao, Xing Sun, Xiang Bai
Utilizing only 10% of the supervised data, FastTCM-CR50 improves performance by an average of 26.5% and 5.5% for text detection and spotting tasks, respectively.
no code implementations • 5 Jun 2023 • Wenwen Yu, Chengquan Zhang, Haoyu Cao, Wei Hua, Bohan Li, Huang Chen, MingYu Liu, Mingrui Chen, Jianfeng Kuang, Mengjun Cheng, Yuning Du, Shikun Feng, Xiaoguang Hu, Pengyuan Lyu, Kun Yao, Yuechen Yu, Yuliang Liu, Wanxiang Che, Errui Ding, Cheng-Lin Liu, Jiebo Luo, Shuicheng Yan, Min Zhang, Dimosthenis Karatzas, Xing Sun, Jingdong Wang, Xiang Bai
It is hoped that this competition will attract many researchers in the fields of CV and NLP, and bring new ideas to the field of Document AI.
no code implementations • NAACL 2022 • Haoyu Cao, Jiefeng Ma, Antai Guo, Yiqing Hu, Hao Liu, Deqiang Jiang, Yinsong Liu, Bo Ren
Document Information Extraction (DIE) has attracted increasing attention due to its various advanced applications in the real world.
no code implementations • 5 May 2022 • Xin Li, Yan Zheng, Yiqing Hu, Haoyu Cao, Yunfei Wu, Deqiang Jiang, Yinsong Liu, Bo Ren
To deal with the unpredictable definition of relations, we propose a novel contrastive learning task named Relational Consistency Modeling (RCM), which harnesses the fact that existing relations should be consistent in differently augmented positive views.