no code implementations • Findings (ACL) 2022 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei
Multimodal pre-training with text, layout, and image has achieved SOTA performance for visually rich document understanding tasks recently, which demonstrates the great potential for joint learning across different modalities.
no code implementations • 20 May 2025 • Lingjie Jiang, Xun Wu, Shaohan Huang, Qingxiu Dong, Zewen Chi, Li Dong, Xingxing Zhang, Tengchao Lv, Lei Cui, Furu Wei
Furthermore, we introduce a metric called Hybrid Accuracy to quantitatively assess the model's capability for hybrid thinking.
no code implementations • 27 Mar 2025 • Jingye Chen, Yuzhong Zhao, Yupan Huang, Lei Cui, Li Dong, Tengchao Lv, Qifeng Chen, Furu Wei
Recent advances in generative models have significantly impacted game generation.
no code implementations • CVPR 2025 • Yangyu Huang, Tianyi Gao, Haoran Xu, QiHao Zhao, Yang song, Zhipeng Gui, Tengchao Lv, Hao Chen, Lei Cui, Scarlett Li, Furu Wei
Geologic map, as a fundamental diagram in geology science, provides critical insights into the structure and composition of Earth's subsurface and surface.
1 code implementation • 19 Dec 2024 • QiHao Zhao, Yangyu Huang, Tengchao Lv, Lei Cui, Qinzheng Sun, Shaoguang Mao, Xin Zhang, Ying Xin, Qiufeng Yin, Scarlett Li, Furu Wei
This benchmark reassesses LLMs' understanding of world knowledge by averting both unintentional and malicious data leakage.
no code implementations • 4 Dec 2024 • Yaoyao Chang, Lei Cui, Li Dong, Shaohan Huang, Yangyu Huang, Yupan Huang, Scarlett Li, Tengchao Lv, Shuming Ma, Qinzheng Sun, Wenhui Wang, Furu Wei, Ying Xin, Mao Yang, Qiufeng Yin, Xingxing Zhang
This study explores the untapped potential of Common Crawl as a comprehensive and flexible resource for pre-training LLMs, addressing both general-purpose language understanding and specialized domain knowledge.
no code implementations • 28 Nov 2023 • Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei
The diffusion model has been proven a powerful generative model in recent years, yet remains a challenge in generating visual text.
Ranked #6 on
Image Generation
on TextAtlasEval
no code implementations • 20 Sep 2023 • Tengchao Lv, Yupan Huang, Jingye Chen, Yuzhong Zhao, Yilin Jia, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei
In this paper we present KOSMOS-2. 5, a multimodal literate model for machine reading of text-intensive images.
no code implementations • NeurIPS 2023 • Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei
Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text.
1 code implementation • NeurIPS 2023 • Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei
A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence.
1 code implementation • 6 Oct 2022 • Jingye Chen, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei
The surge of pre-training has witnessed the rapid development of document understanding recently.
Ranked #8 on
Semantic entity labeling
on FUNSD
3 code implementations • 18 Apr 2022 • Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei
In this paper, we propose \textbf{LayoutLMv3} to pre-train multimodal Transformers for Document AI with unified text and image masking.
Ranked #1 on
Key Information Extraction
on EPHOIE
4 code implementations • 4 Mar 2022 • Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei
We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.
Ranked #1 on
Table Detection
on ICDAR 2019
no code implementations • 16 Nov 2021 • Lei Cui, Yiheng Xu, Tengchao Lv, Furu Wei
Document AI, or Document Intelligence, is a relatively new research topic that refers to the techniques for automatically reading, understanding, and analyzing business documents.
8 code implementations • 21 Sep 2021 • Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei
Text recognition is a long-standing research problem for document digitalization.
Ranked #1 on
Handwritten Text Recognition
on IAM(line-level)
(using extra training data)
1 code implementation • 10 Jun 2021 • Tengchao Lv, Lei Cui, Momcilo Vasilijevic, Furu Wei
Video transcript summarization is a fundamental task for video understanding.
6 code implementations • 18 Apr 2021 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei
In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.
Ranked #6 on
Key-value Pair Extraction
on SIBR
8 code implementations • ACL 2021 • Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou
Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.
Ranked #1 on
Key Information Extraction
on SROIE
no code implementations • IJCNLP 2019 • Shengli Sun, Qingfeng Sun, Kevin Zhou, Tengchao Lv
Most of the current effective methods for text classification tasks are based on large-scale labeled data and a great number of parameters, but when the supervised training data are few and difficult to be collected, these models are not available.