no code implementations • ACL 2022 • Junlong Li, Yiheng Xu, Lei Cui, Furu Wei
Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially the fixed-layout documents such as scanned document images.
no code implementations • Findings (ACL) 2022 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei
Multimodal pre-training with text, layout, and image has achieved SOTA performance for visually rich document understanding tasks recently, which demonstrates the great potential for joint learning across different modalities.
1 code implementation • 27 Feb 2023 • Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei
A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence.
Ranked #2 on Image Captioning on Flickr30k Captions test (CIDEr metric)
no code implementations • 7 Oct 2022 • Lei Cui, Yangguang Li, Xin Lu, Dong An, Fenggang Liu
Bayesian Optimization (BO) is a common solution for searching optimal hyperparameters based on sample observations of a machine learning model.
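The entry above refers to the standard BO loop: fit a surrogate model to the observations seen so far, pick the next hyperparameter by maximizing an acquisition function, evaluate, and repeat. Below is a minimal, illustrative sketch of that loop with a Gaussian-process surrogate and expected improvement; the `objective` function and the learning-rate bounds are stand-ins, not anything from the paper.

```python
# Minimal Bayesian Optimization loop: GP surrogate + expected-improvement
# acquisition. Illustrative only; `objective` stands in for a real
# train-and-validate run of the model being tuned.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(lr):
    # Placeholder: pretend validation error depends only on the learning rate.
    return (np.log10(lr) + 2.5) ** 2

def expected_improvement(candidates, gp, best_y):
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    imp = best_y - mu                  # improvement over current best (minimization)
    z = imp / sigma
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
# Search the learning rate on a log10 scale in [1e-5, 1e-1].
X = rng.uniform(-5, -1, size=(3, 1))            # initial random observations
y = np.array([objective(10 ** x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(10):
    gp.fit(X, y)
    cand = rng.uniform(-5, -1, size=(256, 1))   # random candidate pool
    ei = expected_improvement(cand, gp, y.min())
    x_next = cand[np.argmax(ei)]                # most promising candidate
    X = np.vstack([X, x_next])
    y = np.append(y, objective(10 ** x_next[0]))

print("best log10(lr):", X[np.argmin(y), 0])
```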
1 code implementation • 6 Oct 2022 • Jingye Chen, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei
The recent surge of pre-training has driven rapid development in document understanding.
Ranked #4 on Semantic entity labeling on FUNSD
no code implementations • 5 Sep 2022 • Zhiyuan You, Kai Yang, Wenhan Luo, Lei Cui, Yu Zheng, Xinyi Le
Second, CNNs tend to reconstruct both normal samples and anomalies well, making the two still hard to distinguish.
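For context on the failure mode described above, here is a minimal reconstruction-based baseline: a plain convolutional autoencoder trained on normal images and scored by per-pixel reconstruction error. It is illustrative only and is not the model proposed in this entry.

```python
# Plain convolutional autoencoder baseline for reconstruction-based anomaly
# detection: train on normal images only, score test images by per-pixel
# reconstruction error. Illustrative; not the architecture proposed above.
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

normal_batch = torch.rand(8, 3, 64, 64)           # stand-in for normal images
loss = nn.functional.mse_loss(model(normal_batch), normal_batch)
loss.backward()
opt.step()

# At test time the anomaly map is the per-pixel reconstruction error; if the
# network also reconstructs defects well, this map stays flat and the anomaly
# is missed -- the failure mode described in the entry above.
test_image = torch.rand(1, 3, 64, 64)
with torch.no_grad():
    anomaly_map = (model(test_image) - test_image).pow(2).mean(dim=1)
score = anomaly_map.amax().item()                 # image-level anomaly score
```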
1 code implementation • 8 Jun 2022 • Zhiyuan You, Lei Cui, Yujun Shen, Kai Yang, Xin Lu, Yu Zheng, Xinyi Le
For example, when learning a unified model for 15 categories in MVTec-AD, we surpass the second-best competitor on both anomaly detection (from 88.1% to 96.5%) and anomaly localization (from 89.5% to 96.8%).
no code implementations • 5 May 2022 • Yuxin Kang, Hansheng Li, Xuan Zhao, Dongqing Hu, Feihong Liu, Lei Cui, Jun Feng, Lin Yang
In this paper, we propose a method, named Invariant Content Synergistic Learning (ICSL), to improve the generalization ability of DCNNs on unseen datasets by controlling the inductive bias.
2 code implementations • 18 Apr 2022 • Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei
In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking. A minimal usage sketch follows the ranking line below.
Ranked #1 on Key information extraction on EPHOIE
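For the LayoutLMv3 entry above, this is a minimal usage sketch for a token-labeling task such as key information extraction, assuming the Hugging Face transformers checkpoint `microsoft/layoutlmv3-base` and its processor's built-in OCR; the image path and the number of labels are illustrative.

```python
# Forward pass with LayoutLMv3 for a token-labeling task such as key
# information extraction. Assumes the Hugging Face transformers checkpoint
# `microsoft/layoutlmv3-base`; the processor runs OCR (Tesseract) to obtain
# words and boxes when apply_ocr=True.
from PIL import Image
from transformers import AutoProcessor, LayoutLMv3ForTokenClassification

processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=True)
model = LayoutLMv3ForTokenClassification.from_pretrained(
    "microsoft/layoutlmv3-base", num_labels=5)   # 5 entity types, illustrative

image = Image.open("receipt.png").convert("RGB")  # hypothetical document image
encoding = processor(image, return_tensors="pt", truncation=True)

outputs = model(**encoding)
predictions = outputs.logits.argmax(-1)           # one label id per token
```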
3 code implementations • 4 Mar 2022 • Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei
We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.
Ranked #1 on Table Detection on cTDaR
1 code implementation • 22 Jan 2022 • Zhiyuan You, Kai Yang, Wenhan Luo, Xin Lu, Lei Cui, Xinyi Le
This work studies the problem of few-shot object counting, which counts the number of exemplar objects (i.e., described by one or several support images) occurring in the query image. A rough sketch of the exemplar-matching idea follows the ranking line below.
Ranked #2 on Object Counting on CARPK
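A rough PyTorch sketch of the exemplar-matching idea behind few-shot counting, for the entry above: pool a prototype from the support crop, correlate it with the query feature map, and regress a density map whose sum is the predicted count. This illustrates the task setup only and is not the architecture proposed in the paper.

```python
# Rough sketch of exemplar-based counting: pool a prototype from the support
# crop, correlate it with the query feature map, and regress a density map
# whose sum is the predicted count. Illustrative, not the paper's model.
import torch
import torch.nn as nn

class TinyCounter(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.backbone = nn.Sequential(              # shared feature extractor
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(                  # density regression head
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1), nn.ReLU(),
        )

    def forward(self, query, support):
        q = self.backbone(query)                    # (B, C, H, W)
        s = self.backbone(support)                  # (B, C, h, w)
        proto = s.mean(dim=(2, 3), keepdim=True)    # (B, C, 1, 1) exemplar prototype
        sim = (q * proto).sum(dim=1, keepdim=True)  # correlation map (B, 1, H, W)
        density = self.head(sim)
        return density.sum(dim=(1, 2, 3))           # predicted count per image

model = TinyCounter()
query = torch.rand(2, 3, 128, 128)   # query images
support = torch.rand(2, 3, 32, 32)   # one exemplar crop per query
print(model(query, support))         # tensor of 2 predicted counts
```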
no code implementations • 16 Nov 2021 • Lei Cui, Yiheng Xu, Tengchao Lv, Furu Wei
Document AI, or Document Intelligence, is a relatively new research topic that refers to the techniques for automatically reading, understanding, and analyzing business documents.
2 code implementations • 16 Oct 2021 • Junlong Li, Yiheng Xu, Lei Cui, Furu Wei
Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially the fixed-layout documents such as scanned document images.
2 code implementations • 21 Sep 2021 • Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei
Text recognition is a long-standing research problem for document digitization.
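As a concrete illustration of the text-recognition setup in the entry above (TrOCR), the sketch below runs a cropped text-line image through the publicly released Hugging Face transformers checkpoint `microsoft/trocr-base-printed`; the image path is a placeholder.

```python
# Recognizing the text in a cropped text-line image with TrOCR via the
# Hugging Face transformers checkpoints; the image path is illustrative.
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

image = Image.open("text_line.png").convert("RGB")    # single cropped text line
pixel_values = processor(images=image, return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values)          # autoregressive decoding
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```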
1 code implementation • EMNLP 2021 • Zilong Wang, Yiheng Xu, Lei Cui, Jingbo Shang, Furu Wei
Reading order detection is the cornerstone of understanding visually-rich documents (e.g., receipts and forms).
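To make the task concrete, the sketch below implements the naive top-to-bottom, left-to-right baseline that learned reading-order models are meant to improve on; the box format and line tolerance are illustrative assumptions, and the entry above learns the order from data instead (which matters for multi-column pages and forms where this heuristic breaks).

```python
# Naive reading-order baseline: sort OCR boxes top-to-bottom, then
# left-to-right within each line. The model in the entry above learns the
# order instead of relying on this heuristic.
def heuristic_reading_order(boxes, line_tol=10):
    """boxes: list of (x0, y0, x1, y1, text) tuples from OCR."""
    # Group boxes into lines by vertical proximity of their top edges.
    boxes = sorted(boxes, key=lambda b: b[1])
    lines, current = [], [boxes[0]]
    for box in boxes[1:]:
        if abs(box[1] - current[-1][1]) <= line_tol:
            current.append(box)
        else:
            lines.append(current)
            current = [box]
    lines.append(current)
    # Within each line, read left to right.
    ordered = [b for line in lines for b in sorted(line, key=lambda b: b[0])]
    return [b[4] for b in ordered]

words = [(200, 12, 260, 30, "World"), (20, 10, 80, 30, "Hello"),
         (20, 50, 120, 70, "next"), (130, 52, 190, 70, "line")]
print(heuristic_reading_order(words))  # ['Hello', 'World', 'next', 'line']
```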
no code implementations • CVPR 2021 • Shenzhi Wang, Liwei Wu, Lei Cui, Yujun Shen
More concretely, we employ a Local-Net and a Global-Net to extract features from each individual patch and its surroundings, respectively.
1 code implementation • 10 Jun 2021 • Tengchao Lv, Lei Cui, Momcilo Vasilijevic, Furu Wei
Video transcript summarization is a fundamental task for video understanding.
no code implementations • 28 May 2021 • Ming Sun, Haoxuan Dou, Baopu Li, Lei Cui, Junjie Yan, Wanli Ouyang
Data sampling plays a pivotal role in training deep learning models.
5 code implementations • 18 Apr 2021 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei
In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.
Ranked #10 on Document Image Classification on RVL-CDIP
5 code implementations • ACL 2021 • Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou
Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.
Ranked #1 on Key information extraction on SROIE
no code implementations • COLING 2020 • Shaohan Huang, Furu Wei, Lei Cui, Xingxing Zhang, Ming Zhou
Fine-tuning with pre-trained language models (e.g., BERT) has achieved great success in many language understanding tasks in supervised settings (e.g., text classification).
2 code implementations • COLING 2020 • Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou
DocBank is constructed in a simple yet effective way, with weak supervision from the LaTeX documents available on arXiv.com.
1 code implementation • LREC 2020 • Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, Zhoujun Li
We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and LaTeX documents on the internet.
no code implementations • 7 Feb 2020 • Chaoqun Duan, Lei Cui, Shuming Ma, Furu Wei, Conghui Zhu, Tiejun Zhao
In this work, we aim to improve the relevance between live comments and videos by modeling the cross-modal interactions among different modalities.
13 code implementations • 31 Dec 2019 • Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou
In this paper, we propose LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. A minimal input-construction sketch follows the ranking line below.
Ranked #12 on Document Image Classification on RVL-CDIP
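For the LayoutLM entry above, this is a minimal input-construction sketch showing how each token is paired with a bounding box normalized to a 0-1000 grid, assuming the Hugging Face transformers checkpoint `microsoft/layoutlm-base-uncased`; the words, boxes, and label count are illustrative stand-ins for real OCR output.

```python
# Joint text + layout input to LayoutLM: each token carries a bounding box
# normalized to a 0-1000 coordinate grid. Words and boxes would normally
# come from an OCR engine; these are illustrative stand-ins.
import torch
from transformers import LayoutLMTokenizer, LayoutLMForTokenClassification

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForTokenClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased", num_labels=3)  # 3 field types, illustrative

words = ["Invoice", "Total:", "$120.00"]                    # OCR words
word_boxes = [[60, 40, 180, 70], [60, 500, 160, 530], [170, 500, 300, 530]]

# Tokenize word by word so each sub-token inherits its word's box.
input_ids, boxes = [tokenizer.cls_token_id], [[0, 0, 0, 0]]
for word, box in zip(words, word_boxes):
    ids = tokenizer.encode(word, add_special_tokens=False)
    input_ids.extend(ids)
    boxes.extend([box] * len(ids))
input_ids.append(tokenizer.sep_token_id)
boxes.append([1000, 1000, 1000, 1000])

outputs = model(
    input_ids=torch.tensor([input_ids]),
    bbox=torch.tensor([boxes]),
    attention_mask=torch.ones(1, len(input_ids), dtype=torch.long),
)
print(outputs.logits.argmax(-1))  # one predicted label per token
```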
no code implementations • 17 Dec 2019 • Renchun You, Zhiyao Guo, Lei Cui, Xiang Long, Yingze Bao, Shilei Wen
In order to overcome these challenges, we propose to use cross-modality attention with semantic graph embedding for multi label classification.
Ranked #8 on Multi-Label Classification on NUS-WIDE
no code implementations • WS 2019 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Lei Cui, Songhao Piao, Ming Zhou
Most machine reading comprehension (MRC) models separately handle encoding and matching with different network architectures.
2 code implementations • LREC 2020 • Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, Zhoujun Li
We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and LaTeX documents on the internet.
no code implementations • EMNLP 2018 • Tao Ge, Qing Dou, Heng Ji, Lei Cui, Baobao Chang, Zhifang Sui, Furu Wei, Ming Zhou
This paper proposes to study fine-grained coordinated cross-lingual text stream alignment through a novel information network decipherment paradigm.
no code implementations • 13 Sep 2018 • Shuming Ma, Lei Cui, Furu Wei, Xu Sun
To fully exploit the unpaired data, we completely remove the need for parallel data and propose a novel unsupervised approach to train an automatic article commenting model, relying on nothing but unpaired articles and comments.
3 code implementations • 13 Sep 2018 • Shuming Ma, Lei Cui, Damai Dai, Furu Wei, Xu Sun
We introduce the task of automatic live commenting.
no code implementations • 12 Sep 2018 • Hangbo Bao, Shaohan Huang, Furu Wei, Lei Cui, Yu Wu, Chuanqi Tan, Songhao Piao, Ming Zhou
In this paper, we study a novel task that learns to compose music from natural language.
no code implementations • ACL 2019 • Qingfu Zhu, Lei Cui, Wei-Nan Zhang, Furu Wei, Ting Liu
Dialogue systems are usually built on either generation-based or retrieval-based approaches, yet they do not benefit from the advantages of different models.
no code implementations • ACL 2018 • Lei Cui, Furu Wei, Ming Zhou
Conventional Open Information Extraction (Open IE) systems are usually built on hand-crafted patterns from other NLP tools such as syntactic parsing, yet they face problems of error propagation.
no code implementations • COLING 2016 • Tao Ge, Lei Cui, Baobao Chang, Zhifang Sui, Ming Zhou
Retrospective event detection is an important task for discovering previously unidentified events in a text stream.
no code implementations • 27 Sep 2016 • Tao Ge, Qing Dou, Xiaoman Pan, Heng Ji, Lei Cui, Baobao Chang, Zhifang Sui, Ming Zhou
We introduce a novel Burst Information Network (BINet) representation that can display the most important information and illustrate the connections among bursty entities, events and keywords in the corpus.