Search Results for author: Tengchao Lv

Found 19 papers, 9 papers with code

XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding

no code implementations · Findings (ACL) 2022 · Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

Multimodal pre-training with text, layout, and image has recently achieved SOTA performance on visually rich document understanding tasks, demonstrating the great potential of joint learning across different modalities.

Tasks: document understanding, Form

Think Only When You Need with Large Hybrid-Reasoning Models

no code implementations · 20 May 2025 · Lingjie Jiang, Xun Wu, Shaohan Huang, Qingxiu Dong, Zewen Chi, Li Dong, Xingxing Zhang, Tengchao Lv, Lei Cui, Furu Wei

Furthermore, we introduce a metric called Hybrid Accuracy to quantitatively assess the model's capability for hybrid thinking.

PEACE: Empowering Geologic Map Holistic Understanding with MLLMs

no code implementations · CVPR 2025 · Yangyu Huang, Tianyi Gao, Haoran Xu, QiHao Zhao, Yang Song, Zhipeng Gui, Tengchao Lv, Hao Chen, Lei Cui, Scarlett Li, Furu Wei

The geologic map, as a fundamental diagram in geology, provides critical insights into the structure and composition of Earth's subsurface and surface.

Tasks: Question Answering

MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark

1 code implementation · 19 Dec 2024 · QiHao Zhao, Yangyu Huang, Tengchao Lv, Lei Cui, Qinzheng Sun, Shaoguang Mao, Xin Zhang, Ying Xin, Qiufeng Yin, Scarlett Li, Furu Wei

This benchmark reassesses LLMs' understanding of world knowledge by averting both unintentional and malicious data leakage.

Tasks: MMLU, Multiple-choice (+2 more)

RedStone: Curating General, Code, Math, and QA Data for Large Language Models

no code implementations · 4 Dec 2024 · Yaoyao Chang, Lei Cui, Li Dong, Shaohan Huang, Yangyu Huang, Yupan Huang, Scarlett Li, Tengchao Lv, Shuming Ma, Qinzheng Sun, Wenhui Wang, Furu Wei, Ying Xin, Mao Yang, Qiufeng Yin, Xingxing Zhang

This study explores the untapped potential of Common Crawl as a comprehensive and flexible resource for pre-training LLMs, addressing both general-purpose language understanding and specialized domain knowledge.

Tasks: Domain Adaptation, Math (+1 more)

TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering

no code implementations · 28 Nov 2023 · Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

Diffusion models have proven to be powerful generative models in recent years, yet generating visual text remains a challenge.

Tasks: Diversity, Image Generation (+4 more)

TextDiffuser: Diffusion Models as Text Painters

no code implementations · NeurIPS 2023 · Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text.

Tasks: Optical Character Recognition (OCR)

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

3 code implementations · 18 Apr 2022 · Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei

In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.

Tasks: cross-modal alignment, Document AI (+15 more)

DiT: Self-supervised Pre-training for Document Image Transformer

4 code implementations · 4 Mar 2022 · Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei

We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.

Tasks: Document AI, document-image-classification (+6 more)

Document AI: Benchmarks, Models and Applications

no code implementations · 16 Nov 2021 · Lei Cui, Yiheng Xu, Tengchao Lv, Furu Wei

Document AI, or Document Intelligence, is a relatively new research topic that refers to the techniques for automatically reading, understanding, and analyzing business documents.

Tasks: Deep Learning, Document AI (+6 more)

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

6 code implementations · 18 Apr 2021 · Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.

Tasks: Document Image Classification, document understanding (+2 more)

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

8 code implementations · ACL 2021 · Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.

Tasks: Document Image Classification, Document Layout Analysis (+7 more)

Hierarchical Attention Prototypical Networks for Few-Shot Text Classification

no code implementations · IJCNLP 2019 · Shengli Sun, Qingfeng Sun, Kevin Zhou, Tengchao Lv

Most current effective methods for text classification rely on large-scale labeled data and a great number of parameters, but when supervised training data are scarce and difficult to collect, these models are not applicable.

Tasks: Few-Shot Text Classification, General Classification (+1 more)
