Search Results for author: Yiheng Xu

Found 15 papers, 11 papers with code

XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding

no code implementations Findings (ACL) 2022 Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

Multimodal pre-training with text, layout, and image has achieved SOTA performance for visually rich document understanding tasks recently, which demonstrates the great potential for joint learning across different modalities.

document understanding

MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding

no code implementations ACL 2022 Junlong Li, Yiheng Xu, Lei Cui, Furu Wei

Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially the fixed-layout documents such as scanned document images.

document understanding

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

no code implementations11 Apr 2024 Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu

Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing accessibility and productivity.

Benchmarking

OpenAgents: An Open Platform for Language Agents in the Wild

2 code implementations16 Oct 2023 Tianbao Xie, Fan Zhou, Zhoujun Cheng, Peng Shi, Luoxuan Weng, Yitao Liu, Toh Jing Hua, Junning Zhao, Qian Liu, Che Liu, Leo Z. Liu, Yiheng Xu, Hongjin Su, Dongchan Shin, Caiming Xiong, Tao Yu

Language agents show potential in being capable of utilizing natural language for varied and intricate tasks in diverse environments, particularly when built upon large language models (LLMs).

2D Object Detection

Lemur: Harmonizing Natural Language and Code for Language Agents

1 code implementation10 Oct 2023 Yiheng Xu, Hongjin Su, Chen Xing, Boyu Mi, Qian Liu, Weijia Shi, Binyuan Hui, Fan Zhou, Yitao Liu, Tianbao Xie, Zhoujun Cheng, Siheng Zhao, Lingpeng Kong, Bailin Wang, Caiming Xiong, Tao Yu

We introduce Lemur and Lemur-Chat, openly accessible language models optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents.

In-Context Learning with Many Demonstration Examples

1 code implementation9 Feb 2023 Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jun Zhang, Zhiyong Wu, Lingpeng Kong

Based on EVALM, we scale up the size of examples efficiently in both instruction tuning and in-context learning to explore the boundary of the benefits from more annotated data.

16k 8k +2

DiT: Self-supervised Pre-training for Document Image Transformer

3 code implementations4 Mar 2022 Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei

We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.

Document AI Document Image Classification +4

Document AI: Benchmarks, Models and Applications

no code implementations16 Nov 2021 Lei Cui, Yiheng Xu, Tengchao Lv, Furu Wei

Document AI, or Document Intelligence, is a relatively new research topic that refers to the techniques for automatically reading, understanding, and analyzing business documents.

Document AI Document Image Classification +3

MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding

2 code implementations16 Oct 2021 Junlong Li, Yiheng Xu, Lei Cui, Furu Wei

Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially the fixed-layout documents such as scanned document images.

document understanding

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

6 code implementations18 Apr 2021 Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.

Document Image Classification document understanding

Marius: Learning Massive Graph Embeddings on a Single Machine

1 code implementation20 Jan 2021 Jason Mohoney, Roger Waleffe, Yiheng Xu, Theodoros Rekatsinas, Shivaram Venkataraman

We propose a new framework for computing the embeddings of large-scale graphs on a single machine.

Graph Embedding

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

5 code implementations ACL 2021 Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.

Document Image Classification Document Layout Analysis +6

DocBank: A Benchmark Dataset for Document Layout Analysis

2 code implementations COLING 2020 Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou

DocBank is constructed using a simple yet effective way with weak supervision from the \LaTeX{} documents available on the arXiv. com.

Document Layout Analysis

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

15 code implementations31 Dec 2019 Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou

In this paper, we propose the \textbf{LayoutLM} to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.

Document AI Document Image Classification +3

Cannot find the paper you are looking for? You can Submit a new open access paper.