Search Results for author: Yijuan Lu

Found 9 papers, 5 papers with code

XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding

no code implementations · Findings (ACL) 2022 · Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

Multimodal pre-training with text, layout, and image has recently achieved state-of-the-art performance on visually rich document understanding tasks, demonstrating the great potential of joint learning across different modalities.

Improving Structured Text Recognition with Regular Expression Biasing

no code implementations · 10 Nov 2021 · Baoguang Shi, WenFeng Cheng, Yijuan Lu, Cha Zhang, Dinei Florencio

We study the problem of recognizing structured text, i.e., text that follows certain formats, and propose to improve the recognition accuracy of structured text by specifying regular expressions (regexes) for biasing.
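The abstract does not spell out how the biasing is integrated into the recognizer, so the following is only a toy illustration of the general idea: given scored candidate transcriptions (e.g. from a beam search), boost candidates that fully match a user-supplied regex for the expected format. The function name, the bonus value, and the post-hoc rescoring setup are all assumptions for illustration, not the paper's method.

```python
import re

def rescore_with_regex(hypotheses, pattern, bonus=2.0):
    """Boost the score of candidate strings that fully match the regex.

    hypotheses: list of (text, log_score) pairs.
    pattern:    regex describing the expected structured format.
    Returns the pairs sorted by biased score, best first.
    """
    regex = re.compile(pattern)
    return sorted(
        ((text, score + (bonus if regex.fullmatch(text) else 0.0))
         for text, score in hypotheses),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Candidates for a date field; the top raw hypothesis confuses "0" with "O".
candidates = [("2O21-11-1O", -1.0), ("2021-11-10", -1.5)]
best = rescore_with_regex(candidates, r"\d{4}-\d{2}-\d{2}")[0][0]
# best == "2021-11-10": the well-formed date overtakes the higher raw score
```

A real system would apply the bias during decoding rather than after it, but the rescoring version shows why format constraints help on structured fields like dates, IDs, or amounts.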

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

2 code implementations · 21 Sep 2021 · Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei

Existing approaches to text recognition are usually built on a CNN for image understanding and an RNN for character-level text generation.

Handwritten Text Recognition · Language Modelling · +2

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

4 code implementations · 18 Apr 2021 · Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.

Document Image Classification

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

4 code implementations · ACL 2021 · Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.

Document Image Classification · Document Layout Analysis · +4

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

1 code implementation · CVPR 2021 · Zhengyuan Yang, Yijuan Lu, JianFeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo

Due to this aligned representation learning, even when pre-trained on the same downstream task dataset, TAP already boosts the absolute accuracy on the TextVQA dataset by +5.4% compared with a non-TAP baseline.

Language Modelling · Masked Language Modeling · +5

Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language

1 code implementation · 4 Dec 2020 · Songyang Zhang, Houwen Peng, Jianlong Fu, Yijuan Lu, Jiebo Luo

It is a challenging problem because a target moment may take place in the context of other temporal moments in the untrimmed video.

RGB-T Object Tracking: Benchmark and Baseline

no code implementations · 23 May 2018 · Chenglong Li, Xinyan Liang, Yijuan Lu, Nan Zhao, Jin Tang

RGB-Thermal (RGB-T) object tracking is receiving growing attention because thermal information strongly complements visible data.

Frame · Object Tracking · +1
