Search Results for author: WeiHong Lin

Found 10 papers, 4 papers with code

UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents

no code implementations • 17 Jan 2024 • Kai Hu, Jiawei Wang, WeiHong Lin, Zhuoyao Zhong, Lei Sun, Qiang Huo

This unified approach allows for the definition of various relation types and effectively tackles hierarchical relationships in form-like documents.

Key Information Extraction • Relation

A Question-Answering Approach to Key Value Pair Extraction from Form-like Document Images

no code implementations • 17 Apr 2023 • Kai Hu, Zhuoyuan Wu, Zhuoyao Zhong, WeiHong Lin, Lei Sun, Qiang Huo

In this paper, we present a new question-answering (QA) based key-value pair extraction approach, called KVPFormer, to robustly extract key-value relationships between entities from form-like document images.

Question Answering

Robust Table Structure Recognition with Dynamic Queries Enhanced Detection Transformer

no code implementations • 21 Mar 2023 • Jiawei Wang, WeiHong Lin, Chixiang Ma, Mingze Li, Zheng Sun, Lei Sun, Qiang Huo

Unlike previous methods, we formulate table separation line prediction as a line regression problem instead of an image segmentation problem and propose a new two-stage dynamic queries enhanced DETR based separation line regression approach, named DQ-DETR, to predict separation lines from table images directly.

Image Segmentation • Regression • +2

Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning

4 code implementations • 3 Oct 2022 • Weicong Liang, Yuhui Yuan, Henghui Ding, Xiao Luo, WeiHong Lin, Ding Jia, Zheng Zhang, Chao Zhang, Han Hu

Vision transformers have recently achieved competitive results across various vision tasks but still suffer from heavy computation costs when processing a large number of tokens.

Clustering • Depth Estimation • +6

TSRFormer: Table Structure Recognition with Transformers

no code implementations • 9 Aug 2022 • WeiHong Lin, Zheng Sun, Chixiang Ma, Mingze Li, Jiawei Wang, Lei Sun, Qiang Huo

We present a new table structure recognition (TSR) approach, called TSRFormer, to robustly recognize the structures of complex tables with geometrical distortions from various table images.

Ranked #2 on Table Recognition on PubTabNet (TEDS-Struct metric)

Image Segmentation • Relation Network • +2

DETRs with Hybrid Matching

8 code implementations • CVPR 2023 • Ding Jia, Yuhui Yuan, Haodi He, Xiaopei Wu, Haojun Yu, WeiHong Lin, Lei Sun, Chao Zhang, Han Hu

One-to-one set matching is a key design for DETR to establish its end-to-end capability, so that object detection does not require a hand-crafted NMS (non-maximum suppression) to remove duplicate detections.

Object Detection • Pose Estimation • +2

Robust Table Detection and Structure Recognition from Heterogeneous Document Images

no code implementations • 17 Mar 2022 • Chixiang Ma, WeiHong Lin, Lei Sun, Qiang Huo

We introduce a new table detection and structure recognition approach named RobusTabNet to detect the boundaries of tables and reconstruct the cellular structure of each table from heterogeneous document images.

Ranked #5 on Table Recognition on PubTabNet (TEDS-Struct metric)

Region Proposal • Table Detection • +1

HRFormer: High-Resolution Vision Transformer for Dense Prediction

2 code implementations • NeurIPS 2021 • Yuhui Yuan, Rao Fu, Lang Huang, WeiHong Lin, Chao Zhang, Xilin Chen, Jingdong Wang

We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer that produces low-resolution representations and has high memory and computational cost.

Pose Estimation • Semantic Segmentation • +1

HRFormer: High-Resolution Transformer for Dense Prediction

1 code implementation • 18 Oct 2021 • Yuhui Yuan, Rao Fu, Lang Huang, WeiHong Lin, Chao Zhang, Xilin Chen, Jingdong Wang

Ranked #3 on Pose Estimation on AIC

Image Classification • Multi-Person Pose Estimation • +2

ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents

no code implementations • 25 May 2021 • WeiHong Lin, Qifang Gao, Lei Sun, Zhuoyao Zhong, Kai Hu, Qin Ren, Qiang Huo

In this paper, we propose a new multi-modal backbone network by concatenating a BERTgrid to an intermediate layer of a CNN model, where the input of CNN is a document image and the BERTgrid is a grid of word embeddings, to generate a more powerful grid-based document representation, named ViBERTgrid.

Image Segmentation • Key Information Extraction • +4
