Search Results for author: Zhuoyao Zhong

Found 11 papers, 4 papers with code

Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis

1 code implementation • 22 Jan 2024 • Jiawei Wang, Kai Hu, Zhuoyao Zhong, Lei Sun, Qiang Huo

Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets (PubLayNet and DocLayNet), a high-quality hierarchical document structure reconstruction dataset (HRDoc), and our Comp-HRDoc benchmark.

Document Layout Analysis Document Summarization +4

Dynamic Relation Transformer for Contextual Text Block Detection

no code implementations • 17 Jan 2024 • Jiawei Wang, Shunchi Zhang, Kai Hu, Chixiang Ma, Zhuoyao Zhong, Lei Sun, Qiang Huo

Contextual Text Block Detection (CTBD) is the task of identifying coherent text blocks within the complexity of natural scenes.

Graph Generation Relation +1

UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents

no code implementations • 17 Jan 2024 • Kai Hu, Jiawei Wang, WeiHong Lin, Zhuoyao Zhong, Lei Sun, Qiang Huo

This unified approach allows for the definition of various relation types and effectively tackles hierarchical relationships in form-like documents.

Key Information Extraction Relation

Exploring Predicate Visual Context in Detecting Human-Object Interactions

1 code implementation • ICCV 2023 • Frederic Z. Zhang, Yuhui Yuan, Dylan Campbell, Zhuoyao Zhong, Stephen Gould

Recently, the DETR framework has emerged as the dominant approach for human-object interaction (HOI) research.

Ranked #2 on Human-Object Interaction Detection on HICO-DET

Human-Object Interaction Detection Object

A Question-Answering Approach to Key Value Pair Extraction from Form-like Document Images

no code implementations • 17 Apr 2023 • Kai Hu, Zhuoyuan Wu, Zhuoyao Zhong, WeiHong Lin, Lei Sun, Qiang Huo

In this paper, we present a new question-answering (QA) based key-value pair extraction approach, called KVPFormer, to robustly extract key-value relationships between entities from form-like document images.

Question Answering

ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents

no code implementations • 25 May 2021 • WeiHong Lin, Qifang Gao, Lei Sun, Zhuoyao Zhong, Kai Hu, Qin Ren, Qiang Huo

In this paper, we propose a new multi-modal backbone network by concatenating a BERTgrid to an intermediate layer of a CNN model, where the input of CNN is a document image and the BERTgrid is a grid of word embeddings, to generate a more powerful grid-based document representation, named ViBERTgrid.

Image Segmentation Key Information Extraction +4

ReLaText: Exploiting Visual Relationships for Arbitrary-Shaped Scene Text Detection with Graph Convolutional Networks

no code implementations • 16 Mar 2020 • Chixiang Ma, Lei Sun, Zhuoyao Zhong, Qiang Huo

The key idea is to decompose text detection into two subproblems, namely detection of text primitives and prediction of link relationships between nearby text primitive pairs.

Link Prediction Region Proposal +4

Mask R-CNN with Pyramid Attention Network for Scene Text Detection

no code implementations • 22 Nov 2018 • Zhida Huang, Zhuoyao Zhong, Lei Sun, Qiang Huo

In this paper, we present a new Mask R-CNN based text detection approach which can robustly detect multi-oriented and curved text from natural scene images in a unified manner.

Ranked #6 on Scene Text Detection on SCUT-CTW1500

Curved Text Detection Text Detection

An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches

no code implementations • 24 Apr 2018 • Zhuoyao Zhong, Lei Sun, Qiang Huo

The anchor mechanism of the Faster R-CNN and SSD frameworks is considered insufficiently effective for scene text detection, which can be attributed to its IoU-based matching criterion between anchors and ground-truth boxes.

Region Proposal Scene Text Detection +1

DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images

5 code implementations • 24 May 2016 • Zhuoyao Zhong, Lianwen Jin, Shuye Zhang, Ziyong Feng

In this paper, we develop a novel unified framework called DeepText for text region proposal generation and text detection in natural images via a fully convolutional neural network (CNN).

Region Proposal Text Classification +1

High Performance Offline Handwritten Chinese Character Recognition Using GoogLeNet and Directional Feature Maps

1 code implementation • 19 May 2015 • Zhuoyao Zhong, Lianwen Jin, Zecheng Xie

For HCCR, we design a streamlined version of GoogLeNet [13], a very deep architecture originally proposed for image classification; we denote the streamlined model HCCR-GoogLeNet.

Image Classification Offline Handwritten Chinese Character Recognition
