Search Results for author: Zhuoyao Zhong

Found 11 papers, 4 papers with code

Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis

1 code implementation • 22 Jan 2024 • Jiawei Wang, Kai Hu, Zhuoyao Zhong, Lei Sun, Qiang Huo

Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets (PubLayNet and DocLayNet), a high-quality hierarchical document structure reconstruction dataset (HRDoc), and our Comp-HRDoc benchmark.

Document Layout Analysis, Document Summarization, +4

Dynamic Relation Transformer for Contextual Text Block Detection

no code implementations • 17 Jan 2024 • Jiawei Wang, Shunchi Zhang, Kai Hu, Chixiang Ma, Zhuoyao Zhong, Lei Sun, Qiang Huo

Contextual Text Block Detection (CTBD) is the task of identifying coherent text blocks within the complexity of natural scenes.

Graph Generation, Relation, +1

UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents

no code implementations • 17 Jan 2024 • Kai Hu, Jiawei Wang, WeiHong Lin, Zhuoyao Zhong, Lei Sun, Qiang Huo

This unified approach allows for the definition of various relation types and effectively tackles hierarchical relationships in form-like documents.

Key Information Extraction, Relation

A Question-Answering Approach to Key Value Pair Extraction from Form-like Document Images

no code implementations • 17 Apr 2023 • Kai Hu, Zhuoyuan Wu, Zhuoyao Zhong, WeiHong Lin, Lei Sun, Qiang Huo

In this paper, we present a new question-answering (QA) based key-value pair extraction approach, called KVPFormer, to robustly extract key-value relationships between entities from form-like document images.

Question Answering

ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents

no code implementations • 25 May 2021 • WeiHong Lin, Qifang Gao, Lei Sun, Zhuoyao Zhong, Kai Hu, Qin Ren, Qiang Huo

In this paper, we propose a new multi-modal backbone network by concatenating a BERTgrid to an intermediate layer of a CNN model, where the input of CNN is a document image and the BERTgrid is a grid of word embeddings, to generate a more powerful grid-based document representation, named ViBERTgrid.

Image Segmentation, Key Information Extraction, +4

ReLaText: Exploiting Visual Relationships for Arbitrary-Shaped Scene Text Detection with Graph Convolutional Networks

no code implementations • 16 Mar 2020 • Chixiang Ma, Lei Sun, Zhuoyao Zhong, Qiang Huo

The key idea is to decompose text detection into two subproblems, namely detection of text primitives and prediction of link relationships between nearby text primitive pairs.

Link Prediction, Region Proposal, +4

Mask R-CNN with Pyramid Attention Network for Scene Text Detection

no code implementations • 22 Nov 2018 • Zhida Huang, Zhuoyao Zhong, Lei Sun, Qiang Huo

In this paper, we present a new Mask R-CNN based text detection approach which can robustly detect multi-oriented and curved text from natural scene images in a unified manner.

Curved Text Detection, Text Detection

An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches

no code implementations • 24 Apr 2018 • Zhuoyao Zhong, Lei Sun, Qiang Huo

The anchor mechanism of the Faster R-CNN and SSD frameworks is considered not effective enough for scene text detection, which can be attributed to its IoU-based matching criterion between anchors and ground-truth boxes.

Region Proposal, Scene Text Detection, +1
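
The IoU-based matching criterion mentioned in the entry above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `iou` and `match_anchors`, the `(x1, y1, x2, y2)` box format, and the 0.5 positive threshold are all assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def match_anchors(anchors, gt_boxes, pos_thresh=0.5):
    """Mark an anchor positive if its best IoU with any ground-truth box
    reaches pos_thresh (hypothetical threshold, assumed for illustration)."""
    return [
        max((iou(a, g) for g in gt_boxes), default=0.0) >= pos_thresh
        for a in anchors
    ]
```

Under these assumptions, a square anchor `(0, 0, 10, 10)` that fully covers a thin text line `(0, 4, 10, 6)` has an IoU of only 0.2, so it would not be matched as positive. This kind of mismatch between fixed anchor shapes and elongated text boxes is one way such a criterion can fall short.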

DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images

5 code implementations • 24 May 2016 • Zhuoyao Zhong, Lianwen Jin, Shuye Zhang, Ziyong Feng

In this paper, we develop a novel unified framework called DeepText for text region proposal generation and text detection in natural images via a fully convolutional neural network (CNN).

Region Proposal, Text Classification, +1

High Performance Offline Handwritten Chinese Character Recognition Using GoogLeNet and Directional Feature Maps

1 code implementation • 19 May 2015 • Zhuoyao Zhong, Lianwen Jin, Zecheng Xie

We design a streamlined version of GoogLeNet [13], a very deep architecture originally proposed in recent years for image classification, for HCCR (denoted as HCCR-GoogLeNet).

Image Classification, Offline Handwritten Chinese Character Recognition
