1 code implementation • 22 Jan 2024 • Jiawei Wang, Kai Hu, Zhuoyao Zhong, Lei Sun, Qiang Huo
Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets (PubLayNet and DocLayNet), a high-quality hierarchical document structure reconstruction dataset (HRDoc), and our Comp-HRDoc benchmark.
no code implementations • 17 Jan 2024 • Jiawei Wang, Shunchi Zhang, Kai Hu, Chixiang Ma, Zhuoyao Zhong, Lei Sun, Qiang Huo
Contextual Text Block Detection (CTBD) is the task of identifying coherent text blocks within the complexity of natural scenes.
no code implementations • 17 Jan 2024 • Kai Hu, Jiawei Wang, WeiHong Lin, Zhuoyao Zhong, Lei Sun, Qiang Huo
This unified approach allows for the definition of various relation types and effectively tackles hierarchical relationships in form-like documents.
1 code implementation • ICCV 2023 • Frederic Z. Zhang, Yuhui Yuan, Dylan Campbell, Zhuoyao Zhong, Stephen Gould
Recently, the DETR framework has emerged as the dominant approach for human-object interaction (HOI) research.
Ranked #2 on Human-Object Interaction Detection on HICO-DET
no code implementations • 17 Apr 2023 • Kai Hu, Zhuoyuan Wu, Zhuoyao Zhong, WeiHong Lin, Lei Sun, Qiang Huo
In this paper, we present a new question-answering (QA) based key-value pair extraction approach, called KVPFormer, to robustly extract key-value relationships between entities from form-like document images.
no code implementations • 25 May 2021 • WeiHong Lin, Qifang Gao, Lei Sun, Zhuoyao Zhong, Kai Hu, Qin Ren, Qiang Huo
In this paper, we propose a new multi-modal backbone network by concatenating a BERTgrid to an intermediate layer of a CNN model, where the input of CNN is a document image and the BERTgrid is a grid of word embeddings, to generate a more powerful grid-based document representation, named ViBERTgrid.
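As a rough illustration of the grid idea in this abstract, the sketch below places word embeddings at the cells covered by each word's box and concatenates the result with a CNN feature map along the channel axis. It is a minimal sketch, not the authors' implementation: the function names, the toy embeddings, and the assumption that the grid already has the same spatial size as the feature map are all hypothetical.

```python
from typing import List, Sequence, Tuple

# (x0, y0, x1, y1) in grid cells, end-exclusive. Hypothetical box format.
Box = Tuple[int, int, int, int]
Grid = List[List[List[float]]]  # height x width x channels


def build_embedding_grid(
    height: int,
    width: int,
    words: Sequence[Tuple[Box, Sequence[float]]],
    dim: int,
) -> Grid:
    """Return a height x width x dim grid.

    Cells covered by a word box hold that word's embedding; all other
    cells stay zero.
    """
    grid = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for (x0, y0, x1, y1), embedding in words:
        if len(embedding) != dim:
            raise ValueError("embedding length must equal dim")
        # Clip the box to the grid so out-of-range boxes are ignored safely.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = list(embedding)
    return grid


def concat_channels(feature_map: Grid, grid: Grid) -> Grid:
    """Concatenate two maps of equal height and width along the channel axis."""
    if len(feature_map) != len(grid) or any(
        len(f_row) != len(g_row) for f_row, g_row in zip(feature_map, grid)
    ):
        raise ValueError("spatial sizes must match")
    return [
        [f_cell + g_cell for f_cell, g_cell in zip(f_row, g_row)]
        for f_row, g_row in zip(feature_map, grid)
    ]


# Toy usage: a 2 x 3 feature map with 1 channel, and one word with a
# 2-dimensional embedding covering the first two cells of the top row.
features = [[[1.0] for _ in range(3)] for _ in range(2)]
word_grid = build_embedding_grid(2, 3, [((0, 0, 2, 1), [0.5, 0.25])], dim=2)
fused = concat_channels(features, word_grid)
```

In this toy setup each fused cell has 3 channels: the original feature channel followed by the 2 embedding channels.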
no code implementations • 16 Mar 2020 • Chixiang Ma, Lei Sun, Zhuoyao Zhong, Qiang Huo
The key idea is to decompose text detection into two subproblems, namely detection of text primitives and prediction of link relationships between nearby text primitive pairs.
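One plausible way to turn the two subproblem outputs into text instances is to treat primitives as graph nodes and predicted links as edges, then take connected components. The abstract does not say how links are consumed, so the union-find grouping below is an assumption for illustration, and `group_primitives` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple


def group_primitives(
    num_primitives: int, links: Iterable[Tuple[int, int]]
) -> List[List[int]]:
    """Group primitive indices into connected components defined by links.

    Each link (a, b) says that primitives a and b belong to the same
    text instance. Returns the groups sorted by their smallest index.
    """
    parent = list(range(num_primitives))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root under the smaller one.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups: Dict[int, List[int]] = {}
    for i in range(num_primitives):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy usage: five primitives, with links 0-1, 1-2 and 3-4.
instances = group_primitives(5, [(0, 1), (1, 2), (3, 4)])
```

With these toy links, primitives 0, 1 and 2 end up in one group and primitives 3 and 4 in another.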
no code implementations • 22 Nov 2018 • Zhida Huang, Zhuoyao Zhong, Lei Sun, Qiang Huo
In this paper, we present a new Mask R-CNN based text detection approach which can robustly detect multi-oriented and curved text from natural scene images in a unified manner.
Ranked #6 on Scene Text Detection on SCUT-CTW1500
no code implementations • 24 Apr 2018 • Zhuoyao Zhong, Lei Sun, Qiang Huo
The anchor mechanism of the Faster R-CNN and SSD frameworks is considered not effective enough for scene text detection, which can be attributed to its IoU-based matching criterion between anchors and ground-truth boxes.
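To make the IoU-based matching criterion concrete, the sketch below computes intersection over union for axis-aligned boxes and assigns each anchor to its best ground-truth box only when the IoU reaches a threshold. The 0.5 threshold and the `match_anchors` helper are illustrative assumptions, not the exact rule of either framework.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_anchors(
    anchors: Sequence[Box], gt_boxes: Sequence[Box], threshold: float = 0.5
) -> List[int]:
    """For each anchor, return the index of its best-overlapping
    ground-truth box, or -1 if the best IoU is below the threshold."""
    matches = []
    for anchor in anchors:
        best_index, best_iou = -1, 0.0
        for index, gt in enumerate(gt_boxes):
            overlap = iou(anchor, gt)
            if overlap > best_iou:
                best_index, best_iou = index, overlap
        matches.append(best_index if best_iou >= threshold else -1)
    return matches


# Toy usage: a thin 10 x 2 text line lying fully inside a 10 x 10 anchor.
square_anchor = (0.0, 0.0, 10.0, 10.0)
thin_text_line = (0.0, 4.0, 10.0, 6.0)
overlap = iou(square_anchor, thin_text_line)
matched = match_anchors([square_anchor], [thin_text_line])
```

In this toy case the text line is fully contained in the anchor, yet the IoU is only about 0.2 because the anchor covers five times the area of the box, so the anchor goes unmatched at a 0.5 threshold. Elongated text boxes are a natural case where this kind of mismatch arises.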
5 code implementations • 24 May 2016 • Zhuoyao Zhong, Lianwen Jin, Shuye Zhang, Ziyong Feng
In this paper, we develop a novel unified framework called DeepText for text region proposal generation and text detection in natural images via a fully convolutional neural network (CNN).
1 code implementation • 19 May 2015 • Zhuoyao Zhong, Lianwen Jin, Zecheng Xie
We design a streamlined version of GoogLeNet [13], which was originally proposed for image classification in recent years with a very deep architecture, for HCCR (denoted as HCCR-GoogLeNet).
Image Classification • Offline Handwritten Chinese Character Recognition