Search Results for author: Qi Zheng

Found 25 papers, 12 papers with code

Merge and Recognize: A Geometry and 2D Context Aware Graph Model for Named Entity Recognition from Visual Documents

no code implementations • COLING (TextGraphs) 2020 • Chuwei Luo, Yongpan Wang, Qi Zheng, Liangchen Li, Feiyu Gao, Shiyu Zhang

By incorporating geometry information from visual documents into our model, richer 2D context information is generated to improve document representations.

document understanding Language Modelling +3

Understanding Gender Bias in Knowledge Base Embeddings

no code implementations • ACL 2022 • Yupei Du, Qi Zheng, Yuanbin Wu, Man Lan, Yan Yang, Meirong Ma

To exemplify the potential applications of our study, we also present two strategies (by adding and removing KB triples) to mitigate gender biases in KB embeddings.

An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension

no code implementations • ECCV 2020 • Liangcheng Li, Feiyu Gao, Jiajun Bu, Yongpan Wang, Zhi Yu, Qi Zheng

Nowadays, rich descriptions on detail images help users learn more about commodities.

Image Comprehension Optical Character Recognition (OCR)

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

3 code implementations • 8 Apr 2024 • Chuwei Luo, Yufan Shen, Zhaoqing Zhu, Qi Zheng, Zhi Yu, Cong Yao

The core of LayoutLLM is a layout instruction tuning strategy, which is specially designed to enhance the comprehension and utilization of document layouts.

document understanding

LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training

no code implementations • 3 Jan 2024 • Rujiao Long, Hangdi Xing, Zhibo Yang, Qi Zheng, Zhi Yu, Cong Yao, Fei Huang

We model TSR as a logical location regression problem and propose a new TSR framework called LORE, standing for LOgical location REgression network, which for the first time regresses logical location as well as spatial location of table cells in a unified network.

regression

Vision Grid Transformer for Document Layout Analysis

1 code implementation • ICCV 2023 • Cheng Da, Chuwei Luo, Qi Zheng, Cong Yao

Document pre-trained models and grid-based models have proven to be very effective on various tasks in Document AI.

Ranked #1 on Document Layout Analysis on PubLayNet val

Document Layout Analysis document understanding +1

LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition

1 code implementation • ICCV 2023 • Changxu Cheng, Peng Wang, Cheng Da, Qi Zheng, Cong Yao

The diversity in length constitutes a significant characteristic of text.

Scene Text Recognition

GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

1 code implementation • CVPR 2023 • Chuwei Luo, Changxu Cheng, Qi Zheng, Cong Yao

Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation.

Ranked #1 on Key Information Extraction on CORD

Document AI entity_extraction +3

LORE: Logical Location Regression Network for Table Structure Recognition

1 code implementation • 7 Mar 2023 • Hangdi Xing, Feiyu Gao, Rujiao Long, Jiajun Bu, Qi Zheng, Liangcheng Li, Cong Yao, Zhi Yu

Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats.

regression Table Recognition

ESceme: Vision-and-Language Navigation with Episodic Scene Memory

1 code implementation • 2 Mar 2023 • Qi Zheng, Daqing Liu, Chaoyue Wang, Jing Zhang, Dadong Wang, Dacheng Tao

Vision-and-language navigation (VLN) simulates a visual agent that follows natural-language navigation instructions in real-world scenes.

Vision and Language Navigation

Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning

1 code implementation • CVPR 2023 • Heng Zhang, Daqing Liu, Qi Zheng, Bing Su

Specifically, we enforce the embeddings of the frame sequence of interest to approximate a goal-oriented stochastic process, i.e., a Brownian bridge, in the latent space via a process-based contrastive loss.

Contrastive Learning Representation Learning +3

Cross-Modal Contrastive Learning for Robust Reasoning in VQA

1 code implementation • 21 Nov 2022 • Qi Zheng, Chaoyue Wang, Daqing Liu, Dadong Wang, Dacheng Tao

For each positive pair, we regard the images from different graphs as negative samples and derive a multi-positive version of contrastive learning.

Contrastive Learning Question Answering +1

Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding

no code implementations • 27 Jun 2022 • Chuwei Luo, Guozhi Tang, Qi Zheng, Cong Yao, Lianwen Jin, Chenliang Li, Yang Xue, Luo Si

Multi-modal document pre-trained models have proven to be very effective in a variety of visually-rich document understanding (VrDU) tasks.

Document Classification document understanding +2

Bypass Network for Semantics Driven Image Paragraph Captioning

no code implementations • 21 Jun 2022 • Qi Zheng, Chaoyue Wang, Dadong Wang

Most existing methods model the coherence through the topic transition that dynamically infers a topic vector from preceding sentences.

Image Paragraph Captioning Sentence

Visual Superordinate Abstraction for Robust Concept Learning

no code implementations • 28 May 2022 • Qi Zheng, Chaoyue Wang, Dadong Wang, Dacheng Tao

Concept learning constructs visual representations that are connected to linguistic semantics, which is fundamental to vision-language tasks.

Attribute Question Answering +1

FAVER: Blind Quality Prediction of Variable Frame Rate Videos

1 code implementation • 5 Jan 2022 • Qi Zheng, Zhengzhong Tu, Pavan C. Madhusudana, Xiaoyang Zeng, Alan C. Bovik, Yibo Fan

Video quality assessment (VQA) remains an important and challenging problem that affects many applications at the widest scales.

Cloud Computing Video Quality Assessment +1

Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition

no code implementations • 24 Nov 2021 • Changxu Cheng, Bohan Li, Qi Zheng, Yongpan Wang, Wenyu Liu

As a result, the learning of semantic features is prone to have a bias on the limited vocabulary of the training set, which is called vocabulary reliance.

Scene Text Recognition

SentiPrompt: Sentiment Knowledge Enhanced Prompt-Tuning for Aspect-Based Sentiment Analysis

no code implementations • 17 Sep 2021 • Chengxi Li, Feiyu Gao, Jiajun Bu, Lu Xu, Xiang Chen, Yu Gu, Zirui Shao, Qi Zheng, Ningyu Zhang, Yongpan Wang, Zhi Yu

We inject sentiment knowledge regarding aspects, opinions, and polarities into prompt and explicitly model term relations via constructing consistency and polarity judgment templates from the ground truth triplets.

Aspect-Based Sentiment Analysis (ABSA) +3

Inference for High Dimensional Censored Quantile Regression

1 code implementation • 22 Jul 2021 • Zhe Fei, Qi Zheng, Hyokyoung G. Hong, Yi Li

To our knowledge, there is little work available to draw inference on the effects of high dimensional predictors for censored quantile regression.

Epidemiology regression +2

Pouring Dynamics Estimation Using Gated Recurrent Units

no code implementations • 8 May 2021 • Qi Zheng

One of the most commonly performed manipulations in a human's daily life is pouring.

Classifying States of Cooking Objects Using Convolutional Neural Network

no code implementations • 30 Apr 2021 • Qi Zheng

An automated cooking machine is a goal for the future.

Progressive Localization Networks for Language-based Moment Localization

no code implementations • 2 Feb 2021 • Qi Zheng, Jianfeng Dong, Xiaoye Qu, Xun Yang, Yabing Wang, Pan Zhou, Baolong Liu, Xun Wang

The language-based setting of this task allows for an open set of target activities, resulting in a large variation of the temporal lengths of video moments.

Syntax-Aware Action Targeting for Video Captioning

1 code implementation • CVPR 2020 • Qi Zheng, Chaoyue Wang, Dacheng Tao

Existing methods on video captioning have made great efforts to identify objects/instances in videos, but few of them emphasize the prediction of action.

Video Captioning

Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition

2 code implementations • ECCV 2018 • Chaojian Yu, Xinyi Zhao, Qi Zheng, Peng Zhang, Xinge You

Fine-grained visual recognition is challenging because it highly relies on the modeling of various semantic parts and fine-grained feature learning.

Fine-Grained Visual Recognition

Coarse-to-Fine Salient Object Detection with Low-Rank Matrix Recovery

no code implementations • 21 May 2018 • Qi Zheng, Shujian Yu, Xinge You, Qinmu Peng

Low-Rank Matrix Recovery (LRMR) has recently been applied to saliency detection by decomposing image features into a low-rank component associated with background and a sparse component associated with visual salient regions.

object-detection RGB Salient Object Detection +2
