Search Results for author: Qi Zheng

Found 36 papers, 17 papers with code

Understanding Gender Bias in Knowledge Base Embeddings

no code implementations ACL 2022 Yupei Du, Qi Zheng, Yuanbin Wu, Man Lan, Yan Yang, Meirong Ma

To exemplify the potential applications of our study, we also present two strategies (by adding and removing KB triples) to mitigate gender biases in KB embeddings.

ST-ReP: Learning Predictive Representations Efficiently for Spatial-Temporal Forecasting

no code implementations 19 Dec 2024 Qi Zheng, Zihao Yao, Yaying Zhang

Spatial-temporal forecasting is crucial and widely applicable in various domains such as traffic, energy, and climate.

Contrastive Learning Representation Learning +2

Unicorn: Unified Neural Image Compression with One Number Reconstruction

no code implementations 11 Dec 2024 Qi Zheng, Haozhi Wang, Zihao Liu, Jiaming Liu, Peiye Liu, Zhijian Hao, Yanheng Lu, Dimin Niu, Jinjia Zhou, Minge jing, Yibo Fan

The neural model serves as the unified decoder of images while the noises and indexes correspond to explicit representations.

Decoder Image Compression

Video Quality Assessment: A Comprehensive Survey

1 code implementation 4 Dec 2024 Qi Zheng, Yibo Fan, Leilei Huang, Tianyu Zhu, Jiaming Liu, Zhijian Hao, Shuo Xing, Chia-Ju Chen, Xiongkuo Min, Alan C. Bovik, Zhengzhong Tu

Numerous deep learning-based VQA models have been developed, with progress in this direction driven by the creation of content-diverse, large-scale human-labeled databases that supply ground truth psychometric video quality data.

Benchmarking Survey +2

M3-CVC: Controllable Video Compression with Multimodal Generative Models

no code implementations 24 Nov 2024 Rui Wan, Qi Zheng, Yibo Fan

Traditional and neural video codecs commonly encounter limitations in controllability and generality under ultra-low-bitrate coding scenarios.

Video Compression

Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding

no code implementations 12 Nov 2024 Zirui Shao, Chuwei Luo, Zhaoqing Zhu, Hangdi Xing, Zhi Yu, Qi Zheng, Jiajun Bu

In this paper, we define the conflicts between cognition and perception as Cognition and Perception (C&P) knowledge conflicts, a form of multimodal knowledge conflicts, and systematically assess them with a focus on document understanding.

document understanding Optical Character Recognition (OCR) +1

AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results

1 code implementation 21 Aug 2024 Maksim Smirnov, Aleksandr Gushchin, Anastasia Antsiferova, Dmitry Vatolin, Radu Timofte, Ziheng Jia, ZiCheng Zhang, Wei Sun, Jiaying Qian, Yuqin Cao, Yinan Sun, Yuxin Zhu, Xiongkuo Min, Guangtao Zhai, Kanjar De, Qing Luo, Ao-Xiang Zhang, Peng Zhang, Haibo Lei, Linyan Jiang, Yaqing Li, Wenhui Meng, Zhenzhong Chen, Zhengxue Cheng, Jiahao Xiao, Jun Xu, Chenlong He, Qi Zheng, Ruoxi Zhu, Min Li, Yibo Fan, Zhengzhong Tu

The challenge aimed to evaluate the performance of VQA methods on a diverse dataset of 459 videos, encoded with 14 codecs of various compression standards (AVC/H.264, HEVC/H.265, AV1, and VVC/H.266) and containing a comprehensive collection of compression artifacts.

Image Manipulation valid +3

WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation

1 code implementation 22 Jul 2024 Zirui Shao, Feiyu Gao, Hangdi Xing, Zepeng Zhu, Zhi Yu, Jiajun Bu, Qi Zheng, Cong Yao

In the era of content creation revolution propelled by advancements in generative models, the field of web design remains unexplored despite its critical role in modern digital communication.

ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data

1 code implementation 17 Jul 2024 Yufan Shen, Chuwei Luo, Zhaoqing Zhu, Yang Chen, Qi Zheng, Zhi Yu, Jiajun Bu, Cong Yao

An effective evaluation method for document instruction data is crucial in constructing instruction data with high efficacy, which, in turn, facilitates the training of LLMs and MLLMs for document VQA.

Question Answering Visual Question Answering

Advanced Payment Security System: XGBoost, LightGBM and SMOTE Integrated

no code implementations 7 Jun 2024 Qi Zheng, Chang Yu, Jin Cao, Yongshun Xu, Qianwen Xing, Yinxin Jin

With the rise of various online and mobile payment systems, transaction fraud has become a significant threat to financial security.

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

2 code implementations CVPR 2024 Chuwei Luo, Yufan Shen, Zhaoqing Zhu, Qi Zheng, Zhi Yu, Cong Yao

The core of LayoutLLM is a layout instruction tuning strategy, which is specially designed to enhance the comprehension and utilization of document layouts.

document understanding

LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training

no code implementations 3 Jan 2024 Rujiao Long, Hangdi Xing, Zhibo Yang, Qi Zheng, Zhi Yu, Cong Yao, Fei Huang

We model TSR as a logical location regression problem and propose a new TSR framework called LORE, standing for LOgical location REgression network, which for the first time regresses logical location as well as spatial location of table cells in a unified network.

regression

GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

1 code implementation CVPR 2023 Chuwei Luo, Changxu Cheng, Qi Zheng, Cong Yao

Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation.

Document AI entity_extraction +4

LORE: Logical Location Regression Network for Table Structure Recognition

2 code implementations 7 Mar 2023 Hangdi Xing, Feiyu Gao, Rujiao Long, Jiajun Bu, Qi Zheng, Liangcheng Li, Cong Yao, Zhi Yu

Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats.

regression Table Recognition

ESceme: Vision-and-Language Navigation with Episodic Scene Memory

1 code implementation 2 Mar 2023 Qi Zheng, Daqing Liu, Chaoyue Wang, Jing Zhang, Dadong Wang, DaCheng Tao

In this work, we introduce a mechanism of Episodic Scene memory (ESceme) for VLN that wakes an agent's memories of past visits when it enters the current scene.

Vision and Language Navigation

Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning

1 code implementation CVPR 2023 Heng Zhang, Daqing Liu, Qi Zheng, Bing Su

Specifically, we enforce the embeddings of the frame sequence of interest to approximate a goal-oriented stochastic process, i.e., Brownian bridge, in the latent space via a process-based contrastive loss.

Contrastive Learning Representation Learning +3

Cross-Modal Contrastive Learning for Robust Reasoning in VQA

1 code implementation 21 Nov 2022 Qi Zheng, Chaoyue Wang, Daqing Liu, Dadong Wang, DaCheng Tao

For each positive pair, we regard the images from different graphs as negative samples and derive a multi-positive version of contrastive learning.

Contrastive Learning Question Answering +2

Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding

no code implementations 27 Jun 2022 Chuwei Luo, Guozhi Tang, Qi Zheng, Cong Yao, Lianwen Jin, Chenliang Li, Yang Xue, Luo Si

Multi-modal document pre-trained models have proven to be very effective in a variety of visually-rich document understanding (VrDU) tasks.

Document Classification document understanding +3

Bypass Network for Semantics Driven Image Paragraph Captioning

no code implementations 21 Jun 2022 Qi Zheng, Chaoyue Wang, Dadong Wang

Most existing methods model the coherence through the topic transition that dynamically infers a topic vector from preceding sentences.

Image Paragraph Captioning Sentence

Visual Superordinate Abstraction for Robust Concept Learning

no code implementations 28 May 2022 Qi Zheng, Chaoyue Wang, Dadong Wang, DaCheng Tao

Concept learning constructs visual representations that are connected to linguistic semantics, which is fundamental to vision-language tasks.

Attribute Question Answering +1

FAVER: Blind Quality Prediction of Variable Frame Rate Videos

1 code implementation 5 Jan 2022 Qi Zheng, Zhengzhong Tu, Pavan C. Madhusudana, Xiaoyang Zeng, Alan C. Bovik, Yibo Fan

Video quality assessment (VQA) remains an important and challenging problem that affects many applications at the widest scales.

Cloud Computing Video Quality Assessment +1

Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition

no code implementations 24 Nov 2021 Changxu Cheng, Bohan Li, Qi Zheng, Yongpan Wang, Wenyu Liu

As a result, the learning of semantic features is prone to have a bias on the limited vocabulary of the training set, which is called vocabulary reliance.

Decoder Scene Text Recognition

SentiPrompt: Sentiment Knowledge Enhanced Prompt-Tuning for Aspect-Based Sentiment Analysis

no code implementations 17 Sep 2021 Chengxi Li, Feiyu Gao, Jiajun Bu, Lu Xu, Xiang Chen, Yu Gu, Zirui Shao, Qi Zheng, Ningyu Zhang, Yongpan Wang, Zhi Yu

We inject sentiment knowledge regarding aspects, opinions, and polarities into prompt and explicitly model term relations via constructing consistency and polarity judgment templates from the ground truth triplets.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +5

Inference for High Dimensional Censored Quantile Regression

1 code implementation 22 Jul 2021 Zhe Fei, Qi Zheng, Hyokyoung G. Hong, Yi Li

To our knowledge, there is little work available to draw inference on the effects of high dimensional predictors for censored quantile regression.

Epidemiology quantile regression +2

Pouring Dynamics Estimation Using Gated Recurrent Units

no code implementations 8 May 2021 Qi Zheng

One of the most commonly performed manipulations in a human's daily life is pouring.

Progressive Localization Networks for Language-based Moment Localization

no code implementations 2 Feb 2021 Qi Zheng, Jianfeng Dong, Xiaoye Qu, Xun Yang, Yabing Wang, Pan Zhou, Baolong Liu, Xun Wang

The language-based setting of this task allows for an open set of target activities, resulting in a large variation of the temporal lengths of video moments.

Syntax-Aware Action Targeting for Video Captioning

1 code implementation CVPR 2020 Qi Zheng, Chaoyue Wang, Dacheng Tao

Existing methods on video captioning have made great efforts to identify objects/instances in videos, but few of them emphasize the prediction of action.

Video Captioning

Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition

2 code implementations ECCV 2018 Chaojian Yu, Xinyi Zhao, Qi Zheng, Peng Zhang, Xinge You

Fine-grained visual recognition is challenging because it highly relies on the modeling of various semantic parts and fine-grained feature learning.

Fine-Grained Visual Recognition

Coarse-to-Fine Salient Object Detection with Low-Rank Matrix Recovery

no code implementations 21 May 2018 Qi Zheng, Shujian Yu, Xinge You, Qinmu Peng

Low-Rank Matrix Recovery (LRMR) has recently been applied to saliency detection by decomposing image features into a low-rank component associated with background and a sparse component associated with visual salient regions.

object-detection RGB Salient Object Detection +2
