no code implementations • COLING (TextGraphs) 2020 • Chuwei Luo, Yongpan Wang, Qi Zheng, Liangchen Li, Feiyu Gao, Shiyu Zhang
Incorporating geometric information from visual documents into our model generates richer 2D context that improves document representations.
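As a hedged illustration of how 2D geometry can enrich document representations (a generic sketch, not this paper's architecture; the module name, binning scheme, and dimensions are assumptions), one common approach is to embed each token's normalized bounding box and add it to the token embedding:

```python
import torch
import torch.nn as nn

class Layout2DEmbedding(nn.Module):
    """Illustrative 2D-geometry embedding: each token's normalized bounding box
    (x0, y0, x1, y1) is embedded and added to its text embedding."""
    def __init__(self, hidden_size=768, num_bins=1024):
        super().__init__()
        self.x_emb = nn.Embedding(num_bins, hidden_size)
        self.y_emb = nn.Embedding(num_bins, hidden_size)
        self.num_bins = num_bins

    def forward(self, token_emb, bboxes):
        # bboxes: (batch, seq_len, 4) with coordinates normalized to [0, 1]
        coords = (bboxes.clamp(0, 1) * (self.num_bins - 1)).long()
        geom = (self.x_emb(coords[..., 0]) + self.y_emb(coords[..., 1])
                + self.x_emb(coords[..., 2]) + self.y_emb(coords[..., 3]))
        return token_emb + geom  # richer 2D context for each token
```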
no code implementations • ACL 2022 • Yupei Du, Qi Zheng, Yuanbin Wu, Man Lan, Yan Yang, Meirong Ma
To exemplify the potential applications of our study, we also present two strategies (adding and removing KB triples) to mitigate gender biases in KB embeddings.
no code implementations • ECCV 2020 • Liangcheng Li, Feiyu Gao, Jiajun Bu, Yongpan Wang, Zhi Yu, Qi Zheng
Nowadays, rich descriptions on detail images help users learn more about the commodities.
no code implementations • 19 Dec 2024 • Qi Zheng, Zihao Yao, Yaying Zhang
Spatial-temporal forecasting is crucial and widely applicable in various domains such as traffic, energy, and climate.
no code implementations • 11 Dec 2024 • Qi Zheng, Haozhi Wang, Zihao Liu, Jiaming Liu, Peiye Liu, Zhijian Hao, Yanheng Lu, Dimin Niu, Jinjia Zhou, Minge jing, Yibo Fan
The neural model serves as the unified decoder of images, while the noises and indexes correspond to explicit representations.
1 code implementation • 4 Dec 2024 • Qi Zheng, Yibo Fan, Leilei Huang, Tianyu Zhu, Jiaming Liu, Zhijian Hao, Shuo Xing, Chia-Ju Chen, Xiongkuo Min, Alan C. Bovik, Zhengzhong Tu
Numerous deep learning-based VQA models have been developed, with progress in this direction driven by the creation of content-diverse, large-scale human-labeled databases that supply ground truth psychometric video quality data.
no code implementations • 24 Nov 2024 • Rui Wan, Qi Zheng, Yibo Fan
Traditional and neural video codecs commonly encounter limitations in controllability and generality under ultra-low-bitrate coding scenarios.
no code implementations • 12 Nov 2024 • Zirui Shao, Chuwei Luo, Zhaoqing Zhu, Hangdi Xing, Zhi Yu, Qi Zheng, Jiajun Bu
In this paper, we define the conflicts between cognition and perception as Cognition and Perception (C&P) knowledge conflicts, a form of multimodal knowledge conflicts, and systematically assess them with a focus on document understanding.
document understanding • Optical Character Recognition (OCR) • +1
no code implementations • 30 Sep 2024 • Qianwen Xing, Chang Yu, Sining Huang, Qi Zheng, Xingyu Mu, Mengying Sun
In contemporary economic life, credit scores are crucial for every participant.
1 code implementation • 21 Aug 2024 • Maksim Smirnov, Aleksandr Gushchin, Anastasia Antsiferova, Dmitry Vatolin, Radu Timofte, Ziheng Jia, ZiCheng Zhang, Wei Sun, Jiaying Qian, Yuqin Cao, Yinan Sun, Yuxin Zhu, Xiongkuo Min, Guangtao Zhai, Kanjar De, Qing Luo, Ao-Xiang Zhang, Peng Zhang, Haibo Lei, Linyan Jiang, Yaqing Li, Wenhui Meng, Zhenzhong Chen, Zhengxue Cheng, Jiahao Xiao, Jun Xu, Chenlong He, Qi Zheng, Ruoxi Zhu, Min Li, Yibo Fan, Zhengzhong Tu
The challenge aimed to evaluate the performance of VQA methods on a diverse dataset of 459 videos, encoded with 14 codecs of various compression standards (AVC/H.264, HEVC/H.265, AV1, and VVC/H.266) and containing a comprehensive collection of compression artifacts.
1 code implementation • 22 Jul 2024 • Zirui Shao, Feiyu Gao, Hangdi Xing, Zepeng Zhu, Zhi Yu, Jiajun Bu, Qi Zheng, Cong Yao
In the era of a content-creation revolution propelled by advances in generative models, the field of web design remains unexplored despite its critical role in modern digital communication.
1 code implementation • 17 Jul 2024 • Yufan Shen, Chuwei Luo, Zhaoqing Zhu, Yang Chen, Qi Zheng, Zhi Yu, Jiajun Bu, Cong Yao
An effective evaluation method for document instruction data is crucial in constructing instruction data with high efficacy, which, in turn, facilitates the training of LLMs and MLLMs for document VQA.
no code implementations • 7 Jun 2024 • Qi Zheng, Chang Yu, Jin Cao, Yongshun Xu, Qianwen Xing, Yinxin Jin
With the rise of various online and mobile payment systems, transaction fraud has become a significant threat to financial security.
1 code implementation • 24 Apr 2024 • Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, ZiCheng Zhang, HaoNing Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, Wei Sun, Yuqin Cao, Yanwei Jiang, Jun Jia, Zhichao Zhang, Zijian Chen, Weixia Zhang, Xiongkuo Min, Steve Göring, Zihao Qi, Chen Feng
The performance of the top-5 submissions is reviewed and provided here as a survey of diverse deep models for efficient video quality assessment of user-generated content.
2 code implementations • CVPR 2024 • Chuwei Luo, Yufan Shen, Zhaoqing Zhu, Qi Zheng, Zhi Yu, Cong Yao
The core of LayoutLLM is a layout instruction tuning strategy, which is specially designed to enhance the comprehension and utilization of document layouts.
no code implementations • 3 Jan 2024 • Rujiao Long, Hangdi Xing, Zhibo Yang, Qi Zheng, Zhi Yu, Cong Yao, Fei Huang
We model TSR as a logical location regression problem and propose a new TSR framework called LORE, standing for LOgical location REgression network, which for the first time regresses the logical locations as well as the spatial locations of table cells in a unified network.
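A minimal sketch of the logical-location-regression idea described above, assuming a simple per-cell head that predicts both spatial and logical coordinates; the names and layer sizes are illustrative, not LORE's actual implementation:

```python
import torch
import torch.nn as nn

class CellLocationHead(nn.Module):
    """Illustrative head that regresses, for each table-cell feature,
    a spatial box (x0, y0, x1, y1) and logical coordinates
    (row_start, row_end, col_start, col_end)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.spatial = nn.Linear(feat_dim, 4)   # normalized box coordinates
        self.logical = nn.Linear(feat_dim, 4)   # row/column indices (regressed)

    def forward(self, cell_feats):
        # cell_feats: (num_cells, feat_dim)
        boxes = self.spatial(cell_feats).sigmoid()   # spatial location in [0, 1]
        logic = self.logical(cell_feats).relu()      # non-negative logical indices
        return boxes, logic
```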
1 code implementation • ICCV 2023 • Cheng Da, Chuwei Luo, Qi Zheng, Cong Yao
Document pre-trained models and grid-based models have proven to be very effective on various tasks in Document AI.
Ranked #1 on Document Layout Analysis on PubLayNet val
1 code implementation • ICCV 2023 • Changxu Cheng, Peng Wang, Cheng Da, Qi Zheng, Cong Yao
The diversity in length constitutes a significant characteristic of text.
1 code implementation • CVPR 2023 • Chuwei Luo, Changxu Cheng, Qi Zheng, Cong Yao
Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation.
Ranked #1 on Entity Linking on FUNSD
2 code implementations • 7 Mar 2023 • Hangdi Xing, Feiyu Gao, Rujiao Long, Jiajun Bu, Qi Zheng, Liangcheng Li, Cong Yao, Zhi Yu
Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats.
1 code implementation • 2 Mar 2023 • Qi Zheng, Daqing Liu, Chaoyue Wang, Jing Zhang, Dadong Wang, DaCheng Tao
In this work, we introduce a mechanism of Episodic Scene memory (ESceme) for VLN that wakes an agent's memories of past visits when it enters the current scene.
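A toy sketch of an episodic scene memory of this kind, assuming a simple cache of viewpoint features keyed by scene and an averaging fusion rule chosen only for illustration (not ESceme's actual mechanism):

```python
import torch

class EpisodicSceneMemory:
    """Toy per-scene memory: caches viewpoint features from past visits and
    blends them with the current observation when the scene is revisited."""
    def __init__(self):
        self.memory = {}  # scene_id -> {viewpoint_id: feature tensor}

    def update(self, scene_id, viewpoint_id, feature):
        self.memory.setdefault(scene_id, {})[viewpoint_id] = feature.detach()

    def recall(self, scene_id, current_feature):
        past = self.memory.get(scene_id)
        if not past:               # first visit: no memories to wake up
            return current_feature
        mem = torch.stack(list(past.values())).mean(dim=0)
        return 0.5 * current_feature + 0.5 * mem  # simple fusion for illustration
```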
1 code implementation • CVPR 2023 • Heng Zhang, Daqing Liu, Qi Zheng, Bing Su
Specifically, we enforce the embeddings of the frame sequence of interest to approximate a goal-oriented stochastic process, i.e., Brownian bridge, in the latent space via a process-based contrastive loss.
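A hedged sketch of a process-based contrastive loss built around a Brownian bridge, assuming an intermediate frame embedding should score higher under the bridge between the start and goal embeddings than negatives drawn from other sequences; the scoring and loss form below are illustrative, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def brownian_bridge_score(z_start, z_mid, z_end, t, sigma=1.0):
    """Log-density-style score of z_mid under a Brownian bridge between
    z_start (time 0) and z_end (time 1); t is a float in (0, 1)."""
    mean = (1 - t) * z_start + t * z_end
    var = sigma ** 2 * t * (1 - t) + 1e-6
    return -((z_mid - mean) ** 2).sum(dim=-1) / (2 * var)

def bridge_contrastive_loss(z_start, z_mid, z_end, t, negatives, sigma=1.0):
    """InfoNCE-style loss: the true intermediate embedding should be more
    consistent with the bridge than embeddings from other sequences."""
    pos = brownian_bridge_score(z_start, z_mid, z_end, t, sigma)           # (B,)
    neg = torch.stack([brownian_bridge_score(z_start, n, z_end, t, sigma)
                       for n in negatives], dim=-1)                        # (B, K)
    logits = torch.cat([pos.unsqueeze(-1), neg], dim=-1)
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```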
1 code implementation • 21 Nov 2022 • Qi Zheng, Chaoyue Wang, Daqing Liu, Dadong Wang, DaCheng Tao
For each positive pair, we regard the images from different graphs as negative samples and derive a multi-positive version of contrastive learning.
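A minimal sketch of multi-positive contrastive learning in this spirit, assuming embeddings with graph labels where same-graph samples are positives and different-graph samples are negatives (a supervised-contrastive-style formulation, not necessarily the paper's exact loss):

```python
import torch
import torch.nn.functional as F

def multi_positive_contrastive_loss(embeddings, graph_ids, temperature=0.1):
    """For each anchor, embeddings from the same graph are positives and
    embeddings from different graphs are negatives."""
    z = F.normalize(embeddings, dim=-1)               # (N, D)
    sim = z @ z.t() / temperature                     # (N, N) similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (graph_ids.unsqueeze(0) == graph_ids.unsqueeze(1)) & ~self_mask

    # log-softmax over all non-self candidates
    log_prob = sim - torch.logsumexp(sim.masked_fill(self_mask, float('-inf')),
                                     dim=1, keepdim=True)
    # average log-probability over all positives of each anchor
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_count
    return loss[pos_mask.any(dim=1)].mean()           # skip anchors with no positive
```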
no code implementations • 27 Jun 2022 • Chuwei Luo, Guozhi Tang, Qi Zheng, Cong Yao, Lianwen Jin, Chenliang Li, Yang Xue, Luo Si
Multi-modal document pre-trained models have proven to be very effective in a variety of visually-rich document understanding (VrDU) tasks.
no code implementations • 21 Jun 2022 • Qi Zheng, Chaoyue Wang, Dadong Wang
Most existing methods model coherence through topic transitions that dynamically infer a topic vector from preceding sentences.
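A generic sketch of the topic-transition mechanism this sentence describes (the baseline idea, not the paper's proposed method): a recurrent cell updates a topic vector from the encodings of preceding sentences. Names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class TopicTransition(nn.Module):
    """Generic topic-transition module: a GRU cell updates a topic vector
    from the encoding of each preceding sentence."""
    def __init__(self, sent_dim=512, topic_dim=256):
        super().__init__()
        self.cell = nn.GRUCell(sent_dim, topic_dim)

    def forward(self, sentence_encodings, topic0=None):
        # sentence_encodings: (num_sents, batch, sent_dim), in generation order
        topic = topic0 if topic0 is not None else sentence_encodings.new_zeros(
            sentence_encodings.size(1), self.cell.hidden_size)
        topics = []
        for sent in sentence_encodings:   # infer the next topic from each sentence
            topic = self.cell(sent, topic)
            topics.append(topic)
        return torch.stack(topics)        # topic vector after each sentence
```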
no code implementations • 28 May 2022 • Qi Zheng, Chaoyue Wang, Dadong Wang, DaCheng Tao
Concept learning constructs visual representations that are connected to linguistic semantics, which is fundamental to vision-language tasks.
1 code implementation • 5 Jan 2022 • Qi Zheng, Zhengzhong Tu, Pavan C. Madhusudana, Xiaoyang Zeng, Alan C. Bovik, Yibo Fan
Video quality assessment (VQA) remains an important and challenging problem that affects many applications at the widest scales.
no code implementations • 24 Nov 2021 • Changxu Cheng, Bohan Li, Qi Zheng, Yongpan Wang, Wenyu Liu
As a result, the learning of semantic features is prone to have a bias on the limited vocabulary of the training set, which is called vocabulary reliance.
no code implementations • 17 Sep 2021 • Chengxi Li, Feiyu Gao, Jiajun Bu, Lu Xu, Xiang Chen, Yu Gu, Zirui Shao, Qi Zheng, Ningyu Zhang, Yongpan Wang, Zhi Yu
We inject sentiment knowledge regarding aspects, opinions, and polarities into the prompt and explicitly model term relations by constructing consistency and polarity judgment templates from the ground-truth triplets (see the template sketch below).
Aspect-Based Sentiment Analysis • Aspect-Based Sentiment Analysis (ABSA) • +5
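A hedged sketch of how consistency and polarity judgment templates could be built from ground-truth (aspect, opinion, polarity) triplets; the template wording is illustrative, not the paper's exact prompts:

```python
def build_judgment_templates(sentence, triplet):
    """Illustrative construction of consistency and polarity judgment prompts
    from a ground-truth (aspect, opinion, polarity) triplet."""
    aspect, opinion, polarity = triplet
    consistency = (f'{sentence} Question: does the opinion term "{opinion}" '
                   f'describe the aspect term "{aspect}"? Answer: yes')
    polarity_judgment = (f'{sentence} Question: what is the sentiment polarity '
                         f'of the aspect term "{aspect}"? Answer: {polarity}')
    return consistency, polarity_judgment

# Example usage
print(build_judgment_templates(
    "The pizza was great but the service was slow.",
    ("pizza", "great", "positive"))[0])
```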
1 code implementation • 22 Jul 2021 • Zhe Fei, Qi Zheng, Hyokyoung G. Hong, Yi Li
To our knowledge, there is little work available on drawing inference about the effects of high-dimensional predictors in censored quantile regression.
no code implementations • 8 May 2021 • Qi Zheng
One of the most commonly performed manipulations in a human's daily life is pouring.
no code implementations • 30 Apr 2021 • Qi Zheng
An automated cooking machine is a goal for the future.
no code implementations • 2 Feb 2021 • Qi Zheng, Jianfeng Dong, Xiaoye Qu, Xun Yang, Yabing Wang, Pan Zhou, Baolong Liu, Xun Wang
The language-based setting of this task allows for an open set of target activities, resulting in a large variation of the temporal lengths of video moments.
1 code implementation • CVPR 2020 • Qi Zheng, Chaoyue Wang, Dacheng Tao
Existing methods on video captioning have made great efforts to identify objects/instances in videos, but few of them emphasize the prediction of action.
2 code implementations • ECCV 2018 • Chaojian Yu, Xinyi Zhao, Qi Zheng, Peng Zhang, Xinge You
Fine-grained visual recognition is challenging because it highly relies on the modeling of various semantic parts and fine-grained feature learning.
no code implementations • 21 May 2018 • Qi Zheng, Shujian Yu, Xinge You, Qinmu Peng
Low-Rank Matrix Recovery (LRMR) has recently been applied to saliency detection by decomposing image features into a low-rank component associated with background and a sparse component associated with visual salient regions.
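The low-rank plus sparse decomposition this sentence refers to is commonly computed with robust PCA. Below is a self-contained sketch of the inexact augmented Lagrange multiplier (IALM) solver for F ≈ L + S, using common default parameters that are assumptions rather than the paper's settings:

```python
import numpy as np

def robust_pca(F, lam=None, tol=1e-6, max_iter=300):
    """Decompose a feature matrix F into a low-rank part L (background)
    plus a sparse part S (salient regions) via inexact ALM."""
    m, n = F.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    norm_fro = np.linalg.norm(F)
    norm_two = np.linalg.norm(F, 2)
    mu = 1.25 / norm_two
    mu_bar = mu * 1e7
    Y = F / max(norm_two, np.linalg.norm(F, np.inf) / lam)  # dual variable
    L = np.zeros_like(F)
    S = np.zeros_like(F)
    for _ in range(max_iter):
        # low-rank update: singular value thresholding
        U, sig, Vt = np.linalg.svd(F - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0)) @ Vt
        # sparse update: elementwise soft-thresholding
        T = F - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0)
        Y = Y + mu * (F - L - S)
        mu = min(mu * 1.5, mu_bar)
        if np.linalg.norm(F - L - S) / norm_fro < tol:
            break
    return L, S
```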