no code implementations • COLING (TextGraphs) 2020 • Chuwei Luo, Yongpan Wang, Qi Zheng, Liangchen Li, Feiyu Gao, Shiyu Zhang
By incorporating geometry information from visual documents into our model, richer 2D context information is generated to improve document representations.
no code implementations • ACL 2022 • Yupei Du, Qi Zheng, Yuanbin Wu, Man Lan, Yan Yang, Meirong Ma
To exemplify the potential applications of our study, we also present two strategies (by adding and removing KB triples) to mitigate gender biases in KB embeddings.
no code implementations • ECCV 2020 • Liangcheng Li, Feiyu Gao, Jiajun Bu, Yongpan Wang, Zhi Yu, Qi Zheng
Nowadays, rich descriptions in detail images help users learn more about commodities.
3 code implementations • 8 Apr 2024 • Chuwei Luo, Yufan Shen, Zhaoqing Zhu, Qi Zheng, Zhi Yu, Cong Yao
The core of LayoutLLM is a layout instruction tuning strategy, which is specially designed to enhance the comprehension and utilization of document layouts.
no code implementations • 3 Jan 2024 • Rujiao Long, Hangdi Xing, Zhibo Yang, Qi Zheng, Zhi Yu, Cong Yao, Fei Huang
We model TSR as a logical location regression problem and propose a new TSR framework called LORE, standing for LOgical location REgression network, which for the first time regresses logical location as well as spatial location of table cells in a unified network.
1 code implementation • ICCV 2023 • Cheng Da, Chuwei Luo, Qi Zheng, Cong Yao
Document pre-trained models and grid-based models have proven to be very effective on various tasks in Document AI.
Ranked #1 on Document Layout Analysis on PubLayNet val
1 code implementation • ICCV 2023 • Changxu Cheng, Peng Wang, Cheng Da, Qi Zheng, Cong Yao
The diversity in length constitutes a significant characteristic of text.
1 code implementation • CVPR 2023 • Chuwei Luo, Changxu Cheng, Qi Zheng, Cong Yao
Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation.
Ranked #1 on Key Information Extraction on CORD
1 code implementation • 7 Mar 2023 • Hangdi Xing, Feiyu Gao, Rujiao Long, Jiajun Bu, Qi Zheng, Liangcheng Li, Cong Yao, Zhi Yu
Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats.
1 code implementation • 2 Mar 2023 • Qi Zheng, Daqing Liu, Chaoyue Wang, Jing Zhang, Dadong Wang, DaCheng Tao
Vision-and-language navigation (VLN) simulates a visual agent that follows natural-language navigation instructions in real-world scenes.
1 code implementation • CVPR 2023 • Heng Zhang, Daqing Liu, Qi Zheng, Bing Su
Specifically, we enforce the embeddings of the frame sequence of interest to approximate a goal-oriented stochastic process, i.e., Brownian bridge, in the latent space via a process-based contrastive loss.
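As a rough illustration of the Brownian bridge mentioned above, the sketch below computes the bridge's mean and variance at an intermediate time and scores how far an embedding falls from the bridge. The function names and the variance-scaled squared distance are assumptions made for illustration; the snippet does not reproduce the paper's process-based contrastive loss.

```python
import numpy as np

def bridge_mean(z_start, z_end, t, T):
    """Mean of a Brownian bridge pinned at z_start (time 0) and z_end (time T)."""
    alpha = t / T
    return (1.0 - alpha) * z_start + alpha * z_end

def bridge_variance(t, T):
    """Per-dimension variance of a standard Brownian bridge at time t."""
    return t * (T - t) / T

def bridge_distance(z_t, z_start, z_end, t, T, eps=1e-8):
    """Hypothetical score: squared distance to the bridge mean, scaled by variance.

    Lower values mean the embedding z_t is more consistent with the bridge.
    This is an illustrative choice, not the loss from the paper.
    """
    diff = z_t - bridge_mean(z_start, z_end, t, T)
    return float(diff @ diff) / (2.0 * bridge_variance(t, T) + eps)

# Usage: a mid-sequence embedding that lies exactly on the bridge mean scores 0.
z0 = np.zeros(4)
zT = np.ones(4)
z_mid = bridge_mean(z0, zT, t=2.0, T=4.0)
score = bridge_distance(z_mid, z0, zT, t=2.0, T=4.0)
```

In a contrastive setup, a score of this kind could be turned into logits so that frames from the same sequence rank above frames from other sequences, but that wiring is left out here because the snippet above does not specify it.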
1 code implementation • 21 Nov 2022 • Qi Zheng, Chaoyue Wang, Daqing Liu, Dadong Wang, DaCheng Tao
For each positive pair, we regard the images from different graphs as negative samples and derive a multi-positive version of contrastive learning.
no code implementations • 27 Jun 2022 • Chuwei Luo, Guozhi Tang, Qi Zheng, Cong Yao, Lianwen Jin, Chenliang Li, Yang Xue, Luo Si
Multi-modal document pre-trained models have proven to be very effective in a variety of visually-rich document understanding (VrDU) tasks.
no code implementations • 21 Jun 2022 • Qi Zheng, Chaoyue Wang, Dadong Wang
Most existing methods model the coherence through the topic transition that dynamically infers a topic vector from preceding sentences.
no code implementations • 28 May 2022 • Qi Zheng, Chaoyue Wang, Dadong Wang, DaCheng Tao
Concept learning constructs visual representations that are connected to linguistic semantics, which is fundamental to vision-language tasks.
1 code implementation • 5 Jan 2022 • Qi Zheng, Zhengzhong Tu, Pavan C. Madhusudana, Xiaoyang Zeng, Alan C. Bovik, Yibo Fan
Video quality assessment (VQA) remains an important and challenging problem that affects many applications at the widest scales.
no code implementations • 24 Nov 2021 • Changxu Cheng, Bohan Li, Qi Zheng, Yongpan Wang, Wenyu Liu
As a result, the learning of semantic features is prone to bias toward the limited vocabulary of the training set, a problem called vocabulary reliance.
no code implementations • 17 Sep 2021 • Chengxi Li, Feiyu Gao, Jiajun Bu, Lu Xu, Xiang Chen, Yu Gu, Zirui Shao, Qi Zheng, Ningyu Zhang, Yongpan Wang, Zhi Yu
We inject sentiment knowledge regarding aspects, opinions, and polarities into the prompt and explicitly model term relations by constructing consistency and polarity judgment templates from the ground truth triplets.
1 code implementation • 22 Jul 2021 • Zhe Fei, Qi Zheng, Hyokyoung G. Hong, Yi Li
To our knowledge, there is little work available to draw inference on the effects of high dimensional predictors for censored quantile regression.
no code implementations • 8 May 2021 • Qi Zheng
One of the most commonly performed manipulations in daily human life is pouring.
no code implementations • 30 Apr 2021 • Qi Zheng
An automated cooking machine is a goal for the future.
no code implementations • 2 Feb 2021 • Qi Zheng, Jianfeng Dong, Xiaoye Qu, Xun Yang, Yabing Wang, Pan Zhou, Baolong Liu, Xun Wang
The language-based setting of this task allows for an open set of target activities, resulting in a large variation of the temporal lengths of video moments.
1 code implementation • CVPR 2020 • Qi Zheng, Chaoyue Wang, Dacheng Tao
Existing methods for video captioning have made great efforts to identify objects/instances in videos, but few of them emphasize the prediction of action.
2 code implementations • ECCV 2018 • Chaojian Yu, Xinyi Zhao, Qi Zheng, Peng Zhang, Xinge You
Fine-grained visual recognition is challenging because it relies heavily on the modeling of various semantic parts and fine-grained feature learning.
no code implementations • 21 May 2018 • Qi Zheng, Shujian Yu, Xinge You, Qinmu Peng
Low-Rank Matrix Recovery (LRMR) has recently been applied to saliency detection by decomposing image features into a low-rank component associated with background and a sparse component associated with visual salient regions.
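The low-rank plus sparse decomposition described above can be sketched with a generic principal component pursuit solver. This is a minimal sketch under stated assumptions: it uses a standard ADMM iteration with singular-value and entrywise soft thresholding, the function names and default parameters (`lam`, `mu`) are illustrative choices, and it is not the specific method of the listed paper.

```python
import numpy as np

def soft_threshold(x, tau):
    """Entrywise shrinkage toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svd_threshold(x, tau):
    """Shrink singular values by tau, which promotes low rank."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt

def low_rank_plus_sparse(f, lam=None, mu=None, n_iter=500, tol=1e-7):
    """Split a feature matrix f into low + sparse with f ~= low + sparse.

    Generic ADMM for principal component pursuit; defaults are common
    heuristics, assumed here for illustration.
    """
    m, n = f.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = 0.25 * m * n / (np.abs(f).sum() + 1e-12) if mu is None else mu
    low = np.zeros_like(f)
    sparse = np.zeros_like(f)
    dual = np.zeros_like(f)
    norm_f = np.linalg.norm(f) + 1e-12
    for _ in range(n_iter):
        # Low-rank update (background component).
        low = svd_threshold(f - sparse + dual / mu, 1.0 / mu)
        # Sparse update (candidate salient component).
        sparse = soft_threshold(f - low + dual / mu, lam / mu)
        # Dual ascent on the constraint f = low + sparse.
        residual = f - low - sparse
        dual = dual + mu * residual
        if np.linalg.norm(residual) / norm_f < tol:
            break
    return low, sparse

# Usage on synthetic data: a rank-1 "background" plus two isolated spikes.
rng = np.random.default_rng(0)
background = rng.normal(size=(20, 1)) @ rng.normal(size=(1, 20))
spikes = np.zeros((20, 20))
spikes[3, 4] = 5.0
spikes[10, 15] = -5.0
features = background + spikes
low, sparse = low_rank_plus_sparse(features)
```

In a saliency setting, the columns of `features` would hold per-region image features, and the magnitude of each column of `sparse` could serve as a saliency score; that mapping is an assumption here rather than something the snippet above specifies.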