1 code implementation • 5 May 2023 • Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Hong Zhou, Mike Zheng Shou, Xiang Bai
Most existing cross-modal language-to-video retrieval (VR) research focuses on single-modal input from video, i.e., visual representation, while text is omnipresent in human environments and frequently critical to understanding video.
1 code implementation • 5 May 2023 • Yuzhong Zhao, Weijia Wu, Zhuang Li, Jiahong Li, Weiqiang Wang
This paper introduces a novel video text synthesis technique called FlowText, which utilizes optical flow estimation to synthesize a large amount of text video data at a low cost for training robust video text spotters.
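One ingredient of flow-based synthesis is carrying a text region from one frame to the next along the estimated motion. The sketch below shows only that step and is not the FlowText implementation. The function name `propagate_points` and the flow layout (a `flow[row][col]` grid of `(dx, dy)` displacements in pixels) are assumptions made for illustration, and the flow itself is taken as given rather than estimated.

```python
def propagate_points(points, flow):
    """Move (x, y) points by the flow vector sampled at their nearest pixel.

    points: list of (x, y) coordinates in the current frame.
    flow:   hypothetical dense flow field, flow[row][col] == (dx, dy).
    """
    height, width = len(flow), len(flow[0])
    moved = []
    for x, y in points:
        # Clamp the sampling position so points near the border stay in range.
        col = min(max(int(round(x)), 0), width - 1)
        row = min(max(int(round(y)), 0), height - 1)
        dx, dy = flow[row][col]
        moved.append((x + dx, y + dy))
    return moved


# Usage: a uniform flow shifts every corner of a text box by the same offset.
uniform_flow = [[(1.0, -0.5) for _ in range(4)] for _ in range(4)]
corners = [(0.0, 0.0), (2.0, 1.0)]
next_corners = propagate_points(corners, uniform_flow)
```

Nearest-pixel sampling keeps the example short; a real pipeline would more likely interpolate the flow and handle occlusions.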
no code implementations • 10 Apr 2023 • Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou, Umapada Pal, Dimosthenis Karatzas, Xiang Bai
In this competition report, we establish a video text reading benchmark, DSText, which focuses on the challenges of reading dense and small text in videos across various scenarios.
no code implementations • 8 Apr 2023 • Kai Song, Shaofeng Wang, Ziwei Xie, Shanyu Wang, Jiahong Li, Yongqiang Yang
In the offline stage, to construct the graph, user IDs and specific side information combinations of the shows are chosen to be the nodes, and click/co-click relations and view time are used to build the edges.
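The graph construction described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the input layout (`sessions`, `show_side_info`), the node tuples, and the weighting scheme (summed view time on user-show edges, co-click counts on show-show edges) are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(sessions, show_side_info):
    """Build a weighted edge map from click logs.

    sessions:       list of (user_id, [(show_id, view_seconds), ...]).
    show_side_info: show_id -> tuple of side information, e.g. (genre, language).
    Returns a dict mapping (node_a, node_b) to an edge weight.
    """
    edges = defaultdict(float)
    for user_id, views in sessions:
        user_node = ("user", user_id)
        show_nodes = set()
        for show_id, view_seconds in views:
            # A show is represented by its side-information combination.
            show_node = ("show", show_side_info[show_id])
            show_nodes.add(show_node)
            # Click edge, weighted by accumulated view time (assumed scheme).
            edges[(user_node, show_node)] += view_seconds
        # Co-click edges between distinct show nodes clicked in one session.
        for a, b in combinations(sorted(show_nodes), 2):
            edges[(a, b)] += 1.0
    return dict(edges)


# Usage with two hypothetical sessions.
side_info = {"s1": ("drama", "zh"), "s2": ("comedy", "en")}
logs = [("u1", [("s1", 120.0), ("s2", 30.0)]), ("u2", [("s1", 60.0)])]
graph = build_graph(logs, side_info)
```

Keying show nodes by their side-information tuple means shows with identical attributes collapse into one node, which is one plausible reading of "side information combinations".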
1 code implementation • 18 Jul 2022 • Weijia Wu, Zhuang Li, Jiahong Li, Chunhua Shen, Hong Zhou, Size Li, Zhongyuan Wang, Ping Luo
Our contributions are three-fold: 1) CoText simultaneously addresses the three tasks (i.e., text detection, tracking, and recognition) in a real-time, end-to-end trainable framework.
1 code implementation • CVPR 2022 • Zhuo Wang, Zezheng Wang, Zitong Yu, Weihong Deng, Jiahong Li, Tingting Gao, Zhongyuan Wang
A novel Shuffled Style Assembly Network (SSAN) is proposed to extract and reassemble different content and style features for a stylized feature space.
1 code implementation • 30 Dec 2021 • Zhuang Li, Weijia Wu, Mike Zheng Shou, Jiahong Li, Size Li, Zhongyuan Wang, Hong Zhou
Semantic representation is of great benefit to the video text tracking (VTT) task, which requires simultaneously classifying, detecting, and tracking text in video.
3 code implementations • 9 Dec 2021 • Weijia Wu, Yuanqiang Cai, Debing Zhang, Sibo Wang, Zhuang Li, Jiahong Li, Yejun Tang, Hong Zhou
Most existing video text spotting benchmarks focus on evaluating a single language and scenario with limited data.
1 code implementation • 24 Nov 2021 • Zezheng Wang, Zitong Yu, Xun Wang, Yunxiao Qin, Jiahong Li, Chenxu Zhao, Zhen Lei, Xin Liu, Size Li, Zhongyuan Wang
Face anti-spoofing (FAS) plays a crucial role in securing face recognition systems.
no code implementations • CVPR 2021 • Jiaming Li, Hongtao Xie, Jiahong Li, Zhongyuan Wang, Yongdong Zhang
Face forgery detection is attracting ever-increasing interest in computer vision, since facial manipulation technologies raise serious concerns.
no code implementations • 2 Nov 2017 • Xinyue Zhu, Yifan Liu, Zengchang Qin, Jiahong Li
In this paper, we propose a data augmentation method using generative adversarial networks (GAN).