1 code implementation • 19 Apr 2024 • Fengyi Fu, Shancheng Fang, Weidong Chen, Zhendong Mao
Furthermore, this paper proposes a batch attention module to alleviate the problem of missing sentimental samples caused by data imbalance, which is common in live videos because video popularity varies.
1 code implementation • 11 Mar 2024 • Tianhao Qi, Shancheng Fang, Yanze Wu, Hongtao Xie, Jiawei Liu, Lang Chen, Qian He, Yongdong Zhang
The Q-Formers are trained using paired images rather than an identical target, where the reference image and the ground-truth image share the same style or semantics.
no code implementations • 1 Jul 2023 • Zhuowei Chen, Shancheng Fang, Wei Liu, Qian He, Mengqi Huang, Yongdong Zhang, Zhendong Mao
While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centric images, an intractable problem is how to preserve the face identity for conditioned face images.
no code implementations • 5 Feb 2023 • Shiqi Sun, Shancheng Fang, Qian He, Wei Liu
Specifically, our method co-encodes images and text into a new domain during the training phase.
no code implementations • CVPR 2023 • Yuchen Ren, Zhendong Mao, Shancheng Fang, Yan Lu, Tong He, Hao Du, Yongdong Zhang, Wanli Ouyang
In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where the data from the target domain is unseen in the learning process.
1 code implementation • 19 Nov 2022 • Shancheng Fang, Zhendong Mao, Hongtao Xie, Yuxin Wang, Chenggang Yan, Yongdong Zhang
In this paper, we argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) a language model with noisy input.
Ranked #4 on Text Spotting on SCUT-CTW1500
2 code implementations • 22 Nov 2021 • Tianlun Zheng, Zhineng Chen, Shancheng Fang, Hongtao Xie, Yu-Gang Jiang
In this paper, we propose a novel module called Multi-Domain Character Distance Perception (MDCDP) to establish a visually and semantically related position embedding.
Ranked #11 on Scene Text Recognition on ICDAR2015
4 code implementations • ICCV 2021 • Yuxin Wang, Hongtao Xie, Shancheng Fang, Jing Wang, Shenggao Zhu, Yongdong Zhang
Such an operation guides the vision model to use not only the visual texture of characters but also the linguistic information in the visual context for recognition when the visual cues are confused (e.g., by occlusion or noise).
1 code implementation • 24 Jun 2021 • Yuxin Wang, Hongtao Xie, Shancheng Fang, Yadong Qu, Yongdong Zhang
However, there are two problems: 1) the implicit erasure guidance causes excessive erasure of non-text areas; 2) one-stage erasure fails to remove text regions exhaustively.
3 code implementations • CVPR 2021 • Shancheng Fang, Hongtao Xie, Yuxin Wang, Zhendong Mao, Yongdong Zhang
Additionally, based on the ensemble of iterative predictions, we propose a self-training method which can learn from unlabeled images effectively.