Search Results for author: Shancheng Fang

Found 9 papers, 6 papers with code

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

1 code implementation • 11 Mar 2024 • Tianhao Qi, Shancheng Fang, Yanze Wu, Hongtao Xie, Jiawei Liu, Lang Chen, Qian He, Yongdong Zhang

The Q-Formers are trained using paired images rather than the identical target, in which the reference image and the ground-truth image share the same style or semantics.

Disentanglement

DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation

no code implementations • 1 Jul 2023 • Zhuowei Chen, Shancheng Fang, Wei Liu, Qian He, Mengqi Huang, Yongdong Zhang, Zhendong Mao

While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centric images, an intractable problem is how to preserve the face identity for conditioned face images.

Image Generation

Crossing the Gap: Domain Generalization for Image Captioning

no code implementations • CVPR 2023 • Yuchen Ren, Zhendong Mao, Shancheng Fang, Yan Lu, Tong He, Hao Du, Yongdong Zhang, Wanli Ouyang

In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where the data from the target domain is unseen in the learning process.

Domain Generalization, Image Captioning, +1

ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting

1 code implementation • 19 Nov 2022 • Shancheng Fang, Zhendong Mao, Hongtao Xie, Yuxin Wang, Chenggang Yan, Yongdong Zhang

In this paper, we argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) language model with noise input.

Blocking, Language Modelling, +2

CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

2 code implementations • 22 Nov 2021 • Tianlun Zheng, Zhineng Chen, Shancheng Fang, Hongtao Xie, Yu-Gang Jiang

In this paper, we propose a novel module called Multi-Domain Character Distance Perception (MDCDP) to establish a visually and semantically related position embedding.

Position, Scene Text Recognition

From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network

4 code implementations • ICCV 2021 • Yuxin Wang, Hongtao Xie, Shancheng Fang, Jing Wang, Shenggao Zhu, Yongdong Zhang

This operation guides the vision model to use not only the visual texture of characters but also the linguistic information in the visual context for recognition when the visual cues are confused (e.g., occlusion, noise, etc.).

Language Modelling, Scene Text Recognition

PERT: A Progressively Region-based Network for Scene Text Removal

1 code implementation • 24 Jun 2021 • Yuxin Wang, Hongtao Xie, Shancheng Fang, Yadong Qu, Yongdong Zhang

However, there exist two problems: 1) the implicit erasure guidance causes excessive erasure of non-text areas; 2) the one-stage erasure fails to remove text regions exhaustively.
