Search Results for author: Shancheng Fang

Found 10 papers, 7 papers with code

Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting

1 code implementation • 19 Apr 2024 • Fengyi Fu, Shancheng Fang, Weidong Chen, Zhendong Mao

Furthermore, this paper proposes a batch attention module to alleviate the problem of missing sentimental samples caused by data imbalance, which is common in live videos because video popularity varies.

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

1 code implementation • 11 Mar 2024 • Tianhao Qi, Shancheng Fang, Yanze Wu, Hongtao Xie, Jiawei Liu, Lang Chen, Qian He, Yongdong Zhang

The Q-Formers are trained on paired images rather than an identical target, where the reference image and the ground-truth image share the same style or semantics.

Disentanglement

DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation

no code implementations • 1 Jul 2023 • Zhuowei Chen, Shancheng Fang, Wei Liu, Qian He, Mengqi Huang, Yongdong Zhang, Zhendong Mao

While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centric images, an intractable problem is how to preserve the face identity for conditioned face images.

Image Generation

Design Booster: A Text-Guided Diffusion Model for Image Translation with Spatial Layout Preservation

no code implementations • 5 Feb 2023 • Shiqi Sun, Shancheng Fang, Qian He, Wei Liu

Specifically, our method co-encodes images and text into a new domain during the training phase.

Translation

Crossing the Gap: Domain Generalization for Image Captioning

no code implementations • CVPR 2023 • Yuchen Ren, Zhendong Mao, Shancheng Fang, Yan Lu, Tong He, Hao Du, Yongdong Zhang, Wanli Ouyang

In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where the data from the target domain is unseen in the learning process.

Domain Generalization Image Captioning +1

ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting

1 code implementation • 19 Nov 2022 • Shancheng Fang, Zhendong Mao, Hongtao Xie, Yuxin Wang, Chenggang Yan, Yongdong Zhang

In this paper, we argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) a language model with noisy input.

Ranked #4 on Text Spotting on SCUT-CTW1500

Blocking Language Modelling +2

CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

2 code implementations • 22 Nov 2021 • Tianlun Zheng, Zhineng Chen, Shancheng Fang, Hongtao Xie, Yu-Gang Jiang

In this paper, we propose a novel module called Multi-Domain Character Distance Perception (MDCDP) to establish a visually and semantically related position embedding.

Ranked #11 on Scene Text Recognition on ICDAR2015

Position Scene Text Recognition

From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network

4 code implementations • ICCV 2021 • Yuxin Wang, Hongtao Xie, Shancheng Fang, Jing Wang, Shenggao Zhu, Yongdong Zhang

Such an operation guides the vision model to use not only the visual texture of characters but also the linguistic information in the visual context for recognition when the visual cues are confused (e.g., occlusion or noise).

Language Modelling Scene Text Recognition

PERT: A Progressively Region-based Network for Scene Text Removal

1 code implementation • 24 Jun 2021 • Yuxin Wang, Hongtao Xie, Shancheng Fang, Yadong Qu, Yongdong Zhang

However, there exist two problems: 1) the implicit erasure guidance causes excessive erasure of non-text areas; 2) the one-stage erasure does not exhaustively remove text regions.

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

3 code implementations • CVPR 2021 • Shancheng Fang, Hongtao Xie, Yuxin Wang, Zhendong Mao, Yongdong Zhang

Additionally, based on the ensemble of iterative predictions, we propose a self-training method which can learn from unlabeled images effectively.

Language Modelling Scene Text Recognition
