TextSleuth: A New Dataset and Baseline for Scene Text Manipulation Detection
With the rise of digital content on social media and the advancement of image editing tools, tampering with scene text has become a serious concern. Scene text manipulation detection (STMD) is a form of image manipulation detection (IMD) that focuses on tampering with scene text pixels, and it is crucial for image content integrity and media forensics. In this paper, we present TextSleuth, a novel benchmark dataset specifically designed for STMD, built by integrating three public datasets with newly introduced manipulations and annotations. It comprises professional edits on the Total-Text dataset (~1K images) with four levels of manipulated-region perceptibility, a large synthetic manipulation set (858K images) built on the SynthText dataset, and the Tampered-IC13 dataset (378 images). We establish a new STMD baseline on TextSleuth using MMFusion-IML, the state-of-the-art image manipulation detection model. We perform extensive experiments and report both the area under the ROC curve (AUC) and the balanced accuracy (bACC) to maintain a balanced performance evaluation. The MMFusion-IML baseline achieves 0.641 AUC and 0.588 bACC on the Total-Text subset, compared with 0.89 AUC and 0.8272 bACC on the Tampered-IC13 subset. This gap showcases the real-world STMD challenges reflected in our new dataset. TextSleuth is a valuable resource for future research in scene text manipulation detection and forensics. The dataset is available at https://github.com/abhineet-pandey/Text-Sleuth.
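For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how AUC and bACC can be computed from binary ground-truth labels and predicted manipulation scores. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy inputs are assumptions made for illustration, and the abstract does not state whether scores are evaluated per pixel or per image.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive (manipulated) sample
    scores higher than a random negative (authentic) one; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, scores, threshold=0.5):
    """Mean of the true-positive rate and the true-negative rate.
    The 0.5 threshold is an assumption, not a value taken from the paper."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("bACC needs at least one positive and one negative sample")
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Toy example (hypothetical values): 1 = manipulated, 0 = authentic.
labels = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1, 0.3]
auc = roc_auc(labels, scores)
bacc = balanced_accuracy(labels, scores)
```

Because bACC averages the per-class rates, it is not inflated when authentic samples heavily outnumber manipulated ones, which is the usual reason to report it alongside AUC.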