Spatio-temporal Co-attention Fusion Network for Video Splicing Localization

18 Sep 2023  ·  Man Lin, Gang Cao, Zijie Lou ·

Digital video splicing has become easy and ubiquitous. Malicious users copy some regions of a video and paste them to another video for creating realistic forgeries. It is significant to blindly detect such forgery regions in videos. In this paper, a spatio-temporal co-attention fusion network (SCFNet) is proposed for video splicing localization. Specifically, a three-stream network is used as an encoder to capture manipulation traces across multiple frames. The deep interaction and fusion of spatio-temporal forensic features are achieved by the novel parallel and cross co-attention fusion modules. A lightweight multilayer perceptron (MLP) decoder is adopted to yield a pixel-level tampering localization map. A new large-scale video splicing dataset is created for training the SCFNet. Extensive tests on benchmark datasets show that the localization and generalization performances of our SCFNet outperform the state-of-the-art. Code and datasets will be available at https://github.com/multimediaFor/SCFNet.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here