Video Inpainting
42 papers with code • 4 benchmarks • 12 datasets
The goal of Video Inpainting is to fill in missing regions of a given video sequence with content that is both spatially and temporally coherent. Video Inpainting, also known as video completion, has many real-world applications, such as undesired object removal and video restoration.
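To make the spatial-and-temporal coherence requirement concrete, here is a minimal illustrative baseline (not any published method): each masked pixel is filled from the temporally nearest frame where that pixel is visible, falling back to the per-frame mean of known pixels when it is never visible. The function name and array layout are assumptions for this sketch.

```python
import numpy as np

def inpaint_video(frames, masks):
    """Naive temporal-copy baseline for video inpainting (illustrative
    sketch only): fill each masked pixel from the temporally nearest
    frame where it is visible; fall back to the frame's mean of known
    pixels if it is masked everywhere.

    frames: (T, H, W) float array; masks: (T, H, W) bool, True = missing.
    """
    frames = frames.astype(float).copy()
    T = frames.shape[0]
    for t in range(T):
        ys, xs = np.nonzero(masks[t])
        for y, x in zip(ys, xs):
            # search outward in time for a frame where (y, x) is known
            for d in range(1, T):
                for s in (t - d, t + d):
                    if 0 <= s < T and not masks[s, y, x]:
                        frames[t, y, x] = frames[s, y, x]
                        break
                else:
                    continue
                break
            else:
                # pixel is masked in every frame: spatial fallback
                known = frames[t][~masks[t]]
                frames[t, y, x] = known.mean() if known.size else 0.0
    return frames
```

Copying from neighboring frames gives temporal coherence for free when the background is static; the learned methods listed below exist precisely because this breaks down under camera motion, moving objects, and regions occluded in every frame.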
Latest papers
VQ-NeRV: A Vector Quantized Neural Representation for Videos
This block incorporates a codebook mechanism to effectively discretize the network's shallow residual features and inter-frame residual information.
Expression-aware video inpainting for HMD removal in XR applications
Our results demonstrate the remarkable capability of the proposed framework to remove HMDs from facial videos while maintaining the subject's facial expression and identity.
AVID: Any-Length Video Inpainting with Diffusion Model
Given a video, a masked region at its initial frame, and an editing prompt, it requires a model to do infilling at each frame following the editing guidance while keeping the out-of-mask region intact.
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
Text-to-image foundation models are now widely applied to various downstream image synthesis tasks, such as controllable image generation and image editing, while downstream video synthesis tasks remain less explored for several reasons.
Flow-Guided Diffusion for Video Inpainting
Video inpainting has been challenged by complex scenarios like large movements and low-light conditions.
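Flow guidance is a recurring ingredient across these methods. A generic sketch of the core step (not this paper's actual implementation; the function name, array layout, and nearest-neighbor sampling are assumptions) is to fill masked pixels of the current frame by following backward optical flow into the previous, already-completed frame:

```python
import numpy as np

def propagate_with_flow(prev_frame, cur_frame, mask, flow):
    """Flow-guided pixel propagation (generic illustrative sketch):
    fill masked pixels of the current frame by sampling the previous,
    already-completed frame at the location the backward flow points to.

    prev_frame, cur_frame: (H, W) float arrays.
    mask: (H, W) bool, True = missing in the current frame.
    flow: (H, W, 2) backward flow (dy, dx) from current to previous frame.
    """
    out = cur_frame.astype(float).copy()
    H, W = mask.shape
    ys, xs = np.nonzero(mask)
    # nearest-neighbor sampling of the flow target, clamped to the image
    sy = np.clip(np.round(ys + flow[ys, xs, 0]).astype(int), 0, H - 1)
    sx = np.clip(np.round(xs + flow[ys, xs, 1]).astype(int), 0, W - 1)
    out[ys, xs] = prev_frame[sy, sx]
    return out
```

In practice the flow itself must also be completed inside the mask, flow errors accumulate over long propagation chains, and pixels with no valid flow target are left for a generative model to hallucinate; that residual synthesis step is where diffusion models come in.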
Deep Video Inpainting Guided by Audio-Visual Self-Supervision
To incorporate this prior knowledge, we first train the audio-visual network, which learns the correspondence between auditory and visual information.
Bitstream-Corrupted Video Recovery: A Novel Benchmark Dataset and Method
The past decade has witnessed great strides in video recovery by specialist technologies, like video inpainting, completion, and error concealment.
ProPainter: Improving Propagation and Transformer for Video Inpainting
We also propose a mask-guided sparse video Transformer, which achieves high efficiency by discarding unnecessary and redundant tokens.
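The idea behind mask-guided token sparsification can be sketched as follows (an illustrative reconstruction of the concept, not ProPainter's actual code; the function name, patch size, and dilation scheme are assumptions): only patch tokens that overlap the inpainting mask, plus a small dilated neighborhood for context, are kept for attention, and the rest are discarded.

```python
import numpy as np

def select_tokens(mask, patch=8, dilate=1):
    """Mask-guided token selection (illustrative sketch): keep only the
    patch tokens that overlap the inpainting mask, dilated by `dilate`
    patches so attention still sees nearby context.

    mask: (H, W) bool, True = region to inpaint.
    Returns a (H//patch, W//patch) bool keep-map over the token grid.
    """
    H, W = mask.shape
    gh, gw = H // patch, W // patch
    # view the mask as a grid of patch-sized blocks
    grid = mask[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    keep = grid.any(axis=(1, 3))            # token touches the mask
    if dilate:                              # include neighboring tokens
        padded = np.pad(keep, dilate)
        out = np.zeros_like(keep)
        for dy in range(-dilate, dilate + 1):
            for dx in range(-dilate, dilate + 1):
                out |= padded[dilate + dy: dilate + dy + gh,
                              dilate + dx: dilate + dx + gw]
        keep = out
    return keep
```

Since self-attention cost grows quadratically with token count, restricting attention to the kept tokens cuts computation sharply when the mask covers a small fraction of the frame.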
UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for Temporal Forgery Localization
Our approach introduces a Temporal Feature Abnormal Attention (TFAA) module based on temporal feature reconstruction to enhance the detection of temporal differences.
Deficiency-Aware Masked Transformer for Video Inpainting
Firstly, we pretrain an image inpainting model, DMT_img, to serve as a prior for distilling the video model DMT_vid, thereby benefiting the hallucination of deficiency cases.