Video Inpainting
43 papers with code • 4 benchmarks • 12 datasets
The goal of Video Inpainting is to fill in missing regions of a given video sequence with contents that are both spatially and temporally coherent. Video Inpainting, also known as video completion, has many real-world applications such as undesired object removal and video restoration.
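To make the task definition concrete, here is a toy baseline (not any published method) that fills missing pixels by copying the nearest valid value at the same spatial location from another frame — a minimal form of the temporal propagation that real video inpainting models learn. All names here are illustrative.

```python
import numpy as np

def temporal_fill(video: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Toy video inpainting baseline via temporal propagation.

    video: (T, H, W) grayscale frames.
    mask:  (T, H, W) bool, True where the pixel is missing.
    Each missing pixel is copied from the temporally nearest frame in
    which that pixel is valid. Spatial coherence is ignored entirely;
    real methods model both.
    """
    T = video.shape[0]
    out = video.copy()
    for t in range(T):
        missing = mask[t].copy()
        if not missing.any():
            continue
        # Search donor frames by increasing temporal distance.
        for d in range(1, T):
            for s in (t - d, t + d):
                if 0 <= s < T:
                    donor = ~mask[s] & missing
                    out[t][donor] = video[s][donor]
                    missing &= ~donor
            if not missing.any():
                break
    return out
```

Pixels never observed in any frame stay untouched; that is exactly the case where learned generative models must hallucinate plausible content.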
Most implemented papers
The DEVIL is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting
Quantitative evaluation has increased dramatically among recent video inpainting work, but the video and mask content used to gauge performance has received relatively little attention.
Internal Video Inpainting by Implicit Long-range Propagation
We propose a novel framework for video inpainting by adopting an internal learning strategy.
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
In contrast, the soft composition operation stitches different patches into a whole feature map, with pixels in overlapping regions summed up.
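The soft composition idea can be sketched in a few lines: overlapping patches are accumulated back into one map, and overlapped positions receive contributions from every patch that covers them (normalized here by coverage so values stay in range). This is a 2-D illustrative sketch, not FuseFormer's actual implementation.

```python
import numpy as np

def soft_composition(patches, out_shape, patch_size, stride):
    """Stitch overlapping patches into a single feature map.

    patches: list of (p, p) arrays, in row-major window order.
    Pixels covered by several patches are summed, then averaged by
    their coverage count. Toy sketch of the mechanism only.
    """
    H, W = out_shape
    p, s = patch_size, stride
    out = np.zeros((H, W))
    count = np.zeros((H, W))
    idx = 0
    for y in range(0, H - p + 1, s):
        for x in range(0, W - p + 1, s):
            out[y:y + p, x:x + p] += patches[idx]
            count[y:y + p, x:x + p] += 1
            idx += 1
    return out / np.maximum(count, 1)
```

Because neighboring windows share pixels, gradients (and information) flow across patch borders, which is the point of "soft" rather than hard, non-overlapping composition.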
Vision Transformer Based Video Hashing Retrieval for Tracing the Source of Fake Videos
In addition, we designed a tool called Localizator to compare the original traced video with the fake video.
Inertia-Guided Flow Completion and Style Fusion for Video Inpainting
We propose a flow completion network to align and aggregate flow features from the consecutive flow sequences based on the inertia prior.
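The inertia prior amounts to assuming that motion varies slowly across frames, so missing flow vectors can be estimated from temporal neighbors. Below is a hand-written stand-in for the learned completion network: it fills invalid flow vectors with the average of valid vectors at the same location in adjacent frames. Function and variable names are hypothetical.

```python
import numpy as np

def complete_flow(flows, masks):
    """Naive flow completion under an inertia (slow-motion-change) prior.

    flows: (T, H, W, 2) optical flow fields.
    masks: (T, H, W) bool, True where the flow is invalid (e.g. inside
           the inpainting hole).
    Invalid vectors are replaced by the mean of valid vectors at the
    same pixel in the previous/next frame, when available.
    """
    T = flows.shape[0]
    out = flows.copy()
    for t in range(T):
        hole = masks[t]
        if not hole.any():
            continue
        acc = np.zeros(flows.shape[1:])      # (H, W, 2) running sum
        cnt = np.zeros(flows.shape[1:3])     # (H, W) valid-neighbor count
        for s in (t - 1, t + 1):
            if 0 <= s < T:
                valid = ~masks[s]
                acc[valid] += flows[s][valid]
                cnt[valid] += 1
        fill = hole & (cnt > 0)
        out[t][fill] = acc[fill] / cnt[fill][:, None]
    return out
```

The learned network in the paper additionally aligns features across the flow sequence instead of averaging raw vectors per pixel.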
Towards Unified Keyframe Propagation Models
We evaluate our two-stream approach on inpainting tasks; experiments show that it improves both the propagation of features within a single frame, as required for image inpainting, and their propagation from keyframes to target frames.
Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications
Egocentric videos offer fine-grained information for high-fidelity modeling of human behaviors.
Flow-Guided Transformer for Video Inpainting
In the spatial transformer in particular, we design a dual-perspective spatial MHSA that integrates global tokens into the window-based attention.
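The core of "global tokens plus window attention" can be shown with a single-head sketch: queries come from one local window, while keys and values come from that window concatenated with a few global tokens, so every window still attends to scene-level context. This is illustrative only; the paper's dual-perspective MHSA is multi-head with learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention_with_global(win_tokens, global_tokens):
    """Single-head attention over a window augmented with global tokens.

    win_tokens:    (Nw, d) tokens of one local window (the queries).
    global_tokens: (Ng, d) pooled scene-level tokens.
    Returns the attended window tokens, (Nw, d). Learned Q/K/V
    projections are omitted for clarity.
    """
    d = win_tokens.shape[-1]
    kv = np.concatenate([win_tokens, global_tokens], axis=0)  # (Nw+Ng, d)
    attn = softmax(win_tokens @ kv.T / np.sqrt(d))            # (Nw, Nw+Ng)
    return attn @ kv
```

Keeping attention local per window holds the cost near O(Nw²) while the appended global tokens restore long-range context cheaply.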
Scalable Neural Video Representations with Learnable Positional Features
Succinct representation of complex signals using coordinate-based neural representations (CNRs) has seen great progress, and several recent efforts focus on extending them for handling videos.
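A coordinate-based neural representation of a video is just a function f(x, y, t) → RGB, typically a small MLP over sinusoidal coordinate features. The sketch below shows the interface with an untrained toy network; in practice the weights are fit to one specific video, and class/function names here are invented for illustration.

```python
import numpy as np

def fourier_features(coords, n_freqs=4):
    """Map (x, y, t) in [0, 1]^3 to sinusoidal features — the standard
    positional-encoding trick that lets a small MLP represent
    high-frequency content. coords: (N, 3) -> (N, 3 * 2 * n_freqs)."""
    freqs = 2.0 ** np.arange(n_freqs) * np.pi
    angles = coords[:, :, None] * freqs              # (N, 3, n_freqs)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(coords.shape[0], -1)

class TinyCNR:
    """Toy coordinate network f(x, y, t) -> RGB with random (untrained)
    weights; fitting them to a target video is the actual CNR step."""
    def __init__(self, in_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0, 0.1, (in_dim, hidden))
        self.w2 = rng.normal(0, 0.1, (hidden, 3))

    def __call__(self, coords):
        h = np.maximum(fourier_features(coords) @ self.w1, 0.0)  # ReLU
        return h @ self.w2                                       # (N, 3)
```

Because the video is stored in the weights rather than in pixels, any (x, y, t) — including masked regions and intermediate timestamps — can be queried, which is what makes CNRs attractive for tasks like inpainting and interpolation.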
INR-V: A Continuous Representation Space for Video-based Generative Tasks
In this work, we evaluate the space learned by INR-V on diverse generative tasks such as video interpolation, novel video generation, video inversion, and video inpainting against the existing baselines.