12 dataset results for Video Inpainting

DAVIS (Densely Annotated VIdeo Segmentation)

The Densely Annotated Video Segmentation dataset (DAVIS) is a high-quality, high-resolution video segmentation dataset provided at two resolutions, 480p and 1080p. It contains 50 video sequences with 3,455 frames densely annotated at the pixel level: 30 videos with 2,079 frames for training and 20 videos with 1,376 frames for validation.

634 PAPERS • 13 BENCHMARKS

YouTube-VOS 2018 (Youtube Video Object Segmentation)

YouTube-VOS is a video object segmentation dataset that contains 4,453 videos: 3,471 for training, 474 for validation, and 508 for testing. The training and validation videos have pixel-level ground truth annotations for every 5th frame (6 fps). It also contains instance segmentation annotations. It has more than 7,800 unique objects, 190k high-quality manual annotations, and more than 340 minutes of video.
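The "every 5th frame (6 fps)" annotation schedule can be sketched as below. This is a minimal illustration, not the dataset's official tooling: the function names are hypothetical, the assumption that annotation starts at frame 0 is ours, and the 30 fps source rate is only inferred from 6 fps x 5.

```python
def annotated_frame_indices(num_frames: int, stride: int = 5, start: int = 0) -> list[int]:
    """Return the indices of frames that carry annotations.

    Assumes (hypothetically) that annotation begins at frame `start`
    and repeats every `stride` frames.
    """
    return list(range(start, num_frames, stride))


def annotation_rate(video_fps: float, stride: int = 5) -> float:
    """Effective annotation rate in frames per second."""
    return video_fps / stride


if __name__ == "__main__":
    # A 12-frame clip annotated on every 5th frame.
    print(annotated_frame_indices(12))
    # An inferred 30 fps source annotated on every 5th frame.
    print(annotation_rate(30.0))
```

A loader built this way would read masks only at the returned indices and treat the remaining frames as unlabeled.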

174 PAPERS • 10 BENCHMARKS

How2Sign (A Large-scale Multimodal Dataset for Continuous American Sign Language)

How2Sign is a multimodal and multiview continuous American Sign Language (ASL) dataset consisting of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth. A three-hour subset was additionally recorded in the Panoptic Studio, enabling detailed 3D pose estimation.

28 PAPERS • 3 BENCHMARKS

FVI (Free-form Video Inpainting)

The Free-Form Video Inpainting dataset is used for training and evaluating video inpainting models. It consists of 1,940 videos from the YouTube-VOS dataset and 12,600 videos from the YouTube-BoundingBoxes dataset.

8 PAPERS • NO BENCHMARKS YET

KITTI360-EX

KITTI360-EX is a dataset for outer- and inner-FoV (field of view) expansion. It contains 76k pinhole images as well as 76k spherical images and is used for beyond-FoV estimation.

7 PAPERS • 1 BENCHMARK

DEVIL (Diagnostic Evaluation of Video Inpainting on Landscapes)

The Diagnostic Evaluation of Video Inpainting on Landscapes (DEVIL) benchmark is composed of a curated video/occlusion mask dataset and a comprehensive evaluation scheme.

3 PAPERS • NO BENCHMARKS YET

EgoHOS (Fine-Grained Egocentric Hand-Object Segmentation Dataset)

EgoHOS is a labeled dataset consisting of 11,243 egocentric images with per-pixel segmentation labels of hands and the objects they interact with during a diverse array of daily activities. The data are collected from multiple sources: 7,458 frames from Ego4D, 2,212 frames from EPIC-KITCHEN, 806 frames from THU-READ, and 350 frames from the authors' own egocentric videos of people playing Escape Room. The dataset is designed for tasks including hand state classification, video activity recognition, 3D mesh reconstruction of hand-object interactions, and video inpainting of hand-object foregrounds in egocentric videos.

2 PAPERS • NO BENCHMARKS YET

QST (Quick Sky Time)

QST contains 1,167 video clips cut from 216 time-lapse 4K videos collected from YouTube. It can be used for a variety of tasks, including high-resolution video generation, high-resolution video prediction, high-resolution image generation, texture generation, image inpainting, image/video super-resolution, image/video colorization, and image/video animation. Each clip contains between 58 and 1,200 frames (285,446 frames in total), and the resolution of each frame is more than 1,024 x 1,024. QST consists of a training set (1,000 clips, 244,930 frames), a validation set (100 clips, 23,200 frames), and a test set (67 clips, 17,316 frames). The download key for the QST dataset is qst1.
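The split sizes quoted for QST can be checked against the stated totals with a few lines of arithmetic. The sketch below is only an illustration: the `Split` type and `QST_SPLITS` table are hypothetical names introduced here, and the numbers are copied from the description above rather than read from the dataset itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    clips: int
    frames: int


# Figures taken from the QST description; not read from the actual files.
QST_SPLITS = {
    "train": Split(clips=1000, frames=244_930),
    "val": Split(clips=100, frames=23_200),
    "test": Split(clips=67, frames=17_316),
}


def totals(splits: dict[str, Split]) -> tuple[int, int]:
    """Sum clip and frame counts over all splits."""
    total_clips = sum(s.clips for s in splits.values())
    total_frames = sum(s.frames for s in splits.values())
    return total_clips, total_frames


if __name__ == "__main__":
    clips, frames = totals(QST_SPLITS)
    # The description states 1,167 clips and 285,446 frames overall.
    print(clips == 1167, frames == 285_446)
```

The same check applies to any entry that lists per-split counts alongside a total, such as the DAVIS training and validation figures.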

2 PAPERS • NO BENCHMARKS YET

Apolloscape Inpainting

The Inpainting dataset consists of synchronized labeled images and LiDAR-scanned point clouds. It was captured with the HESAI Pandora All-in-One Sensing Kit under various lighting conditions and traffic densities in Beijing, China.

1 PAPER • 1 BENCHMARK

VideoRemoval4K

VideoRemoval4K provides video sequences with annotated object masks for video inpainting. The resolution is 3840 x 2160.

1 PAPER • NO BENCHMARKS YET

DREAMING Inpainting Dataset (Diminished Reality for Emerging Applications in Medicine through Inpainting Dataset)

Dataset for the DREAMING (Diminished Reality for Emerging Applications in Medicine through Inpainting) challenge.

0 PAPERS • NO BENCHMARKS YET

WRV (Wire-removal Dataset)

The wire-removal dataset introduced in "G2LP-Net: Global to Local Progressive Video Inpainting Network".

0 PAPERS • NO BENCHMARKS YET
