BURST

Introduced by Athar et al. in BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video

BURST is a benchmark suite built upon TAO that requires tracking and segmenting multiple objects from camera video. The benchmark contains 6 different sub-tasks divided into 2 groups that all share the same data for training/validation/testing.

Class-guided

Common: Track and segment all objects belonging to a set of 78 common classes (based on the COCO class set).
Long-tail: Track and segment all objects belonging to an extended set of 482 object classes (based on the LVIS class set).
Open-world: Methods may use only the annotations of the 78 common classes during training, but during inference they are expected to track and segment objects from all 482 classes (class label predictions are not required).
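
The open-world training restriction can be pictured as a filter over the training annotations. The snippet below is only a minimal sketch: the track dictionaries, the `category_id` key, the `keep_common_tracks` helper, and the placeholder ID set are all hypothetical and do not reflect the actual BURST annotation format or class IDs.

```python
from typing import Dict, List, Set

# Hypothetical track record: {"track_id": int, "category_id": int}
Track = Dict[str, int]


def keep_common_tracks(tracks: List[Track], common_ids: Set[int]) -> List[Track]:
    """Return only the tracks whose class belongs to the common class set."""
    return [t for t in tracks if t["category_id"] in common_ids]


# Placeholder IDs for illustration only; the real common set has 78 classes.
COMMON_IDS = {1, 2}

tracks = [
    {"track_id": 10, "category_id": 1},
    {"track_id": 11, "category_id": 7},  # not a common class: dropped for training
    {"track_id": 12, "category_id": 2},
]

train_tracks = keep_common_tracks(tracks, COMMON_IDS)
```

At inference time no such filter would apply, since the open-world task expects objects of every class to be tracked and segmented.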

Exemplar-guided

Mask: Track and segment all objects in the video for which the first-frame object masks are given. This task is identical to Video Object Segmentation (VOS).
Box: Track and segment all objects in the video for which the first-frame object bounding-boxes are given.
Point: Track and segment all objects in the video for which only the (x,y) coordinates of the mask centroid are given in the first frame in which each object appears.
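
The three exemplar types are related: a box or a point prompt can be derived from a first-frame mask. The following is a minimal sketch of that relationship, assuming a binary mask stored as nested Python lists indexed as `mask[y][x]`. The helper names (`mask_to_box`, `mask_to_centroid`) and the inclusive pixel-coordinate box convention are illustrative assumptions, not the benchmark's own tooling.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # rows of 0/1 values, indexed as mask[y][x]


def foreground_pixels(mask: Mask) -> List[Tuple[int, int]]:
    """Collect (x, y) coordinates of all non-zero mask pixels."""
    return [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]


def mask_to_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Tight box (x_min, y_min, x_max, y_max), inclusive; None for an empty mask."""
    pixels = foreground_pixels(mask)
    if not pixels:
        return None
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (min(xs), min(ys), max(xs), max(ys))


def mask_to_centroid(mask: Mask) -> Optional[Tuple[float, float]]:
    """Mean (x, y) of the foreground pixels; None for an empty mask."""
    pixels = foreground_pixels(mask)
    if not pixels:
        return None
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)


first_frame_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

box = mask_to_box(first_frame_mask)         # (1, 1, 3, 2)
point = mask_to_centroid(first_frame_mask)  # (2.0, 1.5)
```

Note that for non-convex shapes a plain centroid can fall outside the mask itself; this sketch does not handle that case.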

An illustration of the task hierarchy is given here, and a detailed explanation is given in Sec. 5 of the dataset paper.

Benchmarks

Task	Dataset Variant	Best Model
Long-tail Video Object Segmentation	BURST-val	GLEE-Pro
Open-World Video Segmentation	BURST-val	DEVA
Semi-Supervised Video Object Segmentation	BURST-val	Cutie
Semi-Supervised Video Object Segmentation	BURST-test	Cutie
Long-tail Video Object Segmentation	BURST	GLEE-Lite