UVO is a new benchmark for open-world class-agnostic object segmentation in videos. Besides shifting the problem focus to the open-world setup, UVO is significantly larger, providing approximately 8 times more videos compared with DAVIS, and 7 times more mask (instance) annotations per video compared with YouTube-VOS and YouTube-VIS. UVO is also more challenging as it includes many videos with crowded scenes and complex background motions. Some highlights of the dataset include:
24 PAPERS • 3 BENCHMARKS
Understanding comprehensive assembly knowledge from videos is critical for futuristic ultra-intelligent industry. To enable technological breakthrough, we present HA-ViD – an assembly video dataset that features representative industrial assembly scenarios, natural procedural knowledge acquisition process, and consistent human-robot shared annotations. Specifically, HA-ViD captures diverse collaboration patterns of real-world assembly, natural human behaviors and learning progression during assembly, and granulate action annotations to subject, action verb, manipulated object, target object, and tool. We provide 3222 multi-view and multi-modality videos, 1.5M frames, 96K temporal labels and 2M spatial labels. We benchmark four foundational video understanding tasks: action recognition, action segmentation, object detection and multi-object tracking. Importantly, we analyze their performance and the further reasoning steps for comprehending knowledge in assembly progress, process effici
1 PAPER • NO BENCHMARKS YET
Infinity AI's Spills Basic Dataset is a synthetic, open-source dataset for safety applications. It features 150 videos of photorealistic liquid spills across 15 common settings. Spills take on in-context reflections, caustics, and depth based on the surrounding environment, lighting, and floor. Each video contains a spill of unique properties (size, color, profile, and more) and is accompanied by pixel-perfect labels and annotations. This dataset can be used to develop computer vision algorithms to detect the location and type of spill from the perspective of a fixed camera.
0 PAPER • NO BENCHMARKS YET