no code implementations • 26 Aug 2024 • Bingcheng Dong, Yuning Ding, Jinrong Zhang, Sifan Zhang, Shenglan Liu
We introduce a strong DETR-based model, Visual Intersection Network for Open Set Object Detection (VINO), which constructs a multi-image visual bank to preserve the semantic intersections of each category across all time steps.
1 code implementation • 27 Sep 2023 • Jinrong Zhang, Wujun Wen, Shenglan Liu, Yunheng Li, QiFeng Li, Lin Feng
The streaming temporal action segmentation (STAS) task, a supplementary task of temporal action segmentation (TAS), has not received adequate attention in the field of video understanding.