The newly proposed HC-STVG task aims to localize the target person spatio-temporally in an untrimmed video. For this task, we collect a new benchmark dataset, which has spatio temporal annotations related to the target persons in complex multi-person scenes, together with full interaction and rich action information.
Paper | Code | Results | Date | Stars |
---|