However, there remains a lack of studies that extend action composition and leverage multiple viewpoints and multiple modalities of data for representation learning.
Ranked #1 on Video Classification on Home Action Genome
Compared to methods with similar detectors, it improves MOTA by almost 10 points and significantly reduces the number of ID switches on the BDD100K and Waymo datasets.
Ranked #1 on One-Shot Object Detection on PASCAL VOC 2012 val
Datasets drive vision progress, yet existing driving datasets are too limited in visual content and supported tasks to study multitask learning for autonomous driving.