In this paper, we introduce two new modules to enhance the capability of Sparse CNNs, both of which make feature sparsity learnable through position-wise importance prediction.
However, this option traditionally hurts detection performance considerably.
In particular, Panoptic FCN encodes each object instance or stuff category with the proposed kernel generator and produces the prediction by convolving the high-resolution feature directly.
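As a rough illustration of "producing the prediction by convolving the high-resolution feature directly", the sketch below applies one weight vector per instance or stuff category to a feature map. It assumes each generated kernel acts as a 1x1 convolution (a per-pixel dot product over channels). The function name `predict_masks` and the plain-list data layout are hypothetical and are not taken from the Panoptic FCN code; the kernel generator itself is not modeled.

```python
from typing import List

# Assumed layout: feature[channel][row][col]
Feature = List[List[List[float]]]


def predict_masks(kernels: List[List[float]], feature: Feature) -> List[List[List[float]]]:
    """Apply each kernel as a 1x1 convolution over the feature map.

    Each kernel holds one weight per channel, so every output pixel is the
    dot product of the kernel with the channel vector at that pixel.
    Returns one score map (mask logits) per kernel.
    """
    channels = len(feature)
    height = len(feature[0])
    width = len(feature[0][0])
    masks = []
    for kernel in kernels:
        if len(kernel) != channels:
            raise ValueError("kernel length must equal the number of channels")
        mask = [
            [
                sum(kernel[c] * feature[c][y][x] for c in range(channels))
                for x in range(width)
            ]
            for y in range(height)
        ]
        masks.append(mask)
    return masks
```

The number of output maps equals the number of kernels, so the number of predictions follows the number of encoded instances and stuff categories rather than being fixed in advance.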
We propose Scale-aware AutoAug to learn data augmentation policies for object detection.
The Learnable Tree Filter presents a remarkable approach to model structure-preserving relations for semantic segmentation.
To this end, we propose a fine-grained dynamic head to conditionally select a pixel-level combination of FPN features from different scales for each instance, which further releases the ability of multi-scale feature representation.
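The idea of a pixel-level combination of multi-scale features can be sketched as a gated sum. The example assumes the FPN features have already been resized to a common resolution and reduced to a single channel, and that a gate value per pixel and per scale is supplied as input. In the paper the selection is conditional on the input; here the gates are simply given, and the name `combine_scales` is hypothetical.

```python
from typing import List

# Assumed layout: map[row][col], single channel for brevity
Map = List[List[float]]


def combine_scales(features: List[Map], gates: List[Map]) -> Map:
    """Combine per-scale feature maps with per-pixel gates.

    out[y][x] = sum over scales s of gates[s][y][x] * features[s][y][x].
    A gate of 0 at a pixel drops that scale's contribution at that pixel.
    """
    if len(features) != len(gates):
        raise ValueError("need exactly one gate map per feature map")
    height = len(features[0])
    width = len(features[0][0])
    out = [[0.0] * width for _ in range(height)]
    for feat, gate in zip(features, gates):
        for y in range(height):
            for x in range(width):
                out[y][x] += gate[y][x] * feat[y][x]
    return out
```

With binary gates, each pixel picks its own subset of scales, which is what distinguishes a pixel-level selection from choosing a single scale for a whole instance.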
In this paper, we present a conceptually simple, strong, and efficient framework for panoptic segmentation, called Panoptic FCN.
Ranked #1 on Panoptic Segmentation on Cityscapes val (PQst metric)
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection.
To demonstrate the superiority of the dynamic property, we compare with several static architectures, which can be modeled as special cases in the routing space.
To this end, tree filtering modules are embedded to formulate a unified framework for semantic segmentation.
Given the results of MTN, we adopt an occlusion-aware Re-ID feature strategy in the pose tracking module, where pose information is used to infer the occlusion state and thus make better use of Re-ID features.
Multi-target Multi-camera Tracking (MTMCT) aims to extract the trajectories from videos captured by a set of cameras.
Facial expression recognition is a challenging task, arguably because of large intra-class variations and high inter-class similarities.
This paper studies panoptic segmentation, a recently proposed task which segments foreground (FG) objects at the instance level as well as background (BG) contents at the semantic level.
Ranked #16 on Panoptic Segmentation on Cityscapes val