Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.
#3 best model for Instance Segmentation on COCO
Our hypothesis is that the appearance of a person -- their pose, clothing, action -- is a powerful cue for localizing the objects they are interacting with.
#2 best model for Human-Object Interaction Detection on HICO-DET
Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
#24 best model for Object Detection on COCO
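As a rough illustration of how a focal loss can down-weight easy examples, here is a minimal sketch for a single binary prediction. It assumes the commonly cited form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names and the defaults gamma=2.0 and alpha=0.25 are assumptions for illustration and are not taken from the snippet above.

```python
import math


def binary_cross_entropy(p, y):
    """Standard cross-entropy for one prediction p = P(y == 1)."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Hypothetical focal loss sketch: cross-entropy scaled by (1 - p_t)^gamma.

    gamma and alpha are assumed defaults, not values stated in the text above.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    modulating = (1.0 - p_t) ** gamma  # near 0 for confident, correct predictions
    return -alpha_t * modulating * math.log(p_t)


# An easy negative (background predicted with high confidence) contributes
# far less loss than a hard positive (object predicted with low confidence).
easy_negative = focal_loss(0.05, 0)
hard_positive = focal_loss(0.10, 1)
```

With gamma set to 0 the modulating factor is 1 and the expression reduces to alpha-weighted cross-entropy, which is one way to sanity-check an implementation.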
In this work, we establish dense correspondences between an RGB image and a surface-based representation of the human body, a task we refer to as dense human pose estimation.
#2 best model for Pose Estimation on DensePose-COCO
Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.
#2 best model for Multi-Human Parsing on MHP v1.0
We investigate omni-supervised learning, a special regime of semi-supervised learning in which the learner exploits all available labeled data plus internet-scale sources of unlabeled data.
In contrast to previous region-based detectors such as Fast/Faster R-CNN that apply a costly per-region subnetwork hundreds of times, our region-based detector is fully convolutional with almost all computation shared on the entire image.
#5 best model for Real-Time Object Detection on PASCAL VOC 2007