Human parsing is the task of segmenting a human image into different fine-grained semantic parts such as head, torso, arms and legs.
(Image credit: Multi-Human-Parsing (MHP))
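As a rough illustration of what a parsing result looks like, the output can be pictured as a label map that assigns one part ID to every pixel. The sketch below is a toy example only: the part IDs, the `PART_NAMES` table and the `part_pixel_counts` helper are hypothetical, and real datasets define their own label sets.

```python
# Hypothetical part labels for illustration; real datasets define their own.
PART_NAMES = {0: "background", 1: "head", 2: "torso", 3: "arms", 4: "legs"}


def part_pixel_counts(label_map):
    """Count how many pixels are assigned to each part name."""
    counts = {name: 0 for name in PART_NAMES.values()}
    for row in label_map:
        for part_id in row:
            counts[PART_NAMES[part_id]] += 1
    return counts


# A toy 5x4 parsing result: each entry is the part ID of one pixel.
toy_label_map = [
    [0, 1, 1, 0],
    [3, 2, 2, 3],
    [3, 2, 2, 3],
    [0, 4, 4, 0],
    [0, 4, 4, 0],
]

counts = part_pixel_counts(toy_label_map)
print(counts)
```

A dense integer map like this is the usual target format for segmentation models, which is why per-pixel statistics such as the counts above are straightforward to compute.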
Despite noticeable progress in perceptual tasks such as detection, instance segmentation and human parsing, computers still perform unsatisfactorily at visually understanding humans in crowded scenes, which matters for applications such as group behavior analysis, person re-identification and autonomous driving.
SOTA for Multi-Human Parsing on MHP v2.0
In this paper, we present a novel method to generate synthetic human part segmentation data using easily obtained human keypoint annotations.
#3 best model for Human Part Segmentation on PASCAL-Person-Part (using extra training data)
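To make the idea of deriving part labels from keypoints concrete, one simple possibility is to paint every pixel that lies within a fixed radius of the segment joining two keypoints. This is only a sketch under stated assumptions and not the method from the paper: the keypoint names, the limb list, the radii and both helper functions are hypothetical.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Distance from point (px, py) to the segment from (ax, ay) to (bx, by)."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project the point onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def keypoints_to_part_map(keypoints, limbs, height, width):
    """Rasterize each limb as a thick segment; 0 is background.

    Limbs listed later overwrite earlier ones where they overlap.
    """
    label_map = [[0] * width for _ in range(height)]
    for part_id, start, end, radius in limbs:
        ax, ay = keypoints[start]
        bx, by = keypoints[end]
        for y in range(height):
            for x in range(width):
                if point_segment_distance(x, y, ax, ay, bx, by) <= radius:
                    label_map[y][x] = part_id
    return label_map


# Hypothetical (x, y) keypoints and limbs: (part ID, start, end, radius).
keypoints = {"neck": (4, 1), "hip": (4, 5), "knee": (4, 8)}
limbs = [(1, "neck", "hip", 1.0), (2, "hip", "knee", 0.5)]

synthetic = keypoints_to_part_map(keypoints, limbs, height=10, width=9)
for row in synthetic:
    print("".join(str(v) for v in row))
```

Labels produced this way are coarse, since real body parts are not uniform-width segments, so they are best read as an illustration of why keypoints are a cheap source of approximate part supervision.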
Instance-level human parsing for real-world human analysis scenarios remains under-explored, owing to the lack of sufficient data resources and the technical difficulty of parsing multiple instances in a single pass.
#2 best model for Human Part Segmentation on CIHP
To tackle the problem of learning with label noise, this work introduces a purification strategy, called Self-Correction for Human Parsing (SCHP), that progressively improves the reliability of both the supervised labels and the learned models.
SOTA for Semantic Segmentation on LIP val
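One simple way to picture progressive label purification is a running average that blends the current (possibly noisy) soft labels with the model's predictions after each training cycle. The snippet below is an illustrative assumption rather than the exact SCHP procedure; the `refine_soft_labels` function and the `cycle / (cycle + 1)` weighting are hypothetical choices made for this sketch.

```python
def refine_soft_labels(current_labels, model_probs, cycle):
    """Blend current soft labels with model predictions.

    Assumed weighting: cycle / (cycle + 1) on the current labels and
    1 / (cycle + 1) on the new predictions, so later cycles change the
    labels less.
    """
    keep = cycle / (cycle + 1)
    return [
        [keep * old + (1 - keep) * new for old, new in zip(old_px, new_px)]
        for old_px, new_px in zip(current_labels, model_probs)
    ]


# Two pixels, two classes. Both start as one-hot labels for class 0.
noisy_labels = [[1.0, 0.0], [1.0, 0.0]]
# Hypothetical model predictions: the second pixel looks mislabeled.
predictions = [[0.9, 0.1], [0.2, 0.8]]

refined = refine_soft_labels(noisy_labels, predictions, cycle=1)
print(refined)
```

Because each refined row is a convex combination of two probability vectors, it still sums to one, and a pixel whose label disagrees with confident predictions moves gradually toward the predicted class instead of flipping at once.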
Models need to distinguish different human instances in the image panel and learn rich features to represent the details of each instance.
SOTA for Pose Estimation on DensePose-COCO
Beyond the existing single-person and multiple-person human parsing tasks in static images, this paper makes the first attempt to investigate a more realistic setting, video instance-level human parsing, which simultaneously segments out each person instance and parses each instance into finer-grained parts (e.g., head, leg, dress).
To further exploit the semantic correlation between human parsing and pose estimation, we propose a novel joint network for the two tasks that uses efficient context modeling to predict parsing and pose simultaneously with high quality.
#6 best model for Semantic Segmentation on LIP val