Since the earliest civilizations, the social relationships formed between individuals have been the foundation of social structure in everyday life.
Spatial pooling has been proven highly effective in capturing long-range contextual information for pixel-wise prediction tasks, such as scene parsing.
Scene understanding of high-resolution aerial images is of great importance for automated monitoring in various remote sensing applications.
Replacing the background and simultaneously adjusting foreground objects is a challenging task in image editing.
We introduce Synscapes, a synthetic dataset for street scene parsing created using photorealistic rendering techniques, and show that it yields state-of-the-art results for training and validation and enables new types of analysis.
On the one hand, the integrated classification model contains multiple classifiers: not only a general classifier but also a refinement classifier that distinguishes easily confused categories.
We turn it into a realistic few-shot classification benchmark by splitting the object categories into head and tail based on their distribution in the world.
On the other hand, feature fusion modules are designed to combine semantic features from different modalities, leveraging information from both inputs to improve accuracy.
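Splitting object categories into head and tail according to their distribution can be sketched as below. This is a minimal illustration, not the benchmark's actual procedure: the cumulative-coverage rule and the `head_fraction` threshold of 0.8 are hypothetical choices, and the example labels are made up.

```python
from collections import Counter


def split_head_tail(labels, head_fraction=0.8):
    """Split categories into head and tail by instance frequency.

    Hypothetical rule (an assumption, not a published criterion): sort
    categories by count, most frequent first, and assign them to the head
    until the head covers `head_fraction` of all instances. The remaining
    categories form the tail.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    head, tail = [], []
    covered = 0
    # Sort by descending count, breaking ties alphabetically for determinism.
    for category, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered < head_fraction * total:
            head.append(category)
            covered += n
        else:
            tail.append(category)
    return head, tail


# Illustrative, made-up label distribution (100 instances in total).
labels = ["car"] * 50 + ["person"] * 25 + ["dog"] * 12 + ["cat"] * 8 + ["zebra"] * 5
head, tail = split_head_tail(labels)
print(head, tail)
```

With these counts the head reaches 87% coverage after `car`, `person` and `dog`, so `cat` and `zebra` fall into the tail. A fixed per-category count threshold would be an equally plausible alternative rule.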