On the other hand, feature fusion modules are designed to combine different modal of semantic features, which learns to leverage the information from both inputs for better results.
We propose a new 3D holistic++ scene understanding problem, which jointly tackles two tasks from a single-view image: (i) holistic scene parsing and reconstruction---3D estimations of object bounding boxes, camera pose, and room layout, and (ii) 3D human pose estimation.
The multi-scale context module refers to the operations to aggregate feature responses from a large spatial extent, while the single-stage encoder-decoder structure encodes the high-level semantic information in the encoder path and recovers the boundary information in the decoder path.
Semantic segmentation with Convolutional Neural Networks is a memory-intensive task due to the high spatial resolution of feature maps and output predictions.
The core components of our architecture are the Long-skip Refinement Module (LRM) and the Multi-scale Contexts Integration Module (MCIM).
In this paper, a practical and efficient edge-aware neural network is presented for semantic segmentation.
In our method, the exploratory robot scanning is both driven by and targeting at the recognition and segmentation of semantic objects from the scene.
We further propose to distill the structured knowledge from cumbersome networks into compact networks, which is motivated by the fact that semantic segmentation is a structured prediction problem.
In this paper, we investigate a novel deep-model reusing task.