This paper proposes a new method called Multimodal RNNs for RGB-D scene semantic segmentation.
Then the Siamese CNN and temporally constrained metrics are jointly learned online to construct the appearance-based tracklet affinity models.
In this manuscript, we integrate CNNs with HRNNs, and develop end-to-end convolutional hierarchical recurrent neural networks (C-HRNNs).
In image labeling, local representations for image units are usually generated from their surrounding image patches, thus long-range contextual information is not effectively encoded.
Ranked #20 on Semantic Segmentation on COCO-Stuff test
In order to encode the class correlation and class specific information in image representation, we propose a new local feature learning approach named Deep Discriminative and Shareable Feature Learning (DDSFL).
We adopt Convolutional Neural Networks (CNN) as our parametric model to learn discriminative features and classifiers for local patch classification.