Our new benchmark dataset contains 20 textureless objects, 22 scenes, 404 video sequences and 126K images captured in real scenes.
For semantic segmentation of remote sensing images (RSI), the trade-off between representation power and location accuracy is important.
The paper demonstrates that using approximate multipliers for CNN training can significantly improve speed, power, and area at the cost of a small loss in the achieved accuracy.
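To make the idea of trading accuracy for cheaper arithmetic concrete, the following is a minimal sketch of one simple approximation scheme: zeroing the low bits of each operand before multiplying. The names truncated_multiply, relative_error, and dropped_bits are hypothetical, and this scheme is only an illustrative stand-in; the source does not say which multiplier design is used.

```python
def truncated_multiply(a: int, b: int, dropped_bits: int = 4) -> int:
    """Approximate a * b for non-negative ints by zeroing the low bits of each operand.

    Assumption: operand truncation is used here only as an illustration;
    it is not taken from the paper.
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch only handles non-negative operands")
    # Mask that clears the lowest `dropped_bits` bits.
    mask = ~((1 << dropped_bits) - 1)
    return (a & mask) * (b & mask)


def relative_error(a: int, b: int, dropped_bits: int = 4) -> float:
    """Relative error of the truncated product with respect to the exact product."""
    exact = a * b
    if exact == 0:
        return 0.0
    approx = truncated_multiply(a, b, dropped_bits)
    return abs(exact - approx) / exact


if __name__ == "__main__":
    for bits in (0, 2, 4):
        approx = truncated_multiply(200, 100, bits)
        print(bits, approx, relative_error(200, 100, bits))
```

Dropping more bits shrinks the effective operand width, which in hardware would reduce the number of partial products, while the relative error grows; with dropped_bits set to 0 the result is exact.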
This paper compares the performance of various machine learning models for predicting the direction of a wall-following robot.
More specifically, we present an encoder-decoder network with a shared encoder and two separate decoders, each composed of multiple deconvolution (transposed convolution) layers, to jointly learn the edge maps and semantic labels of a room image.
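As a rough illustration of how such a two-branch layout keeps both outputs aligned with the input, the sketch below tracks only spatial sizes: a shared encoder downsamples the image, and two decoders each upsample the same feature map with transposed convolutions. The layer count, kernel sizes, strides, and padding are assumptions chosen for the example, not values from the source, and all function names are hypothetical. The size formulas are the standard ones for convolution and for transposed convolution without dilation or output padding.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output size of a strided convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Output size of a transposed convolution (no dilation, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def encoder_sizes(input_size: int, num_layers: int = 3) -> list[int]:
    """Spatial size after each (assumed) stride-2 encoder layer."""
    sizes, size = [], input_size
    for _ in range(num_layers):
        size = conv_out(size)
        sizes.append(size)
    return sizes


def decoder_sizes(feature_size: int, num_layers: int = 3) -> list[int]:
    """Spatial size after each (assumed) stride-2 deconvolution layer."""
    sizes, size = [], feature_size
    for _ in range(num_layers):
        size = deconv_out(size)
        sizes.append(size)
    return sizes


def two_branch_sizes(input_size: int, num_layers: int = 3) -> dict:
    """Both decoders start from the same shared encoder output."""
    shared = encoder_sizes(input_size, num_layers)[-1]
    return {
        "shared": shared,
        "edge": decoder_sizes(shared, num_layers),      # edge-map branch
        "semantic": decoder_sizes(shared, num_layers),  # semantic-label branch
    }


if __name__ == "__main__":
    print(two_branch_sizes(256))
```

With these assumed settings, each decoder layer doubles the spatial size, so three deconvolution layers undo three stride-2 encoder layers and both the edge map and the semantic label map come out at the input resolution.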