Specifically, we propose a new network structure called Compression Network with Transformer (CNT) to compress features into a low-dimensional space, and an inhomogeneous neighborhood relationship preserving (INRP) loss that aims to maintain high search accuracy.
Specifically, we exploit pseudo-LiDAR generated from depth estimation, and propose a feature fusion network in which RGB and learned depth features are fused for improved road detection.
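The abstract does not specify the fusion operator, so the following is only a minimal NumPy sketch of one common choice: channel-wise concatenation of the RGB and depth feature maps followed by a learned linear projection. The shapes, the weight matrix `w`, and the ReLU activation are all illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def fuse_rgb_depth(rgb_feat, depth_feat, w):
    """Concatenation-based fusion of RGB and depth feature maps.

    rgb_feat:   (H, W, C_rgb) feature map from the RGB branch
    depth_feat: (H, W, C_depth) feature map from the learned-depth branch
    w:          hypothetical learned projection, shape (C_rgb + C_depth, C_out)
    """
    # Stack both modalities along the channel axis.
    fused = np.concatenate([rgb_feat, depth_feat], axis=-1)
    # Project the fused channels and apply a ReLU nonlinearity.
    return np.maximum(fused @ w, 0.0)

# Toy usage with random features.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 8, 16))
depth = rng.standard_normal((8, 8, 8))
w = rng.standard_normal((24, 32))
out = fuse_rgb_depth(rgb, depth, w)  # shape (8, 8, 32)
```

A 1x1-convolution-style projection like this is the simplest "late fusion" baseline; real road-detection networks typically fuse at multiple scales.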
Correspondingly, different models need to be designed for different datasets, which further increases the workload of architecture design; 2) the mainstream framework is a patch-to-pixel framework.
For the outer search space, we propose a cell-sharing strategy that saves memory and considerably accelerates the search.
In contrast to previous approaches, we do not impose restrictions on the source data sets: they do not have to be collected by the same sensors as the target data sets.
However, existing CNN-based models operate at the patch level, classifying each pixel separately using a patch of the image around it.
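To make the patch-level setup concrete, here is a minimal NumPy sketch of the preprocessing it implies: every pixel is turned into one training sample by cropping the surrounding patch (with padding at the borders). The patch size and the reflect padding are illustrative choices, not taken from any particular paper.

```python
import numpy as np

def extract_patches(image, patch_size=5):
    """Return one (patch_size x patch_size) patch per pixel of `image`.

    image: (H, W, C) array; output: (H*W, patch_size, patch_size, C),
    where row k is the patch centered on pixel (k // W, k % W).
    """
    pad = patch_size // 2
    # Reflect-pad so border pixels also get a full-sized patch.
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    H, W, C = image.shape
    patches = np.empty((H * W, patch_size, patch_size, C), dtype=image.dtype)
    k = 0
    for i in range(H):
        for j in range(W):
            patches[k] = padded[i:i + patch_size, j:j + patch_size]
            k += 1
    return patches
```

Each patch is then fed to the classifier independently, which is exactly why patch-level inference is slow: an H x W image requires H*W forward passes instead of one.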
Hyperspectral image (HSI) classification has improved with convolutional neural networks (CNNs) in recent years.
However, existing methods usually generalize poorly: almost all of them perform well at removing a specific type of rain streak but relatively poorly on other types.
This paper tackles the problem of video object segmentation.
The temporal consistency loss is combined with the spatial loss to update the model in an end-to-end fashion.
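The abstract states only that the two losses are combined; a minimal sketch of such a combined objective, with squared error standing in for the unspecified spatial term and a hypothetical weight `lam` balancing the two, might look like:

```python
import numpy as np

def spatial_loss(pred, target):
    # Per-frame loss against ground truth; squared error is a stand-in
    # for the paper's (unspecified) spatial segmentation loss.
    return float(np.mean((pred - target) ** 2))

def temporal_consistency_loss(pred_t, pred_prev):
    # Penalize disagreement between predictions on consecutive frames.
    return float(np.mean((pred_t - pred_prev) ** 2))

def total_loss(pred_t, pred_prev, target, lam=0.5):
    # Combined objective: spatial term plus weighted temporal term,
    # minimized end-to-end over the model parameters.
    return spatial_loss(pred_t, target) + lam * temporal_consistency_loss(pred_t, pred_prev)
```

With a differentiable framework, gradients of `total_loss` flow through both terms simultaneously, which is what "end-to-end" refers to here.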
Different from RGB videos, the depth data in RGB-D videos provide key complementary information to the tristimulus visual data, which can potentially improve action recognition accuracy.