Floors are Flat: Leveraging Semantics for Real-Time Surface Normal Prediction

16 Jun 2019 · Steven Hickson, Karthik Raveendran, Alireza Fathi, Kevin Murphy, Irfan Essa

We propose 4 insights that help to significantly improve the performance of deep learning models that predict surface normals and semantic labels from a single RGB image. These insights are: (1) denoise the "ground truth" surface normals in the training set to ensure consistency with the semantic labels; (2) concurrently train on a mix of real and synthetic data, instead of pretraining on synthetic and finetuning on real; (3) jointly predict normals and semantics using a shared model, but only backpropagate errors on pixels that have valid training labels; (4) slim down the model and use grayscale instead of color inputs. Despite the simplicity of these steps, we demonstrate consistently improved results on several datasets, using a model that runs at 12 fps on a standard mobile phone.
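
Insight (3) says errors are backpropagated only on pixels that have valid training labels. The abstract does not state the loss function, so the following is only a minimal sketch under assumptions: it uses a cosine-distance loss on normals, and the names `masked_normal_loss`, `pred`, `target`, and `valid` are hypothetical. It uses plain Python lists for clarity; a real training pipeline would express the same masking with a tensor library.

```python
import math


def masked_normal_loss(pred, target, valid):
    """Mean (1 - cosine similarity) over pixels whose mask entry is True.

    pred, target: sequences of (x, y, z) normals, one per pixel.
    valid: sequence of bools; False marks a pixel without a usable label.
    The cosine loss is an assumption, not taken from the paper.
    """
    total, count = 0.0, 0
    for p, t, ok in zip(pred, target, valid):
        if not ok:
            # Unlabeled pixel: it adds nothing to the loss, so it would
            # contribute no gradient during training.
            continue
        dot = sum(a * b for a, b in zip(p, t))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in t))
        total += 1.0 - dot / max(norm, 1e-12)  # guard against zero-length vectors
        count += 1
    # Average over valid pixels only; an image with no labels yields zero loss.
    return total / count if count else 0.0


# Three pixels: a perfect match, an orthogonal prediction, and an unlabeled pixel.
pred = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
target = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)]
valid = [True, True, False]
loss = masked_normal_loss(pred, target, valid)
```

Dividing by the number of valid pixels rather than the total pixel count keeps the loss scale comparable between densely and sparsely labeled images, which matters when real and synthetic data are mixed in one batch.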

Results from the Paper


Ranked #1 on Semantic Segmentation on ScanNetV2 (Pixel Accuracy metric)
Task                        Dataset       Model            Metric Name       Metric Value  Global Rank
Surface Normals Estimation  NYU Depth v2  Floors are Flat  % < 11.25         59.5          #4
                                                           % < 22.5          72.2          #4
                                                           % < 30            77.3          #4
                                                           Mean Angle Error  19.7          #4
                                                           RMSE              19.3          #1
Semantic Segmentation       ScanNetV2     Floors are Flat  Pixel Accuracy    65.6          #1
Surface Normals Estimation  ScanNetV2     Floors are Flat  % < 11.25         50.9          #2
                                                           % < 22.5          65.2          #2
                                                           % < 30            70            #2
                                                           Mean Angle Error  28            #2
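
The surface normal metrics in this table are conventionally derived from the per-pixel angle, in degrees, between predicted and ground-truth normals. The sketch below shows that common definition. The page does not describe the benchmark's exact evaluation protocol (for example, which pixels are included), so treat it as an assumption; the function names `angular_errors_deg` and `summarize` are hypothetical.

```python
import math


def angular_errors_deg(pred, target):
    """Angle in degrees between each predicted and ground-truth normal."""
    errors = []
    for p, t in zip(pred, target):
        dot = sum(a * b for a, b in zip(p, t))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in t))
        # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
        cos = max(-1.0, min(1.0, dot / norm))
        errors.append(math.degrees(math.acos(cos)))
    return errors


def summarize(errors, thresholds=(11.25, 22.5, 30.0)):
    """Mean error, root mean squared error, and percent of pixels under each threshold."""
    n = len(errors)
    stats = {
        "mean": sum(errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }
    for th in thresholds:
        stats[f"% < {th}"] = 100.0 * sum(e < th for e in errors) / n
    return stats


# Two pixels: one exact prediction and one that is off by a right angle.
errors = angular_errors_deg(
    [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)],
)
stats = summarize(errors)
```

For the threshold rows a higher value is better, while for Mean Angle Error and RMSE a lower value is better.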
