Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation

21 Apr 2023 · Harsh Maheshwari, Yen-Cheng Liu, Zsolt Kira

Using multiple spatial modalities has been shown to improve semantic segmentation performance. However, several real-world challenges have yet to be addressed: (a) improving label efficiency and (b) enhancing robustness in realistic scenarios where modalities are missing at test time. To address these challenges, we first propose Linear Fusion, a simple yet efficient multi-modal fusion mechanism that performs better than state-of-the-art multi-modal models even with limited supervision. Second, we propose M3L: Multi-modal Teacher for Masked Modality Learning, a semi-supervised framework that not only improves multi-modal performance but also uses unlabeled data to make the model robust to the realistic missing-modality scenario. We create the first benchmark for semi-supervised multi-modal semantic segmentation and also report robustness to missing modalities. Our proposal shows an absolute improvement of up to 10% in robust mIoU over the most competitive baselines. Our code is available at https://github.com/harshm121/M3L

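| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Robust Semi-Supervised RGBD Semantic Segmentation | 2D-3D-S | M3L (Linear Fusion B2) | MM-Robust mIoU (0.1% labels) | 41.36 | #1 |
| Robust Semi-Supervised RGBD Semantic Segmentation | 2D-3D-S | M3L (Linear Fusion B2) | MM-Robust mIoU (0.2% labels) | 45.46 | #1 |
| Robust Semi-Supervised RGBD Semantic Segmentation | 2D-3D-S | M3L (Linear Fusion B2) | MM-Robust mIoU (1% labels) | 50.52 | #1 |
| Semi-Supervised Semantic Segmentation | 2D-3D-S | M3L (Linear Fusion B2) | mIoU (0.1% labels) | 40.05 | #1 |
| Semi-Supervised Semantic Segmentation | 2D-3D-S | M3L (Linear Fusion B2) | mIoU (0.2% labels) | 44.62 | #1 |
| Semi-Supervised Semantic Segmentation | 2D-3D-S | M3L (Linear Fusion B2) | mIoU (1% labels) | 49.28 | #1 |
| Robust Semi-Supervised RGBD Semantic Segmentation | 2D-3D-S | Mean Teacher (Linear Fusion B2) | MM-Robust mIoU (0.1% labels) | 32.35 | #2 |
| Robust Semi-Supervised RGBD Semantic Segmentation | 2D-3D-S | Mean Teacher (Linear Fusion B2) | MM-Robust mIoU (0.2% labels) | 35.29 | #2 |
| Robust Semi-Supervised RGBD Semantic Segmentation | 2D-3D-S | Mean Teacher (Linear Fusion B2) | MM-Robust mIoU (1% labels) | 36.8 | #2 |
| Semi-Supervised RGBD Semantic Segmentation | 2D-3D-S | M3L (Linear Fusion B2) | mIoU (0.1% labels) | 44.1 | #1 |
| Semi-Supervised RGBD Semantic Segmentation | 2D-3D-S | M3L (Linear Fusion B2) | mIoU (0.2% labels) | 49.05 | #1 |
| Semi-Supervised RGBD Semantic Segmentation | 2D-3D-S | M3L (Linear Fusion B2) | mIoU (1% labels) | 55.48 | #1 |
| Semi-Supervised Semantic Segmentation | Stanford 2D-3D | M3L (Linear Fusion - Segformer B2) | mIoU (0.1% labels) | 44.1 | #1 |
| Semi-Supervised Semantic Segmentation | Stanford 2D-3D | M3L (Linear Fusion - Segformer B2) | MM-Robust mIoU (0.1% labels) | 41.36 | #1 |
| RGBD Semantic Segmentation | Stanford 2D-3D | Linear Fusion (Segformer B2) | mIOU | 0.5716 | #1 |
| Semi-Supervised Semantic Segmentation | Stanford 2D-3D | Mean Teacher (Linear Fusion - Segformer B2) | mIoU (0.1% labels) | 41.7 | #2 |
| Semantic Segmentation | Stanford2D3D - RGBD | Linear Fusion (Segformer B2) | mIoU | 57.16 | #4 |
| Robust Semi-Supervised RGBD Semantic Segmentation | SUN RGB-D | Mean Teacher (Linear Fusion B2) | MM-Robust mIoU (6.25% labels) | 26.18 | #2 |
| Robust Semi-Supervised RGBD Semantic Segmentation | SUN RGB-D | Mean Teacher (Linear Fusion B2) | MM-Robust mIoU (12.5% labels) | 30.96 | #2 |
| Robust Semi-Supervised RGBD Semantic Segmentation | SUN RGB-D | Mean Teacher (Linear Fusion B2) | MM-Robust mIoU (25% labels) | 33.98 | #2 |
| Robust Semi-Supervised RGBD Semantic Segmentation | SUN RGB-D | M3L (Linear Fusion B2) | MM-Robust mIoU (6.25% labels) | 28.68 | #1 |
| Robust Semi-Supervised RGBD Semantic Segmentation | SUN RGB-D | M3L (Linear Fusion B2) | MM-Robust mIoU (12.5% labels) | 36.7 | #1 |
| Robust Semi-Supervised RGBD Semantic Segmentation | SUN RGB-D | M3L (Linear Fusion B2) | MM-Robust mIoU (25% labels) | 39.37 | #1 |
| Semantic Segmentation | SUN-RGBD | LF (Segformer B2) | Mean IoU (test) | 48.17 | #1 |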
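
The mIoU figures reported above are class-averaged intersection-over-union scores. As a rough illustration only, the standard per-class computation can be sketched as below. The function name `mean_iou` and the choice to skip classes that appear in neither the prediction nor the ground truth are assumptions made for this sketch and are not taken from the authors' code; benchmark implementations typically accumulate the counts over the whole evaluation set rather than per image, and the MM-Robust variant is not reproduced here.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in pred or target (illustrative sketch).

    pred, target: flat sequences of integer class ids with equal length.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    inter = [0] * num_classes   # pixels where pred == target == c
    pred_n = [0] * num_classes  # pixels predicted as class c
    tgt_n = [0] * num_classes   # pixels labeled as class c

    for p, t in zip(pred, target):
        pred_n[p] += 1
        tgt_n[t] += 1
        if p == t:
            inter[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_n[c] + tgt_n[c] - inter[c]
        if union > 0:  # assumption: ignore classes absent from both inputs
            ious.append(inter[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example: 4 pixels, 2 classes.
# Class 0: intersection 1, union 2. Class 1: intersection 2, union 3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

The result is a fraction in [0, 1]; leaderboard entries such as those above usually report it multiplied by 100.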

Methods


M3L • Test