AANet: Adaptive Aggregation Network for Efficient Stereo Matching

CVPR 2020 · Haofei Xu, Juyong Zhang

Despite the remarkable progress made by learning-based stereo matching algorithms, one key challenge remains unsolved. Current state-of-the-art stereo models are mostly based on costly 3D convolutions, whose cubic computational complexity and high memory consumption make them expensive to deploy in real-world applications. In this paper, we aim to completely replace the commonly used 3D convolutions to achieve fast inference while maintaining comparable accuracy. To this end, we first propose a sparse-points-based intra-scale cost aggregation method to alleviate the well-known edge-fattening issue at disparity discontinuities. Further, we approximate the traditional cross-scale cost aggregation algorithm with neural network layers to handle large textureless regions. Both modules are simple, lightweight, and complementary, leading to an effective and efficient architecture for cost aggregation. With these two modules, we can not only significantly speed up existing top-performing models (e.g., 41× faster than GC-Net, 4× faster than PSMNet, and 38× faster than GA-Net), but also improve the performance of fast stereo models (e.g., StereoNet). We also achieve competitive results on the Scene Flow and KITTI datasets while running at 62 ms, demonstrating the versatility and high efficiency of the proposed method. Our full framework is available at https://github.com/haofeixu/aanet.
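As a rough illustration of sparse-points-based intra-scale aggregation, the sketch below builds a correlation cost volume and then aggregates each disparity slice at K sampled points per pixel, in the spirit of a modulated deformable convolution: C~(d, p) = sum_k w_k * m_k * C(d, p + p_k + dp_k), where p_k is a regular grid offset, dp_k a learned extra offset, and m_k a modulation weight. This is a simplified NumPy sketch under our own assumptions, not the AANet code: all function names are hypothetical, the weights w_k are plain scalars shared across disparities, and the offsets and modulation are passed in rather than predicted by a network. Cross-scale aggregation is not sketched.

```python
import numpy as np


def correlation_cost_volume(feat_left, feat_right, max_disp):
    """Build a (D, H, W) cost volume from (C, H, W) left/right feature maps.

    cost[d, y, x] is the channel-mean inner product between left pixel (y, x)
    and right pixel (y, x - d); columns with x < d have no match and stay 0.
    """
    _, h, w = feat_left.shape
    cost = np.zeros((max_disp, h, w), dtype=np.float64)
    for d in range(max_disp):
        if d == 0:
            cost[d] = (feat_left * feat_right).mean(axis=0)
        else:
            cost[d, :, d:] = (feat_left[:, :, d:] * feat_right[:, :, :-d]).mean(axis=0)
    return cost


def bilinear_sample(img, y, x):
    """Sample a 2D array at real-valued coordinates (y, x); taps outside count as 0."""
    h, w = img.shape
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    out = np.zeros(y.shape, dtype=np.float64)
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            wgt = (1.0 - np.abs(y - yy)) * (1.0 - np.abs(x - xx))
            valid = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
            vals = img[np.clip(yy, 0, h - 1), np.clip(xx, 0, w - 1)]
            out += np.where(valid, wgt * vals, 0.0)
    return out


def sparse_point_aggregation(cost, weights, offsets, modulation):
    """Simplified intra-scale aggregation at K = (2r+1)^2 sampling points.

    cost:       (D, H, W) cost volume
    weights:    (K,) scalar weights w_k, shared over disparities (simplification)
    offsets:    (K, 2, H, W) per-pixel extra offsets (dy, dx), normally learned
    modulation: (K, H, W) per-pixel modulation m_k, normally in [0, 1]
    """
    num_disp, h, w = cost.shape
    k = weights.shape[0]
    r = int(round(np.sqrt(k))) // 2
    assert (2 * r + 1) ** 2 == k, "K must be an odd square, e.g. 9"
    grid = [(i, j) for i in range(-r, r + 1) for j in range(-r, r + 1)]  # regular p_k
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    out = np.zeros_like(cost, dtype=np.float64)
    for idx, (py, px) in enumerate(grid):
        # Sampling location p + p_k + dp_k, shared by every disparity slice.
        sy = ys + py + offsets[idx, 0]
        sx = xs + px + offsets[idx, 1]
        for d in range(num_disp):
            out[d] += weights[idx] * modulation[idx] * bilinear_sample(cost[d], sy, sx)
    return out
```

With zero offsets, unit modulation, and all weight on the centre point, the output reproduces the input cost volume, which makes a quick sanity check. Letting the K points move away from their regular grid positions is the intuition behind adaptive sampling near disparity edges; only 2D sampling is involved, with no 3D convolution over the (D, H, W) volume.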


Results from the Paper

Task                         Dataset     Model   Metric            Value   Global Rank
Scene Flow Estimation        Scene Flow  AANet   EPE               0.068   #1
Stereo Disparity Estimation  Scene Flow  AANet+  EPE               0.72    #3
Stereo Disparity Estimation  Scene Flow  AANet+  One-pixel error   7.4     #4
Stereo Disparity Estimation  Scene Flow  AANet   EPE               0.87    #1
Stereo Disparity Estimation  Scene Flow  AANet   One-pixel error   9.3     #7
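The two disparity metrics in the table can be computed roughly as follows. This is a minimal sketch assuming EPE (end-point error) is the mean absolute disparity error in pixels and the one-pixel error is the percentage of valid pixels whose error exceeds 1 pixel. The function name and the validity mask (finite, positive ground truth, optionally below a hypothetical `max_disp` cutoff) are our assumptions; benchmarks define the mask differently.

```python
import numpy as np


def disparity_metrics(pred, gt, max_disp=None):
    """Return (EPE in pixels, one-pixel error in percent) over valid pixels.

    Assumed validity mask: finite, positive ground truth, optionally below
    max_disp. Adjust it to the evaluation protocol being reproduced.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = np.isfinite(gt) & (gt > 0)
    if max_disp is not None:
        valid &= gt < max_disp
    err = np.abs(pred[valid] - gt[valid])
    epe = float(err.mean())                   # end-point error: mean |d_pred - d_gt|
    one_px = float((err > 1.0).mean() * 100)  # share of pixels off by more than 1 px
    return epe, one_px


epe, one_px = disparity_metrics([[1.0, 2.0], [3.0, 4.0]], [[1.5, 2.0], [5.0, 4.0]])
print(epe, one_px)  # 0.625 25.0
```

Here the per-pixel errors are 0.5, 0, 2, and 0, so the EPE is 2.5 / 4 = 0.625 and one of four pixels (25%) is off by more than one pixel.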