FlowFormer: A Transformer Architecture for Optical Flow

We introduce the Optical Flow TransFormer, dubbed FlowFormer, a transformer-based neural network architecture for learning optical flow. FlowFormer tokenizes the 4D cost volume built from an image pair, encodes the cost tokens into a cost memory with alternate-group transformer (AGT) layers in a novel latent space, and decodes the cost memory via a recurrent transformer decoder with dynamic positional cost queries. On the Sintel benchmark, FlowFormer achieves 1.159 and 2.088 average end-point error (AEPE) on the clean and final passes, a 16.5% and 15.5% error reduction from the best published results (1.388 and 2.47). FlowFormer also generalizes strongly: without being trained on Sintel, it achieves 1.01 AEPE on the clean pass of the Sintel training set, outperforming the best published result (1.29) by 21.7%.

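The 4D cost volume the abstract refers to is, as in RAFT-style flow networks, the all-pairs feature correlation between the two frames: each source pixel gets a 2D map of its similarity to every target pixel, and FlowFormer tokenizes these maps for the transformer encoder. Below is a minimal NumPy sketch of that construction; the function name and the sqrt(D) scaling are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def build_cost_volume(feat1, feat2):
    """All-pairs 4D cost volume between two feature maps.

    feat1, feat2: (H, W, D) feature maps from a shared encoder.
    Returns: (H, W, H, W) volume, where [i, j, :, :] is the 2D cost map
    of source pixel (i, j) against every target pixel.
    """
    H, W, D = feat1.shape
    f1 = feat1.reshape(H * W, D)
    f2 = feat2.reshape(H * W, D)
    cost = (f1 @ f2.T) / np.sqrt(D)  # dot-product similarity, scaled
    return cost.reshape(H, W, H, W)

# Usage: 1/8-resolution features of a 384x512 image pair
feat1 = np.random.randn(48, 64, 256).astype(np.float32)
feat2 = np.random.randn(48, 64, 256).astype(np.float32)
cv = build_cost_volume(feat1, feat2)  # shape (48, 64, 48, 64)
```
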
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Optical Flow Estimation | KITTI 2015 (train) | FlowFormer | F1-all | 14.7 | #3 |
| Optical Flow Estimation | KITTI 2015 (train) | FlowFormer | EPE | 4.09 | #3 |
| Optical Flow Estimation | Sintel-clean | FlowFormer | Average End-Point Error | 1.16 | #3 |
| Optical Flow Estimation | Sintel-final | FlowFormer | Average End-Point Error | 2.09 | #1 |
| Optical Flow Estimation | Spring | FlowFormer | 1px total | 6.510 | #4 |
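
For reference on the metrics above: AEPE/EPE is the mean per-pixel Euclidean distance between predicted and ground-truth flow, and KITTI's F1-all is the percentage of pixels whose error exceeds both 3 px and 5% of the ground-truth flow magnitude. A minimal sketch of both follows; a full KITTI evaluation would also restrict to the valid-pixel mask, which this omits.

```python
import numpy as np

def average_epe(flow_pred, flow_gt):
    """Average end-point error over all pixels.

    flow_pred, flow_gt: (H, W, 2) arrays of (u, v) displacements.
    """
    epe = np.linalg.norm(flow_pred - flow_gt, axis=-1)  # per-pixel error
    return float(epe.mean())

def f1_all(flow_pred, flow_gt):
    """KITTI F1-all: percentage of outlier pixels, where a pixel is an
    outlier if its EPE is > 3 px AND > 5% of the ground-truth magnitude."""
    epe = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    mag = np.linalg.norm(flow_gt, axis=-1)
    outlier = (epe > 3.0) & (epe > 0.05 * mag)
    return float(100.0 * outlier.mean())
```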
