DocUNet: Document Image Unwarping via a Stacked U-Net

CVPR 2018  ·  Ke Ma, Zhixin Shu, Xue Bai, Jue Wang, Dimitris Samaras

Capturing document images is a common way to digitize and record physical documents, thanks to the ubiquity of mobile cameras. To make text recognition easier, it is often desirable to digitally flatten a document image when the physical document sheet is folded or curved. In this paper, we develop the first learning-based method to achieve this goal. We propose a stacked U-Net with intermediate supervision to directly predict the forward mapping from a distorted image to its rectified version. Because large-scale real-world data with ground-truth deformation is difficult to obtain, we create a synthetic dataset of approximately 100,000 images by warping non-distorted document images. The network is trained on this dataset with various data augmentations to improve its generalization ability. We further create a comprehensive benchmark that covers various real-world conditions. We evaluate the proposed model quantitatively and qualitatively on the proposed benchmark, and compare it with previous non-learning-based methods.
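
To make the idea of a forward mapping concrete, the following is a minimal sketch of how a predicted per-pixel map could be used to rectify an image. It is not the authors' implementation: the function name `apply_forward_map`, the nested-list image format, and the use of `None` for background pixels are all assumptions made for illustration, and the sketch uses a simple nearest-neighbor scatter rather than any particular interpolation scheme.

```python
def apply_forward_map(image, fmap, out_h, out_w, fill=0):
    """Scatter each source pixel to the rectified position given by fmap.

    image: H x W nested list of pixel values (the distorted image).
    fmap:  H x W nested list of (row, col) target coordinates, or None
           for source pixels that do not belong to the document
           (hypothetical convention, chosen for this sketch).
    """
    out = [[fill] * out_w for _ in range(out_h)]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            target = fmap[y][x]
            if target is None:
                continue  # background pixel: nothing to place
            ty, tx = round(target[0]), round(target[1])
            # Ignore targets that fall outside the output canvas.
            if 0 <= ty < out_h and 0 <= tx < out_w:
                out[ty][tx] = value
    return out


# Toy example: the document content sits one column to the right
# in the "distorted" image, and the map moves it back.
distorted = [[0, 1, 2],
             [0, 3, 4]]
forward = [[None, (0, 0), (0, 1)],
           [None, (1, 0), (1, 1)]]
rectified = apply_forward_map(distorted, forward, out_h=2, out_w=2)
```

Because a scatter of this kind can leave unfilled positions in the output when the map is not one-to-one, a practical pipeline would likely need some form of interpolation to fill the gaps.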


Results from the Paper

Ranked #3 on Local Distortion on DocUNet (using extra training data)

Task              Dataset  Model    Metric Name  Metric Value  Global Rank  Uses Extra Training Data
MS-SSIM           DocUNet  DocUNet  MS-SSIM      0.41          # 4
SSIM              DocUNet  DocUNet  SSIM         0.4083        # 4
Local Distortion  DocUNet  DocUNet  LD           14.08         # 3