no code implementations • 16 Mar 2023 • Alexander Groleau, Kok Wei Chee, Stefan Larson, Samay Maini, Jonathan Boarman
Document denoising and binarization are fundamental problems in the document processing space, but current datasets are often too small and lack sufficient complexity to effectively train and benchmark modern data-driven machine learning models.
2 code implementations • 30 Aug 2022 • Alexander Groleau, Kok Wei Chee, Stefan Larson, Samay Maini, Jonathan Boarman
This paper introduces Augraphy, a Python library for constructing data augmentation pipelines which produce distortions commonly seen in real-world document image datasets.
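To make the idea of an augmentation pipeline concrete, here is a minimal sketch in plain Python. It does not use Augraphy's own API: the names `Pipeline`, `fade`, and `add_speckle` are hypothetical, and the image is assumed to be a small grayscale grid stored as nested lists. It only illustrates the general pattern of chaining distortions that each take an image and return a degraded copy.

```python
import random
from typing import Callable, List, Optional, Sequence

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)
Augmentation = Callable[[Image, random.Random], Image]


def fade(amount: int) -> Augmentation:
    """Hypothetical distortion: lighten every pixel, loosely like faded ink."""
    def apply(image: Image, rng: random.Random) -> Image:
        return [[min(255, px + amount) for px in row] for row in image]
    return apply


def add_speckle(density: float) -> Augmentation:
    """Hypothetical distortion: turn a random fraction of pixels black."""
    def apply(image: Image, rng: random.Random) -> Image:
        return [[0 if rng.random() < density else px for px in row]
                for row in image]
    return apply


class Pipeline:
    """Apply a sequence of augmentations in order (illustrative only)."""

    def __init__(self, steps: Sequence[Augmentation],
                 seed: Optional[int] = None) -> None:
        self.steps = list(steps)
        self.rng = random.Random(seed)  # seeded for reproducible output

    def augment(self, image: Image) -> Image:
        # Each step returns a new image, so the input is never mutated.
        for step in self.steps:
            image = step(image, self.rng)
        return image


# A tiny 3x3 "document": a dark cross on a white background.
clean = [[255, 0, 255],
         [0, 0, 0],
         [255, 0, 255]]

pipeline = Pipeline([fade(40), add_speckle(0.1)], seed=0)
noisy = pipeline.augment(clean)
```

Because every step shares the same signature, distortions can be reordered or swapped without changing the pipeline class, which is the design property that a pipeline-based augmentation library relies on.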