As cameras become part of our daily routine, images of documents are becoming increasingly abundant.
Although confidence calibration has been an active research area for the last several decades, calibration for structured and sequence prediction has scarcely been explored.
We propose a framework for sequence-to-sequence contrastive learning (SeqCLR) of visual representations, which we apply to text recognition.
We present CREASE (Content Aware Rectification using Angle Supervision), the first learned method for document rectification that relies on the document's content, namely the locations of the words and in particular their orientation, as hints to assist the rectification process.
The first attention step re-weights visual features from a CNN backbone together with contextual features computed by a BiLSTM layer.
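As a rough illustration of what such a re-weighting step could look like, the sketch below concatenates per-frame visual and contextual feature vectors, scores each channel with a linear layer, and scales the channels by softmax-normalized attention weights. The function name `reweight_features`, the linear scoring layer, and the choice of a softmax over channels are assumptions made for illustration only; the sentence above does not specify these details, and this is not presented as the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reweight_features(visual, contextual, weights, bias):
    """Re-weight concatenated [visual; contextual] features, frame by frame.

    visual, contextual: lists of per-frame feature vectors (one per frame),
        standing in for CNN and BiLSTM outputs respectively.
    weights, bias: parameters of a hypothetical D x D linear scoring layer,
        where D = len(visual[0]) + len(contextual[0]).
    """
    out = []
    for v, h in zip(visual, contextual):
        d = v + h  # concatenate along the channel axis
        # One score per channel from the (assumed) linear layer.
        scores = [
            sum(w * x for w, x in zip(row, d)) + b
            for row, b in zip(weights, bias)
        ]
        attn = softmax(scores)  # attention weights over channels, summing to 1
        out.append([a * x for a, x in zip(attn, d)])
    return out
```

With all-zero weights and biases the attention is uniform, so a single frame with visual features `[1.0, 2.0]` and contextual features `[3.0, 4.0]` is simply scaled by 0.25 per channel. In a trained model the scoring layer would be learned, letting the network emphasize visual or contextual channels as needed.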
This is especially true for handwritten text recognition (HTR), where each author has a unique style, unlike printed text, where the variation is smaller by design.
We propose a computational model for shape, illumination and albedo inference in a pulsed time-of-flight (TOF) camera.