TrOCR is an end-to-end Transformer-based OCR model for text recognition, built on pre-trained CV and NLP models. It leverages the Transformer architecture for both image understanding and wordpiece-level text generation. The input text image is first resized to $384 \times 384$ and then split into a sequence of $16 \times 16$ patches, which serve as the input to the image Transformer. A standard Transformer architecture with the self-attention mechanism is used in both the encoder and the decoder, and the decoder generates wordpiece units as the recognized text for the input image.
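The resize-and-patchify step above can be sketched in a few lines. This is a minimal illustration, not TrOCR's actual implementation: it assumes the image is already resized to $384 \times 384$ and held as a NumPy array, and the function name `patchify` is our own.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened patches.

    Mirrors the TrOCR encoder input step: a 384x384 image with 16x16
    patches yields a sequence of (384 / 16) ** 2 = 576 patch vectors,
    each of length 16 * 16 * C.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    # Group pixels into a (rows, ps, cols, ps, C) grid of patches ...
    patches = image.reshape(h // patch_size, patch_size,
                            w // patch_size, patch_size, c)
    # ... reorder so each patch's pixels are contiguous ...
    patches = patches.transpose(0, 2, 1, 3, 4)
    # ... and flatten each patch into one vector.
    return patches.reshape(-1, patch_size * patch_size * c)

image = np.zeros((384, 384, 3), dtype=np.float32)  # resized input image
seq = patchify(image)
print(seq.shape)  # (576, 768)
```

Each of the 576 patch vectors would then be linearly projected and combined with position embeddings before entering the encoder's self-attention layers.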
Source: TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
| Task | Papers | Share |
|---|---|---|
| Optical Character Recognition (OCR) | 7 | 33.33% |
| Decoder | 2 | 9.52% |
| Language Modeling | 2 | 9.52% |
| Language Modelling | 2 | 9.52% |
| Handwritten Text Recognition | 2 | 9.52% |
| Image Generation | 1 | 4.76% |
| Image Captioning | 1 | 4.76% |
| Large Language Model | 1 | 4.76% |
| Adversarial Attack | 1 | 4.76% |