no code implementations • 16 Nov 2021 • Yue Tao, Zhiwei Jia, Runze Ma, Shugong Xu
We propose a 1-D split to address the challenges of complexity, and we replace the CNN with a transformer encoder to reduce the need for a context modeling module.
Inductive Bias • Scene Text Recognition
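One way to picture the 1-D split named in the abstract is as cutting a text-line image into full-height vertical strips, so that the token sequence runs along the width only. The sketch below illustrates that reading and is not the authors' implementation: the function name `split_1d`, the parameter `strip_width`, and the 32×128 image size are all assumptions made for the example. The resulting tokens would then be fed to a transformer encoder, which is not shown.

```python
import numpy as np


def split_1d(image, strip_width):
    """Cut an (H, W, C) image into full-height vertical strips and flatten each.

    Hypothetical helper: the name, signature, and strip size are illustrative
    assumptions, not taken from the paper.
    """
    h, w, c = image.shape
    if w % strip_width != 0:
        raise ValueError("image width must be divisible by strip_width")
    n = w // strip_width
    # (H, W, C) -> (H, n, strip_width, C) -> (n, H, strip_width, C)
    strips = image.reshape(h, n, strip_width, c).transpose(1, 0, 2, 3)
    # One flat token per strip: (n, H * strip_width * C)
    return strips.reshape(n, h * strip_width * c)


# Assumed example size: a 32x128 RGB text-line image, strips 4 pixels wide.
image = np.arange(32 * 128 * 3, dtype=np.float32).reshape(32, 128, 3)
tokens = split_1d(image, strip_width=4)
print(tokens.shape)  # (32, 384)
```

With these assumed sizes the sequence has 32 tokens. Splitting the same image into 4×4 square patches would give 8 × 32 = 256 tokens, which is one plausible sense in which a 1-D split lowers the cost of self-attention.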