3 code implementations • 4 Jun 2018 • Fenfen Sheng, Zhineng Chen, Bo Xu
Considering scene image has large variation in text and background, we further design a modality-transform block to effectively transform 2D input images to 1D sequences, combined with the encoder to extract more discriminative features.