17 Jul 2022 • Junpu Wang, Guili Xu, Fuju Yan, Jinjin Wang, Zhengsheng Wang
Then, patch aggregation blocks generate a multi-scale representation with four hierarchies, each followed by a series of DefT blocks. Each DefT block comprises a locally position-aware block for local position encoding, a lightweight multi-pooling self-attention that models multi-scale global contextual relationships with good computational efficiency, and a convolutional feed-forward network for feature transformation and further location information learning.
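To illustrate why a multi-pooling self-attention can be cheaper than full self-attention, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that keys and values are formed by average-pooling the feature map to a few small grids and concatenating the results, so each of the N = H·W queries attends to M pooled tokens instead of N. The pool sizes `(1, 2, 4)`, the single attention head, the absence of learned projections, and the function names `avg_pool_to` and `multi_pooling_attention` are all assumptions made for illustration.

```python
import numpy as np


def avg_pool_to(x, out_size):
    """Average-pool an (H, W, C) feature map down to (out_size, out_size, C)."""
    H, W, C = x.shape
    out = np.empty((out_size, out_size, C), dtype=x.dtype)
    for i in range(out_size):
        # Bin boundaries: floor for the start, ceiling for the end.
        r0 = (i * H) // out_size
        r1 = ((i + 1) * H + out_size - 1) // out_size
        for j in range(out_size):
            c0 = (j * W) // out_size
            c1 = ((j + 1) * W + out_size - 1) // out_size
            out[i, j] = x[r0:r1, c0:c1].mean(axis=(0, 1))
    return out


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_pooling_attention(x, pool_sizes=(1, 2, 4)):
    """Single-head attention whose keys/values are multi-scale pooled tokens.

    Assumption: learned query/key/value projections are omitted for brevity,
    and pool_sizes is a hypothetical choice, not taken from the paper.
    """
    H, W, C = x.shape
    q = x.reshape(H * W, C)  # one query per spatial position, shape (N, C)

    # Pool the map at several scales and flatten each grid into tokens.
    pooled = [avg_pool_to(x, s).reshape(s * s, C) for s in pool_sizes]
    kv = np.concatenate(pooled, axis=0)  # shape (M, C), M = sum of s*s

    scores = q @ kv.T / np.sqrt(C)       # (N, M) instead of (N, N)
    attn = softmax(scores, axis=-1)      # each row sums to 1
    out = attn @ kv                      # (N, C)
    return out.reshape(H, W, C), attn


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
out, attn = multi_pooling_attention(x)
print(out.shape, attn.shape)  # (8, 8, 16) (64, 21)
```

Under these assumptions the score matrix has N × M entries rather than N × N. For the 8 × 8 map above that is 64 × 21 instead of 64 × 64, and the gap widens as the spatial resolution grows because M depends only on the pool sizes.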