Computed tomography (CT) imaging is a promising approach to diagnosing COVID-19. Machine learning methods can be employed to train models on labeled CT images and predict whether a case is positive or negative. However, no large-scale, publicly available CT dataset exists for training accurate models. In this work, we propose a multi-stage attentive transfer learning framework for improving COVID-19 diagnosis. Our proposed framework consists of three stages that train accurate diagnosis models by learning knowledge from multiple source tasks and data of different domains. Importantly, we propose a novel self-supervised learning method to learn multi-scale representations for lung CT images. Our method captures semantic information from the whole lung and highlights the functionality of each lung region for better representation learning. The method is then integrated into the last stage of the proposed transfer learning framework to reuse the complex patterns learned from the same CT images. We use a base model integrating self-attention (ATTN) and convolutional operations. Experimental results show that networks with ATTNs achieve greater performance improvements through transfer learning than networks without ATTNs, indicating that attention exhibits higher transferability than convolution. Our results also show that the proposed self-supervised learning method outperforms several baseline methods.
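A base model combining convolution and self-attention could be sketched as follows. This is a minimal, hypothetical illustration of such a hybrid block, not the paper's exact architecture; the layer sizes, normalization choice, and residual structure are all assumptions.

```python
import torch
import torch.nn as nn

class ConvAttnBlock(nn.Module):
    """Illustrative block mixing convolution (local features) with
    self-attention (global context) over a 2D feature map."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.norm = nn.BatchNorm2d(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Local feature extraction via convolution
        h = torch.relu(self.norm(self.conv(x)))
        # Flatten spatial dimensions into a token sequence for attention
        b, c, hgt, wid = h.shape
        tokens = h.flatten(2).transpose(1, 2)            # (B, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)  # global self-attention
        # Residual connection, then reshape back to a feature map
        out = (tokens + attended).transpose(1, 2).reshape(b, c, hgt, wid)
        return out

if __name__ == "__main__":
    block = ConvAttnBlock(channels=16)
    x = torch.randn(2, 16, 32, 32)  # e.g., a batch of CT feature maps
    print(block(x).shape)  # torch.Size([2, 16, 32, 32])
```

In such a design, the convolution captures local texture within lung regions while the attention layer relates distant regions of the same scan, which is one plausible reading of why attention-equipped networks transfer better.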