no code implementations • 29 May 2019 • Xuelong Li, Aihong Yuan, Xiaoqiang Lu
To make full use of this information, this paper attempts to exploit text-guided attention and semantic-guided attention (SA) to find the most correlated spatial information and reduce the semantic gap between vision and language.
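As a rough illustration of guided spatial attention, the sketch below scores each image region against a guiding vector (a text or semantic embedding) with additive attention and returns the weighted sum of region features. The function name, shapes, and additive-attention form are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def guided_attention(spatial_feats, guide, W_f, W_g, w):
    # spatial_feats: (R, D) region features; guide: (G,) text/semantic vector.
    # Hypothetical additive-attention sketch: score each region against the guide.
    scores = np.tanh(spatial_feats @ W_f + guide @ W_g) @ w  # (R,) raw scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                 # softmax over regions
    return weights @ spatial_feats                           # (D,) attended feature

rng = np.random.default_rng(0)
R, D, G, H = 5, 8, 6, 4
feats = rng.standard_normal((R, D))
guide = rng.standard_normal(G)
ctx = guided_attention(feats, guide,
                       rng.standard_normal((D, H)),
                       rng.standard_normal((G, H)),
                       rng.standard_normal(H))
print(ctx.shape)  # (8,)
```

The attended feature can then be passed to a language model, which is the usual way such attention narrows the vision-language gap.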
no code implementations • 21 Apr 2019 • Aihong Yuan, Xuelong Li, Xiaoqiang Lu
In this paper, we propose a 3-gated model that fuses global and local image features for the task of image caption generation.
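A minimal sketch of gated feature fusion: a single learned sigmoid gate decides, per dimension, how much of the global versus local feature to keep (the paper's 3-gated design would use three such gates; this single-gate version and all names here are illustrative assumptions).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(global_feat, local_feat, W_g, W_l, b):
    # Hypothetical single gate: g near 1 keeps the global feature,
    # g near 0 keeps the local feature, elementwise.
    g = sigmoid(W_g @ global_feat + W_l @ local_feat + b)  # (D,) gate values
    return g * global_feat + (1.0 - g) * local_feat

rng = np.random.default_rng(1)
D = 8
fused = gated_fusion(rng.standard_normal(D), rng.standard_normal(D),
                     rng.standard_normal((D, D)), rng.standard_normal((D, D)),
                     rng.standard_normal(D))
print(fused.shape)  # (8,)
```

The fused vector would typically serve as the visual input to the caption decoder.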
no code implementations • 20 Apr 2019 • Xuelong Li, Aihong Yuan, Xiaoqiang Lu
In the testing step, when an image is fed into our multi-modal GRU model, a sentence describing the image content is generated.
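The test-time step can be sketched as greedy decoding with a GRU: the image feature initializes the recurrence, and at each step the argmax word is emitted and its embedding fed back in. The cell is a standard GRU; the decoding loop, shapes, and names are illustrative assumptions rather than the paper's exact architecture.

```python
import numpy as np

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    # Standard GRU cell update.
    z = 1.0 / (1.0 + np.exp(-(Wz @ x + Uz @ h)))  # update gate
    r = 1.0 / (1.0 + np.exp(-(Wr @ x + Ur @ h)))  # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))      # candidate state
    return (1.0 - z) * h + z * h_tilde

def greedy_caption(img_feat, embed, W_out, params, max_len=5):
    # Hypothetical greedy decoding: start from the image feature,
    # emit the argmax word each step, feed its embedding back in.
    h = np.zeros(params[1].shape[0])
    x = img_feat
    words = []
    for _ in range(max_len):
        h = gru_step(x, h, *params)
        word = int(np.argmax(W_out @ h))  # most likely word id
        words.append(word)
        x = embed[word]
    return words

rng = np.random.default_rng(2)
D, V = 6, 10                      # hidden size, vocabulary size
params = tuple(rng.standard_normal((D, D)) for _ in range(6))
caption = greedy_caption(rng.standard_normal(D), rng.standard_normal((V, D)),
                         rng.standard_normal((V, D)), params)
print(len(caption))  # 5
```

In practice decoding would stop at an end-of-sentence token and often use beam search rather than pure greedy selection.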