In our model, the external knowledge is represented as sentence-level facts and graph-level facts, to properly suit the composite nature of dialog history and images.
Semantic segmentation is a fundamental topic in computer vision; it aims to assign a semantic label to every pixel of an image.
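As a minimal illustration of the per-pixel labeling described above (a hypothetical sketch, not the method of any particular model): given per-class score maps, such as the output of a segmentation network's final layer, the predicted label map is simply the per-pixel argmax over classes.

```python
import numpy as np

def labels_from_scores(scores: np.ndarray) -> np.ndarray:
    """Assign a semantic label to every pixel.

    scores: array of shape (num_classes, H, W) holding class scores.
    Returns an (H, W) array of integer class labels.
    """
    # For each pixel, pick the class with the highest score.
    return scores.argmax(axis=0)

# Toy example: 3 classes on a 2x2 image.
scores = np.array([
    [[0.9, 0.1], [0.2, 0.3]],   # scores for class 0
    [[0.05, 0.8], [0.1, 0.6]],  # scores for class 1
    [[0.05, 0.1], [0.7, 0.1]],  # scores for class 2
])
labels = labels_from_scores(scores)
# labels is a 2x2 integer map: [[0, 1], [2, 1]]
```

Real systems produce these score maps with fully convolutional networks, but the final decoding step is exactly this per-pixel argmax.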
Recent advances in deep learning in both computer vision (CV) and natural language processing (NLP) provide a new way of understanding semantics, enabling more challenging tasks such as automatic description generation from natural images.
Under our learning policy, the Seq2Seq model can learn the mappings gradually in the presence of noise.
In this paper, we explore a generative model for the task of synthesizing unseen images with desired features.