MemCap: Memorizing Style Knowledge for Image Captioning

AAAI 2020  ·  Wentian Zhao, Xinxiao wu, Xiaoxun Zhang ·

Generating stylized captions for images is a challenging task since it requires not only describing the content of the image accurately but also expressing the desired linguistic style appropriately. In this paper, we propose MemCap, a novel stylized image captioning method that explicitly encodes the knowledge about linguistic styles with memory mechanism. Rather than relying heavily on a language model to capture style factors in existing methods, our method resorts to memorizing stylized elements learned from training corpus. Particularly, we design a memory module that comprises a set of embedding vectors for encoding style-related phrases in training corpus. To acquire the style-related phrases, we develop a sentence decomposing algorithm that splits a stylized sentence into a style-related part that reflects the linguistic style and a content-related part that contains the visual content. When generating captions, our MemCap first extracts content-relevant style knowledge from the memory module via an attention mechanism and then incorporates the extracted knowledge into a language model. Extensive experiments on two stylized image captioning datasets (SentiCap and FlickrStyle10K) demonstrate the effectiveness of our method.

PDF Abstract

Results from the Paper

Task Dataset Model Metric Name Metric Value Global Rank Benchmark
Image Captioning FlickrStyle10K MSCap BLEU-1 (Romantic) 50.8 # 1


No methods listed for this paper. Add relevant methods here