RefineCap: Concept-Aware Refinement for Image Captioning

8 Sep 2021  ·  Yekun Chai, Shuo Jin, Junliang Xing ·

Automatically translating images to text involves both image scene understanding and language modeling. In this paper, we propose a novel model, termed RefineCap, that refines the output vocabulary of the language decoder using decoder-guided visual semantics, and implicitly learns the mapping between visual tag words and images. The proposed Visual-Concept Refinement method allows the generator to attend to semantic details in the image, thereby producing more semantically descriptive captions. Our model achieves superior performance on the MS-COCO dataset compared with previous visual-concept based models.
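The paper does not include code, but the core idea of refining the decoder's output vocabulary with visual semantics can be sketched as a biasing of the next-word distribution toward detected visual tag words. The sketch below is an illustration only: the function names, the additive-bias form, and the `alpha` strength parameter are assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def refine_vocab(decoder_logits, concept_ids, concept_scores, alpha=1.0):
    """Refine the decoder's vocabulary distribution with visual concepts.

    decoder_logits : (vocab_size,) raw next-word scores from the language decoder
    concept_ids    : vocabulary indices of visual tag words detected in the image
    concept_scores : detector confidences for those tags, in [0, 1]
    alpha          : refinement strength (hypothetical hyperparameter)
    """
    refined = decoder_logits.copy()
    # Additively boost the logits of words grounded in the image,
    # so the generator attends to semantic details it has visual evidence for.
    refined[np.asarray(concept_ids)] += alpha * np.asarray(concept_scores)
    return softmax(refined)

# Toy example: a 6-word vocabulary where the detector found words 2 and 4
# (say, "dog" and "frisbee") with high confidence.
logits = np.array([0.1, 0.2, 0.5, 0.0, 0.3, 0.1])
probs = refine_vocab(logits, concept_ids=[2, 4], concept_scores=[0.9, 0.7])
```

After refinement, the probability mass shifts toward the detected concept words relative to the unrefined softmax, while the output remains a valid distribution.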


Results from the Paper


Task: Image Captioning · Dataset: COCO Captions · Model: RefineCap (w/ REINFORCE)

Metric    Value   Global Rank
BLEU-1    80.2    #8
BLEU-2    64.5    #3
BLEU-3    49.9    #3
BLEU-4    37.8    #27
METEOR    28.3    #22
ROUGE-L   58.0    #11
CIDEr     127.2   #26
SPICE     22.5    #23
