From Captions to Visual Concepts and Back

CVPR 2015
Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, Geoffrey Zweig

This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives...
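To make the multiple instance learning (MIL) step concrete, below is a minimal NumPy sketch of a noisy-OR MIL objective, one standard formulation for training a per-word detector when only image-level labels (a word occurring in a caption) are available, not region-level ones. This is an illustration under stated assumptions, not the authors' released code: the linear detector, feature shapes, and function names are all assumptions for the sketch.

```python
# Minimal sketch of multiple instance learning with noisy-OR aggregation:
# each image is a "bag" of candidate regions; the word label applies to
# the whole image, and the word is deemed present iff at least one region
# fires. All names and shapes here are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def word_probability(region_features, w, b):
    """Image-level probability that a given word is depicted.

    Under noisy-OR over region-level detections:
        p(word | image) = 1 - prod_j (1 - p(word | region_j))
    """
    p_region = sigmoid(region_features @ w + b)   # shape: (num_regions,)
    return 1.0 - np.prod(1.0 - p_region)

def mil_loss(region_features, label, w, b, eps=1e-7):
    """Binary cross-entropy on the bag-level noisy-OR probability.

    `label` is 1 if the word occurs in any caption of the image, else 0.
    """
    p = np.clip(word_probability(region_features, w, b), eps, 1.0 - eps)
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

# Toy usage: 10 candidate regions with 512-d features, one word detector.
rng = np.random.default_rng(0)
regions = rng.normal(size=(10, 512))
w, b = rng.normal(size=512) * 0.01, 0.0
print(mil_loss(regions, label=1, w=w, b=b))
```

The noisy-OR aggregation lets gradient signal flow to whichever regions best explain the image-level label, so region-level detectors emerge without any bounding-box supervision.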

