The Lexical Gap: An Improved Measure of Automated Image Description Quality

WS 2019  ·  Austin Kershaw, Miroslaw Bober ·

The challenge of automatically describing images and videos has stimulated much research in Computer Vision and Natural Language Processing. In order to test the semantic abilities of new algorithms, we need reliable and objective ways of measuring progress. We show that standard evaluation measures do not take into account the semantic richness of a description, and give the impression that sparse machine descriptions outperform rich human descriptions. We introduce and test a new measure of semantic ability based on relative lexical diversity. We show how our measure can work alongside existing measures to achieve state of the art correlation with human judgement of quality. We also introduce a new dataset: Rich-Sparse Descriptions, which provides 2K human and machine descriptions to stimulate interest into the semantic evaluation of machine descriptions.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here