Datasets > Modality > Texts > RefCoco


Introduced by Yu et al. in Modeling Context in Referring Expressions

This referring expression generation (REG) dataset was collected using the ReferitGame. In this two-player game, the first player is shown an image with a segmented target object and asked to write a natural language expression referring to the target object. The second player is shown only the image and the referring expression and asked to click on the corresponding object. If the players do their job correctly, they receive points and swap roles. If not, they are presented with a new object and image for description. Images in these collections were selected to contain two or more objects of the same object category. In the RefCOCO dataset, no restrictions are placed on the type of language used in the referring expressions. In a version of this dataset called RefCOCO+ players are disallowed from using location words in their referring expressions by adding “taboo” words to the ReferItGame. This dataset was collected to obtain a referring expression dataset focsed on purely appearance based description, e.g., “the man in the yellow polka-dotted shirt” rather than “the second man from the left”, which tend to be more interesting from a computer vision based perspective and are independent of viewer perspective. RefCOCO consists of 142,209 refer expressions for 50,000 objects in 19,994 images, and RefCOCO+ has 141,564 expressions for 49,856 objects in 19,992 images.