…In this game, the first player views an image with a segmented target object and writes a natural language expression referring to that object. These datasets are used in computer vision research for referring expression segmentation, referring expression comprehension, and visual grounding.
302 PAPERS • 19 BENCHMARKS