MUTATT: Visual-Textual Mutual Guidance for Referring Expression Comprehension

Referring expression comprehension (REC) aims to localize the image region described by a natural-language referring expression. Existing methods focus on building convincing visual and language representations independently, which may significantly isolate visual information from language information...
