General-purpose Visual Understanding Evaluation (G-VUE) is a comprehensive benchmark that covers the full spectrum of visual cognitive abilities through four functional domains: Perceive, Ground, Reason, and Act. The four domains are embodied in 11 carefully curated tasks, ranging from 3D reconstruction to visual reasoning and manipulation.
Source: Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation