Zero-Shot Region Description
1 paper with code • 4 benchmarks • 2 datasets
Most implemented papers
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
The integration of Large Language Models (LLMs) into visual-domain tasks, yielding visual-LLMs (V-LLMs), has enabled exceptional performance on vision-language tasks, particularly visual question answering (VQA).