Zero-Shot Region Description

1 paper with code • 4 benchmarks • 2 datasets

Zero-shot region description is the task of generating a natural-language description for a specified region of an image (e.g., a bounding box), without fine-tuning on region-level annotations from the target dataset. A minimal sketch of this setup follows.
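The sketch below illustrates the task under stated assumptions: it uses BLIP-2 via Hugging Face transformers as an illustrative stand-in vision-language model (the paper listed below uses its own V-LLM), supplies the region as a pixel bounding box, and crops it before captioning. The image path and box coordinates are placeholders.

```python
# Minimal sketch of zero-shot region description.
# Assumptions: BLIP-2 as a stand-in model (not the method of the paper below);
# "street.jpg" and the box coordinates are hypothetical placeholders.
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

def describe_region(image_path: str, box: tuple[int, int, int, int]) -> str:
    """Caption a region given as an (x0, y0, x1, y1) pixel bounding box."""
    processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
    model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

    # Crop the region of interest; no region-level fine-tuning is involved,
    # so the description is produced zero-shot by the pretrained model.
    region = Image.open(image_path).convert("RGB").crop(box)
    prompt = "Question: What is shown in this image? Answer:"
    inputs = processor(images=region, text=prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    return processor.batch_decode(out, skip_special_tokens=True)[0].strip()

print(describe_region("street.jpg", (120, 40, 360, 280)))
```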

Most implemented papers

Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs

kahnchana/locvlm CVPR 2024

Integration of Large Language Models (LLMs) into visual domain tasks, resulting in visual-LLMs (V-LLMs), has enabled exceptional performance in vision-language tasks, particularly for visual question answering (VQA).