3 papers with code • 0 benchmarks • 1 dataset
These leaderboards are used to track progress in Image Comprehension.
To this end, we propose extracting features corresponding to regional objects and using them as soft prompts for the LLM, a straightforward and scalable approach that eliminates the need to fine-tune the LLM.
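The region-as-soft-prompt idea can be illustrated with a minimal sketch. It makes several assumptions that do not come from the paper. It starts from a precomputed H x W x C visual feature grid. It average-pools the cells inside each box, which is a simplified stand-in for RoIAlign. A linear projection then maps the result into the LLM's embedding width, and the projected region vectors are prepended to the text token embeddings. Every function name here (`region_feature`, `project`, `build_inputs`) is hypothetical. Under this setup, only the projection weights would need training, and the LLM itself stays frozen.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]          # [D][C] projection weights (hypothetical)
Grid = List[List[List[float]]]      # [H][W][C] visual feature map (assumed precomputed)


def region_feature(grid: Grid, box: Tuple[int, int, int, int]) -> Vector:
    """Average-pool grid cells inside box = (x0, y0, x1, y1), end-exclusive.

    Simplified stand-in for RoIAlign: no bilinear sampling, integer boxes only.
    """
    x0, y0, x1, y1 = box
    cells = [grid[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    if not cells:
        raise ValueError("box covers no grid cells")
    channels = len(cells[0])
    return [sum(cell[k] for cell in cells) / len(cells) for k in range(channels)]


def project(vec: Vector, weight: Matrix) -> Vector:
    """Linear map from C visual channels to D LLM embedding dims (assumed learned)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def build_inputs(grid: Grid, boxes: List[Tuple[int, int, int, int]],
                 weight: Matrix, text_embeds: List[Vector]) -> List[Vector]:
    """Prepend one soft-prompt vector per region to the (frozen) LLM's text embeddings."""
    soft_prompts = [project(region_feature(grid, b), weight) for b in boxes]
    return soft_prompts + text_embeds


if __name__ == "__main__":
    # 2 x 2 grid with 2 channels per cell, indexed grid[y][x]
    grid = [[[1.0, 0.0], [3.0, 0.0]],
            [[5.0, 2.0], [7.0, 2.0]]]
    weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # maps C=2 -> D=3
    text_embeds = [[0.0, 0.0, 0.0]]                 # one placeholder text token
    print(build_inputs(grid, [(0, 0, 2, 2)], weight, text_embeds))
```

The resulting sequence would be fed to the LLM as input embeddings rather than token IDs. Keeping the LLM frozen and training only the small projection is what makes this kind of approach cheap to scale. A real system would use a learned vision encoder and proper RoIAlign sampling instead of the toy grid and pooling shown here.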
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
We propose InternLM-XComposer, a vision-language large model that enables advanced image-text comprehension and composition.