Image Comprehension

7 papers with code • 0 benchmarks • 1 dataset


Datasets


Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

dvlab-research/minigemini 27 Mar 2024

We try to narrow the gap by mining the potential of VLMs for better performance and an any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.

2,874 stars

EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain

wivizhang/earthgpt 30 Jan 2024

Multi-modal large language models (MLLMs) have demonstrated remarkable success in vision and visual-language tasks within the natural image domain.

17 stars

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

internlm/internlm-xcomposer 26 Sep 2023

We propose InternLM-XComposer, a vision-language large model that enables advanced image-text comprehension and composition.

1,636 stars

RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension

mightyzau/regionblip 3 Aug 2023

To this end, we propose to extract features corresponding to regional objects as soft prompts for the LLM, which provides a straightforward and scalable approach and eliminates the need for LLM fine-tuning.

53 stars
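The soft-prompt idea in the RegionBLIP excerpt can be illustrated with a minimal sketch. It assumes regional object features have already been extracted and that a single linear projection (the hypothetical `proj` below) maps them into the LLM's token-embedding space. The projected vectors are prepended to the text token embeddings, so only the projection would need training while the LLM stays frozen. This is a generic illustration of soft prompting, not RegionBLIP's actual architecture; the helper names, dimensions, and values are toy assumptions.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [
        [sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
        for row in a
    ]


def build_llm_input(region_feats: Matrix, text_embeds: Matrix, proj: Matrix) -> Matrix:
    """Project region features into the LLM embedding space and prepend them as soft prompts.

    region_feats: (num_regions, d_vis) features of regional objects (assumed precomputed)
    text_embeds:  (num_tokens, d_llm) embeddings of the text prompt
    proj:         (d_vis, d_llm) hypothetical trainable projection; the LLM itself stays frozen
    """
    soft_prompts = matmul(region_feats, proj)  # (num_regions, d_llm)
    return soft_prompts + text_embeds          # concatenate along the sequence axis


# Toy usage: 2 regions with d_vis = 2, projected to d_llm = 3, followed by 1 text token.
region_feats = [[1.0, 0.0], [0.0, 1.0]]
proj = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
text_embeds = [[0.1, 0.2, 0.3]]
out = build_llm_input(region_feats, text_embeds, proj)
print(out)  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [0.1, 0.2, 0.3]]
```

Because the region vectors enter the sequence as ordinary embeddings, the frozen LLM consumes them exactly like text tokens; this is what makes the approach avoid LLM fine-tuning.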

JourneyDB: A Benchmark for Generative Image Understanding

shihaozhaozsh/lavi-bridge NeurIPS 2023

On our dataset, we have devised four benchmarks to assess performance on generated-image comprehension, covering both content and style interpretation.

252 stars
03 Jul 2023

Hierarchical Open-vocabulary Universal Image Segmentation

berkeley-hipie/hipie NeurIPS 2023

Open-vocabulary image segmentation aims to partition an image into semantic regions according to arbitrary text descriptions.

233 stars
03 Jul 2023
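The task definition in the HIPIE excerpt can be made concrete with a common zero-shot baseline: each pixel is assigned the text description whose embedding is most similar under cosine similarity. The sketch assumes pixel and text embeddings already live in a shared space (for example, from a CLIP-style image/text encoder pair); the helper names and toy vectors are hypothetical, and this is not HIPIE's hierarchical method.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def segment(pixel_embeds: List[List[Vector]], text_embeds: Dict[str, Vector]) -> List[List[str]]:
    """Label every pixel with the arbitrary text description it matches best."""
    return [
        [max(text_embeds, key=lambda name: cosine(p, text_embeds[name])) for p in row]
        for row in pixel_embeds
    ]


# Toy usage: a 2x2 "image" of 2-d pixel embeddings and two free-form labels.
pixel_embeds = [
    [[1.0, 0.1], [0.9, 0.2]],
    [[0.1, 1.0], [0.0, 0.8]],
]
labels = {"a dog": [1.0, 0.0], "grass": [0.0, 1.0]}
print(segment(pixel_embeds, labels))  # [['a dog', 'a dog'], ['grass', 'grass']]
```

Since the label set is just a dictionary of text embeddings, new descriptions can be added at inference time without retraining, which is the sense in which the vocabulary is "open".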

ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter

dlyuangod/artgpt-4 12 May 2023

However, a grand challenge of exploiting LLMs for multimodal learning is the size of pre-trained LLMs, which always have billions of parameters.

20 stars