Computer Vision

Image Comprehension

7 papers with code • 0 benchmarks • 1 dataset

Benchmarks

These leaderboards are used to track progress in Image Comprehension

No evaluation results have been submitted yet.

Datasets

Visual7W

Latest papers

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

dvlab-research/minigemini • 27 Mar 2024

We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.

2,874

EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain

wivizhang/earthgpt • 30 Jan 2024

Multi-modal large language models (MLLMs) have demonstrated remarkable success in vision and visual-language tasks within the natural image domain.

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

internlm/internlm-xcomposer • 26 Sep 2023

We propose InternLM-XComposer, a vision-language large model that enables advanced image-text comprehension and composition.

1,636

RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension

mightyzau/regionblip • 3 Aug 2023

To this end, we propose to extract features corresponding to regional objects as soft prompts for LLM, which provides a straightforward and scalable approach and eliminates the need for LLM fine-tuning.

JourneyDB: A Benchmark for Generative Image Understanding

shihaozhaozsh/lavi-bridge • NeurIPS 2023

On our dataset, we have devised four benchmarks to assess the performance of generated image comprehension in relation to both content and style interpretation.

252

03 Jul 2023

Hierarchical Open-vocabulary Universal Image Segmentation

berkeley-hipie/hipie • NeurIPS 2023

Open-vocabulary image segmentation aims to partition an image into semantic regions according to arbitrary text descriptions.

233

03 Jul 2023

ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter

dlyuangod/artgpt-4 • 12 May 2023

However, a grand challenge of exploiting LLMs for multimodal learning is the size of pre-trained LLMs, which typically have billions of parameters.
