ViP-Bench (Making Large Multimodal Models Understand Arbitrary Visual Prompts)

Introduced by Cai et al. in ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

ViP-Bench is a benchmark for assessing how well multimodal models understand visual prompts at the region level. It evaluates six capabilities: recognition, OCR, knowledge, math, relationship reasoning, and language generation. ViP-Bench includes a diverse set of 303 images and questions. The benchmark is intended as a foundation for future research into multimodal models with arbitrary visual prompts.
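As an illustration of reporting results along these six capabilities, the sketch below averages scores per capability and overall. It is a minimal example under stated assumptions, not the benchmark's official scoring code: the record layout (a set of capability tags plus a score in [0, 1] for each question), the capability label strings, and the function names `per_capability_means` and `overall_mean` are all hypothetical.

```python
from collections import defaultdict

# Capability names taken from the description above; the exact label
# strings are an assumption made for this sketch.
CAPABILITIES = (
    "recognition",
    "ocr",
    "knowledge",
    "math",
    "relationship",
    "language generation",
)


def per_capability_means(records):
    """Average score per capability.

    `records` is a list of (capabilities, score) pairs, where
    `capabilities` is a collection of capability names and `score`
    is a float in [0, 1]. This layout is hypothetical.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for capabilities, score in records:
        for cap in capabilities:
            if cap not in CAPABILITIES:
                raise ValueError(f"unknown capability: {cap}")
            totals[cap] += score
            counts[cap] += 1
    # Only report capabilities that occur in at least one record.
    return {cap: totals[cap] / counts[cap] for cap in CAPABILITIES if counts[cap]}


def overall_mean(records):
    """Average score over all questions, or 0.0 for an empty list."""
    scores = [score for _, score in records]
    return sum(scores) / len(scores) if scores else 0.0
```

A question tagged with several capabilities contributes its score to each of them, so the per-capability averages are not expected to average out to the overall score.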

Benchmarks

Task	Dataset Variant	Best Model
Visual Question Answering	ViP-Bench	GPT-4V-turbo-detail:high

Tasks

Visual Question Answering

Similar Datasets

CHOCOLATE

CORE-MM

InfiMM-Eval

BenchLMM
