| TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
|------|---------|-------|-------------|--------------|-------------|
| Visual Question Answering (VQA) | AutoHallusion | LLaVA-1.5 | Overall Accuracy | 44.5 | #4 |
| Visual Question Answering | BenchLMM | LLaVA-1.5-13B | GPT-3.5 score | 55.53 | #3 |
| Factual Inconsistency Detection in Chart Captioning | CHOCOLATE-FT | LLaVA-1.5-13B | Kendall's Tau-c | 0.214 | #4 |
| Factual Inconsistency Detection in Chart Captioning | CHOCOLATE-LLM | LLaVA-1.5-13B | Kendall's Tau-c | 0.057 | #5 |
| Factual Inconsistency Detection in Chart Captioning | CHOCOLATE-LVLM | LLaVA-1.5-13B | Kendall's Tau-c | 0.002 | #4 |
| Image Classification | ColonINST-v1 (Seen) | LLaVA-v1.5 (w/ LoRA, w/o extra data) | Accuracy | 92.97 | #9 |
| Referring Expression Comprehension | ColonINST-v1 (Seen) | LLaVA-v1.5 (w/ LoRA, w/ extra data) | Intersection over Union | 61.97 | #3 |
| Referring Expression Comprehension | ColonINST-v1 (Seen) | LLaVA-v1.5 (w/ LoRA, w/o extra data) | Intersection over Union | 55.72 | #5 |
| Referring Expression Generation | ColonINST-v1 (Seen) | LLaVA-v1.5 (w/ LoRA, w/ extra data) | Accuracy | 99.32 | #2 |
| Referring Expression Generation | ColonINST-v1 (Seen) | LLaVA-v1.5 (w/ LoRA, w/o extra data) | Accuracy | 98.58 | #6 |
| Image Classification | ColonINST-v1 (Seen) | LLaVA-v1.5 (w/ LoRA, w/ extra data) | Accuracy | 93.33 | #6 |
| Referring Expression Generation | ColonINST-v1 (Unseen) | LLaVA-v1.5 (w/ LoRA, w/ extra data) | Accuracy | 72.88 | #9 |
| Image Classification | ColonINST-v1 (Unseen) | LLaVA-v1.5 (w/ LoRA, w/o extra data) | Accuracy | 79.10 | #6 |
| Referring Expression Comprehension | ColonINST-v1 (Unseen) | LLaVA-v1.5 (w/ LoRA, w/ extra data) | Intersection over Union | 42.31 | #2 |
| Referring Expression Comprehension | ColonINST-v1 (Unseen) | LLaVA-v1.5 (w/ LoRA, w/o extra data) | Intersection over Union | 34.32 | #6 |
| Image Classification | ColonINST-v1 (Unseen) | LLaVA-v1.5 (w/ LoRA, w/ extra data) | Accuracy | 80.89 | #2 |
| Referring Expression Generation | ColonINST-v1 (Unseen) | LLaVA-v1.5 (w/ LoRA, w/o extra data) | Accuracy | 70.38 | #11 |
| Visual Question Answering (VQA) | InfiMM-Eval | LLaVA-1.5 | Overall score | 32.62 | #5 |
| Visual Question Answering (VQA) | InfiMM-Eval | LLaVA-1.5 | Deductive | 30.94 | #5 |
| Visual Question Answering (VQA) | InfiMM-Eval | LLaVA-1.5 | Abductive | 47.91 | #3 |
| Visual Question Answering (VQA) | InfiMM-Eval | LLaVA-1.5 | Analogical | 24.31 | #4 |
| Visual Question Answering (VQA) | InfiMM-Eval | LLaVA-1.5 | Params | 13B | #1 |
| Visual Instruction Following | LLaVA-Bench | LLaVA-v1.5-13B | Avg score | 70.7 | #4 |
| Visual Instruction Following | LLaVA-Bench | LLaVA-v1.5-7B | Avg score | 63.4 | #5 |
| Visual Question Answering | MM-Vet | LLaVA-1.5-7B | GPT-4 score | 31.1±0.2 | #157 |
| Visual Question Answering | MM-Vet | LLaVA-1.5-7B | Params | 7B | #1 |
| Visual Question Answering | MM-Vet | LLaVA-1.5-13B | GPT-4 score | 36.3±0.2 | #112 |
| Visual Question Answering | MM-Vet | LLaVA-1.5-13B | Params | 13B | #1 |
| Visual Question Answering | MM-Vet v2 | LLaVA-v1.5-13B | GPT-4 score | 33.2±0.1 | #18 |
| Visual Question Answering | MM-Vet v2 | LLaVA-v1.5-13B | Params | 13B | #1 |
| Visual Question Answering | MM-Vet v2 | LLaVA-v1.5-7B | GPT-4 score | 28.3±0.2 | #19 |
| Visual Question Answering | MM-Vet v2 | LLaVA-v1.5-7B | Params | 7B | #1 |
| Visual Question Answering | ViP-Bench | LLaVA-1.5-13B (Visual Prompt) | GPT-4 score (bbox) | 41.8 | #6 |
| Visual Question Answering | ViP-Bench | LLaVA-1.5-13B (Visual Prompt) | GPT-4 score (human) | 42.9 | #4 |
| Visual Question Answering | ViP-Bench | LLaVA-1.5-13B (Coordinates) | GPT-4 score (bbox) | 47.1 | #4 |