| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Visual Question Answering (VQA) | InfiMM-Eval | CogVLM-Chat | Overall score | 37.16 | #4 |
| Visual Question Answering (VQA) | InfiMM-Eval | CogVLM-Chat | Deductive | 36.75 | #4 |
| Visual Question Answering (VQA) | InfiMM-Eval | CogVLM-Chat | Abductive | 47.88 | #4 |
| Visual Question Answering (VQA) | InfiMM-Eval | CogVLM-Chat | Analogical | 28.75 | #3 |
| Visual Question Answering (VQA) | InfiMM-Eval | CogVLM-Chat | Params | 17B | #1 |
| Long-Context Understanding | MMNeedle | CogVLM-17B | 1 Image, 2×2 Stitching, Exact Accuracy | 0 | #11 |
| Long-Context Understanding | MMNeedle | CogVLM-17B | 1 Image, 4×4 Stitching, Exact Accuracy | 0.1 | #11 |
| Long-Context Understanding | MMNeedle | CogVLM-17B | 1 Image, 8×8 Stitching, Exact Accuracy | 0.3 | #10 |
| Long-Context Understanding | MMNeedle | CogVLM-17B | 10 Images, 1×1 Stitching, Exact Accuracy | 0 | #7 |
| Long-Context Understanding | MMNeedle | CogVLM-17B | 10 Images, 2×2 Stitching, Exact Accuracy | 0 | #7 |
| Long-Context Understanding | MMNeedle | CogVLM-17B | 10 Images, 4×4 Stitching, Exact Accuracy | 0 | #6 |
| Long-Context Understanding | MMNeedle | CogVLM-17B | 10 Images, 8×8 Stitching, Exact Accuracy | 0 | #3 |
| Long-Context Understanding | MMNeedle | CogVLM2-Llama-3 | 1 Image, 2×2 Stitching, Exact Accuracy | 7.3 | #8 |
| Long-Context Understanding | MMNeedle | CogVLM2-Llama-3 | 1 Image, 4×4 Stitching, Exact Accuracy | 0.9 | #9 |
| Long-Context Understanding | MMNeedle | CogVLM2-Llama-3 | 1 Image, 8×8 Stitching, Exact Accuracy | 0.1 | #11 |
| Long-Context Understanding | MMNeedle | CogVLM2-Llama-3 | 10 Images, 1×1 Stitching, Exact Accuracy | 0 | #7 |
| Long-Context Understanding | MMNeedle | CogVLM2-Llama-3 | 10 Images, 2×2 Stitching, Exact Accuracy | 0 | #7 |
| Long-Context Understanding | MMNeedle | CogVLM2-Llama-3 | 10 Images, 4×4 Stitching, Exact Accuracy | 0 | #6 |
| Long-Context Understanding | MMNeedle | CogVLM2-Llama-3 | 10 Images, 8×8 Stitching, Exact Accuracy | 0 | #3 |
| Visual Question Answering | MM-Vet | CogVLM (Vicuna-7B) | GPT-4 score | 52.8 | #51 |
| Visual Question Answering | MM-Vet | CogVLM (Vicuna-7B) | Params | 17B | #1 |
| Visual Question Answering | MM-Vet | GLM4 Vision | GPT-4 score | 63.9 | #25 |
| Visual Question Answering | MM-Vet v2 | CogVLM-Chat | GPT-4 score | 45.1±0.2 | #14 |
| FS-MEVQA | SME | GLM-4V | BLEU-4 | 14.45 | #5 |
| FS-MEVQA | SME | GLM-4V | METEOR | 17.53 | #6 |
| FS-MEVQA | SME | GLM-4V | ROUGE-L | 24.28 | #6 |
| FS-MEVQA | SME | GLM-4V | CIDEr | 127.37 | #5 |
| FS-MEVQA | SME | GLM-4V | SPICE | 17.70 | #5 |
| FS-MEVQA | SME | GLM-4V | Detection | 0.89 | #5 |
| FS-MEVQA | SME | GLM-4V | ACC | 34.23 | #5 |
| FS-MEVQA | SME | GLM-4V | #Learning Samples (N) | 16 | #1 |