| TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
|---|---|---|---|---|---|
| Few-Shot Text Classification | RAFT | Human (crowdsourced) | Avg | 0.735 | #2 |
| Few-Shot Text Classification | RAFT | Human (crowdsourced) | ADE | 0.830 | #1 |
| Few-Shot Text Classification | RAFT | Human (crowdsourced) | B77 | 0.607 | #2 |
| Few-Shot Text Classification | RAFT | Human (crowdsourced) | NIS | 0.857 | #1 |
| Few-Shot Text Classification | RAFT | Human (crowdsourced) | OSE | 0.646 | #2 |
| Few-Shot Text Classification | RAFT | Human (crowdsourced) | Over | 0.917 | #3 |
| Few-Shot Text Classification | RAFT | Human (crowdsourced) | SOT | 0.908 | #2 |
| Few-Shot Text Classification | RAFT | Human (crowdsourced) | SRI | 0.468 | #7 |
| Few-Shot Text Classification | RAFT | Human (crowdsourced) | TAI | 0.609 | #4 |
| Few-Shot Text Classification | RAFT | Human (crowdsourced) | ToS | 0.627 | #2 |
| Few-Shot Text Classification | RAFT | Human (crowdsourced) | TEH | 0.722 | #1 |
| Few-Shot Text Classification | RAFT | Human (crowdsourced) | TC | 0.897 | #1 |
| Few-Shot Text Classification | RAFT | GPT-3 zero-shot | Avg | 0.292 | #9 |
| Few-Shot Text Classification | RAFT | GPT-3 zero-shot | ADE | 0.163 | #9 |
| Few-Shot Text Classification | RAFT | GPT-3 zero-shot | B77 | 0.000 | #8 |
| Few-Shot Text Classification | RAFT | GPT-3 zero-shot | NIS | 0.572 | #6 |
| Few-Shot Text Classification | RAFT | GPT-3 zero-shot | OSE | 0.323 | #7 |
| Few-Shot Text Classification | RAFT | GPT-3 zero-shot | Over | 0.378 | #8 |
| Few-Shot Text Classification | RAFT | GPT-3 zero-shot | SOT | 0.628 | #5 |
| Few-Shot Text Classification | RAFT | GPT-3 zero-shot | SRI | 0.027 | #8 |
| Few-Shot Text Classification | RAFT | GPT-3 zero-shot | TAI | 0.362 | #8 |
| Few-Shot Text Classification | RAFT | GPT-3 zero-shot | ToS | 0.164 | #8 |
| Few-Shot Text Classification | RAFT | GPT-3 zero-shot | TEH | 0.303 | #9 |
| Few-Shot Text Classification | RAFT | GPT-3 zero-shot | TC | 0.290 | #9 |
| Few-Shot Text Classification | RAFT | Plurality-class | Avg | 0.331 | #8 |
| Few-Shot Text Classification | RAFT | Plurality-class | ADE | 0.446 | #7 |
| Few-Shot Text Classification | RAFT | Plurality-class | B77 | 0.000 | #8 |
| Few-Shot Text Classification | RAFT | Plurality-class | NIS | 0.353 | #9 |
| Few-Shot Text Classification | RAFT | Plurality-class | OSE | 0.164 | #9 |
| Few-Shot Text Classification | RAFT | Plurality-class | Over | 0.337 | #9 |
| Few-Shot Text Classification | RAFT | Plurality-class | SOT | 0.271 | #9 |
| Few-Shot Text Classification | RAFT | Plurality-class | SRI | 0.493 | #4 |
| Few-Shot Text Classification | RAFT | Plurality-class | TAI | 0.344 | #9 |
| Few-Shot Text Classification | RAFT | Plurality-class | ToS | 0.471 | #7 |
| Few-Shot Text Classification | RAFT | Plurality-class | TEH | 0.366 | #7 |
| Few-Shot Text Classification | RAFT | Plurality-class | TC | 0.391 | #8 |
| Few-Shot Text Classification | RAFT | BART MNLI zero-shot | Avg | 0.382 | #7 |
| Few-Shot Text Classification | RAFT | BART MNLI zero-shot | ADE | 0.234 | #8 |
| Few-Shot Text Classification | RAFT | BART MNLI zero-shot | B77 | 0.332 | #3 |
| Few-Shot Text Classification | RAFT | BART MNLI zero-shot | NIS | 0.615 | #5 |
| Few-Shot Text Classification | RAFT | BART MNLI zero-shot | OSE | 0.360 | #5 |
| Few-Shot Text Classification | RAFT | BART MNLI zero-shot | Over | 0.462 | #7 |
| Few-Shot Text Classification | RAFT | BART MNLI zero-shot | SOT | 0.644 | #4 |
| Few-Shot Text Classification | RAFT | BART MNLI zero-shot | SRI | 0.026 | #9 |
| Few-Shot Text Classification | RAFT | BART MNLI zero-shot | TAI | 0.469 | #7 |
| Few-Shot Text Classification | RAFT | BART MNLI zero-shot | ToS | 0.122 | #9 |
| Few-Shot Text Classification | RAFT | BART MNLI zero-shot | TEH | 0.543 | #4 |
| Few-Shot Text Classification | RAFT | BART MNLI zero-shot | TC | 0.400 | #7 |
| Few-Shot Text Classification | RAFT | GPT-2 | Avg | 0.458 | #6 |
| Few-Shot Text Classification | RAFT | GPT-2 | ADE | 0.600 | #4 |
| Few-Shot Text Classification | RAFT | GPT-2 | B77 | 0.121 | #6 |
| Few-Shot Text Classification | RAFT | GPT-2 | NIS | 0.561 | #7 |
| Few-Shot Text Classification | RAFT | GPT-2 | OSE | 0.245 | #8 |
| Few-Shot Text Classification | RAFT | GPT-2 | Over | 0.498 | #6 |
| Few-Shot Text Classification | RAFT | GPT-2 | SOT | 0.380 | #8 |
| Few-Shot Text Classification | RAFT | GPT-2 | SRI | 0.492 | #6 |
| Few-Shot Text Classification | RAFT | GPT-2 | TAI | 0.612 | #3 |
| Few-Shot Text Classification | RAFT | GPT-2 | ToS | 0.498 | #6 |
| Few-Shot Text Classification | RAFT | GPT-2 | TEH | 0.311 | #8 |
| Few-Shot Text Classification | RAFT | GPT-2 | TC | 0.723 | #4 |
| Few-Shot Text Classification | RAFT | GPT-Neo | Avg | 0.481 | #5 |
| Few-Shot Text Classification | RAFT | GPT-Neo | ADE | 0.452 | #6 |
| Few-Shot Text Classification | RAFT | GPT-Neo | B77 | 0.149 | #5 |
| Few-Shot Text Classification | RAFT | GPT-Neo | NIS | 0.408 | #8 |
| Few-Shot Text Classification | RAFT | GPT-Neo | OSE | 0.343 | #6 |
| Few-Shot Text Classification | RAFT | GPT-Neo | Over | 0.681 | #5 |
| Few-Shot Text Classification | RAFT | GPT-Neo | SOT | 0.406 | #7 |
| Few-Shot Text Classification | RAFT | GPT-Neo | SRI | 0.493 | #4 |
| Few-Shot Text Classification | RAFT | GPT-Neo | TAI | 0.605 | #5 |
| Few-Shot Text Classification | RAFT | GPT-Neo | ToS | 0.565 | #4 |
| Few-Shot Text Classification | RAFT | GPT-Neo | TEH | 0.554 | #3 |
| Few-Shot Text Classification | RAFT | GPT-Neo | TC | 0.636 | #5 |
| Few-Shot Text Classification | RAFT | AdaBoost | Avg | 0.514 | #4 |
| Few-Shot Text Classification | RAFT | AdaBoost | ADE | 0.543 | #5 |
| Few-Shot Text Classification | RAFT | AdaBoost | B77 | 0.023 | #7 |
| Few-Shot Text Classification | RAFT | AdaBoost | NIS | 0.626 | #4 |
| Few-Shot Text Classification | RAFT | AdaBoost | OSE | 0.475 | #3 |
| Few-Shot Text Classification | RAFT | AdaBoost | Over | 0.838 | #4 |
| Few-Shot Text Classification | RAFT | AdaBoost | SOT | 0.455 | #6 |
| Few-Shot Text Classification | RAFT | AdaBoost | SRI | 0.506 | #3 |
| Few-Shot Text Classification | RAFT | AdaBoost | TAI | 0.556 | #6 |
| Few-Shot Text Classification | RAFT | AdaBoost | ToS | 0.560 | #5 |
| Few-Shot Text Classification | RAFT | AdaBoost | TEH | 0.443 | #6 |
| Few-Shot Text Classification | RAFT | AdaBoost | TC | 0.625 | #6 |
| Few-Shot Text Classification | RAFT | GPT-3 | Avg | 0.627 | #3 |
| Few-Shot Text Classification | RAFT | GPT-3 | ADE | 0.686 | #3 |
| Few-Shot Text Classification | RAFT | GPT-3 | B77 | 0.299 | #4 |
| Few-Shot Text Classification | RAFT | GPT-3 | NIS | 0.679 | #3 |
| Few-Shot Text Classification | RAFT | GPT-3 | OSE | 0.431 | #4 |
| Few-Shot Text Classification | RAFT | GPT-3 | Over | 0.937 | #2 |
| Few-Shot Text Classification | RAFT | GPT-3 | SOT | 0.769 | #3 |
| Few-Shot Text Classification | RAFT | GPT-3 | SRI | 0.516 | #1 |
| Few-Shot Text Classification | RAFT | GPT-3 | TAI | 0.656 | #2 |
| Few-Shot Text Classification | RAFT | GPT-3 | ToS | 0.574 | #3 |
| Few-Shot Text Classification | RAFT | GPT-3 | TEH | 0.526 | #5 |
| Few-Shot Text Classification | RAFT | GPT-3 | TC | 0.821 | #3 |