| TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
|---|---|---|---|---|---|
| Multi-Choice MRC | ExpMRC - C3 (test) | MacBERT-large + MSS w/ Ques. (single model) | Answer F1 | 72.000 | # 3 |
| Multi-Choice MRC | ExpMRC - C3 (test) | MacBERT-large + MSS w/ Ques. (single model) | Evidence F1 | 58.400 | # 3 |
| Multi-Choice MRC | ExpMRC - C3 (test) | MacBERT-large + MSS w/ Ques. (single model) | Overall F1 | 46.000 | # 3 |
| Multi-Choice MRC | ExpMRC - C3 (test) | MacBERT-base + MSS w/ Ques. (single model) | Answer F1 | 66.800 | # 5 |
| Multi-Choice MRC | ExpMRC - C3 (test) | MacBERT-base + MSS w/ Ques. (single model) | Evidence F1 | 57.400 | # 5 |
| Multi-Choice MRC | ExpMRC - C3 (test) | MacBERT-base + MSS w/ Ques. (single model) | Overall F1 | 42.300 | # 4 |
| Multi-Choice MRC | ExpMRC - C3 (test) | MacBERT-large + Pseudo-data (single model) | Answer F1 | 74.400 | # 2 |
| Multi-Choice MRC | ExpMRC - C3 (test) | MacBERT-large + Pseudo-data (single model) | Evidence F1 | 59.900 | # 2 |
| Multi-Choice MRC | ExpMRC - C3 (test) | MacBERT-large + Pseudo-data (single model) | Overall F1 | 47.300 | # 2 |
| Multi-Choice MRC | ExpMRC - C3 (test) | Human Performance | Answer F1 | 94.3 | # 1 |
| Multi-Choice MRC | ExpMRC - C3 (test) | Human Performance | Evidence F1 | 97.7 | # 1 |
| Multi-Choice MRC | ExpMRC - C3 (test) | Human Performance | Overall F1 | 90.0 | # 1 |
| Multi-Choice MRC | ExpMRC - C3 (test) | MacBERT-base + Pseudo-data (single model) | Answer F1 | 69.000 | # 4 |
| Multi-Choice MRC | ExpMRC - C3 (test) | MacBERT-base + Pseudo-data (single model) | Evidence F1 | 57.500 | # 4 |
| Multi-Choice MRC | ExpMRC - C3 (test) | MacBERT-base + Pseudo-data (single model) | Overall F1 | 40.600 | # 5 |
| Span-Extraction MRC | ExpMRC - CMRC (test) | MacBERT-large + MSS w/ Ques. (single model) | Answer F1 | 88.600 | # 2 |
| Span-Extraction MRC | ExpMRC - CMRC (test) | MacBERT-large + MSS w/ Ques. (single model) | Evidence F1 | 71.000 | # 2 |
| Span-Extraction MRC | ExpMRC - CMRC (test) | MacBERT-large + MSS w/ Ques. (single model) | Overall F1 | 63.200 | # 3 |
| Span-Extraction MRC | ExpMRC - CMRC (test) | MacBERT-base + MSS w/ Ques. (single model) | Answer F1 | 84.400 | # 4 |
| Span-Extraction MRC | ExpMRC - CMRC (test) | MacBERT-base + MSS w/ Ques. (single model) | Evidence F1 | 69.800 | # 4 |
| Span-Extraction MRC | ExpMRC - CMRC (test) | MacBERT-base + MSS w/ Ques. (single model) | Overall F1 | 59.900 | # 4 |
| Span-Extraction MRC | ExpMRC - CMRC (test) | Human Performance | Answer F1 | 97.9 | # 1 |
| Span-Extraction MRC | ExpMRC - CMRC (test) | Human Performance | Evidence F1 | 94.6 | # 1 |
| Span-Extraction MRC | ExpMRC - CMRC (test) | Human Performance | Overall F1 | 92.6 | # 1 |
| Span-Extraction MRC | ExpMRC - CMRC (test) | MacBERT-large + PA Sent. (single model) | Answer F1 | 88.600 | # 2 |
| Span-Extraction MRC | ExpMRC - CMRC (test) | MacBERT-large + PA Sent. (single model) | Evidence F1 | 70.600 | # 3 |
| Span-Extraction MRC | ExpMRC - CMRC (test) | MacBERT-large + PA Sent. (single model) | Overall F1 | 63.300 | # 2 |
| Span-Extraction MRC | ExpMRC - CMRC (test) | MacBERT-base + PA Sent. (single model) | Answer F1 | 84.400 | # 4 |
| Span-Extraction MRC | ExpMRC - CMRC (test) | MacBERT-base + PA Sent. (single model) | Evidence F1 | 69.100 | # 5 |
| Span-Extraction MRC | ExpMRC - CMRC (test) | MacBERT-base + PA Sent. (single model) | Overall F1 | 59.800 | # 5 |
| Multi-Choice MRC | ExpMRC - RACE+ (test) | BERT-base + MSS w/ Ques. (single model) | Answer F1 | 59.800 | # 5 |
| Multi-Choice MRC | ExpMRC - RACE+ (test) | BERT-base + MSS w/ Ques. (single model) | Evidence F1 | 41.800 | # 4 |
| Multi-Choice MRC | ExpMRC - RACE+ (test) | BERT-base + MSS w/ Ques. (single model) | Overall F1 | 27.300 | # 4 |
| Multi-Choice MRC | ExpMRC - RACE+ (test) | BERT-base + Pseudo-data (single model) | Answer F1 | 60.100 | # 4 |
| Multi-Choice MRC | ExpMRC - RACE+ (test) | BERT-base + Pseudo-data (single model) | Evidence F1 | 43.500 | # 2 |
| Multi-Choice MRC | ExpMRC - RACE+ (test) | BERT-base + Pseudo-data (single model) | Overall F1 | 27.100 | # 5 |
| Multi-Choice MRC | ExpMRC - RACE+ (test) | Human Performance | Answer F1 | 93.6 | # 1 |
| Multi-Choice MRC | ExpMRC - RACE+ (test) | Human Performance | Evidence F1 | 90.5 | # 1 |
| Multi-Choice MRC | ExpMRC - RACE+ (test) | Human Performance | Overall F1 | 84.4 | # 1 |
| Multi-Choice MRC | ExpMRC - RACE+ (test) | BERT-large + Pseudo-data (single model) | Answer F1 | 70.400 | # 2 |
| Multi-Choice MRC | ExpMRC - RACE+ (test) | BERT-large + Pseudo-data (single model) | Evidence F1 | 41.300 | # 5 |
| Multi-Choice MRC | ExpMRC - RACE+ (test) | BERT-large + Pseudo-data (single model) | Overall F1 | 30.800 | # 3 |
| Multi-Choice MRC | ExpMRC - RACE+ (test) | BERT-large + MSS w/ Ques. (single model) | Answer F1 | 68.100 | # 3 |
| Multi-Choice MRC | ExpMRC - RACE+ (test) | BERT-large + MSS w/ Ques. (single model) | Evidence F1 | 42.500 | # 3 |
| Multi-Choice MRC | ExpMRC - RACE+ (test) | BERT-large + MSS w/ Ques. (single model) | Overall F1 | 31.300 | # 2 |
| Span-Extraction MRC | ExpMRC - SQuAD (test) | BERT-large + MSS (single model) | Answer F1 | 92.300 | # 1 |
| Span-Extraction MRC | ExpMRC - SQuAD (test) | BERT-large + MSS (single model) | Evidence F1 | 85.700 | # 3 |
| Span-Extraction MRC | ExpMRC - SQuAD (test) | BERT-large + MSS (single model) | Overall F1 | 80.400 | # 3 |
| Span-Extraction MRC | ExpMRC - SQuAD (test) | BERT-large + PA Sent. (single model) | Answer F1 | 92.300 | # 1 |
| Span-Extraction MRC | ExpMRC - SQuAD (test) | BERT-large + PA Sent. (single model) | Evidence F1 | 89.600 | # 1 |
| Span-Extraction MRC | ExpMRC - SQuAD (test) | BERT-large + PA Sent. (single model) | Overall F1 | 83.600 | # 2 |
| Span-Extraction MRC | ExpMRC - SQuAD (test) | Human Performance | Answer F1 | 91.3 | # 3 |
| Span-Extraction MRC | ExpMRC - SQuAD (test) | Human Performance | Overall F1 | 84.7 | # 1 |
| Span-Extraction MRC | ExpMRC - SQuAD (test) | BERT-base + PA Sent. (single model) | Answer F1 | 87.100 | # 4 |
| Span-Extraction MRC | ExpMRC - SQuAD (test) | BERT-base + PA Sent. (single model) | Evidence F1 | 89.100 | # 2 |
| Span-Extraction MRC | ExpMRC - SQuAD (test) | BERT-base + PA Sent. (single model) | Overall F1 | 79.600 | # 4 |
| Span-Extraction MRC | ExpMRC - SQuAD (test) | BERT-base + MSS (single model) | Answer F1 | 87.100 | # 4 |
| Span-Extraction MRC | ExpMRC - SQuAD (test) | BERT-base + MSS (single model) | Evidence F1 | 85.400 | # 4 |
| Span-Extraction MRC | ExpMRC - SQuAD (test) | BERT-base + MSS (single model) | Overall F1 | 76.100 | # 5 |