| TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
|---|---|---|---|---|---|
| Natural Language Inference | AX | T5 | Accuracy | 53.1 | #1 |
| Natural Language Understanding | GLUE | MT-DNN-SMART | Average | 89.9 | #1 |
| Natural Language Inference | MNLI + SNLI + ANLI + FEVER | SMARTRoBERTa-LARGE | % Dev Accuracy | 57.1 | #1 |
| Natural Language Inference | MNLI + SNLI + ANLI + FEVER | SMARTRoBERTa-LARGE | % Test Accuracy | 57.1 | #1 |
| Semantic Textual Similarity | MRPC | SMART | Accuracy | 91.3% | #6 |
| Semantic Textual Similarity | MRPC | SMARTRoBERTa | Dev F1 | 92.1 | #1 |
| Semantic Textual Similarity | MRPC | SMARTRoBERTa | Dev Accuracy | 89.2 | #1 |
| Semantic Textual Similarity | MRPC | SMART-BERT | Dev F1 | 91.3 | #2 |
| Semantic Textual Similarity | MRPC | SMART-BERT | Dev Accuracy | 87.7 | #2 |
| Semantic Textual Similarity | MRPC | MT-DNN-SMART | Accuracy | 93.7% | #1 |
| Semantic Textual Similarity | MRPC | MT-DNN-SMART | F1 | 91.7 | #5 |
| Natural Language Inference | MultiNLI | SMARTRoBERTa | Dev Matched | 91.1 | #1 |
| Natural Language Inference | MultiNLI | SMARTRoBERTa | Dev Mismatched | 91.3 | #1 |
| Natural Language Inference | MultiNLI | MT-DNN-SMART | Accuracy | 85.7 | #1 |
| Natural Language Inference | MultiNLI | SMART-BERT | Dev Matched | 85.6 | #2 |
| Natural Language Inference | MultiNLI | SMART-BERT | Dev Mismatched | 86.0 | #2 |
| Natural Language Inference | MultiNLI | T5 | Matched | 92.0 | #1 |
| Natural Language Inference | MultiNLI | T5 | Mismatched | 91.7 | #1 |
| Natural Language Inference | MultiNLI | SMART+BERT-BASE | Accuracy | 85.6 | #3 |
| Natural Language Inference | MultiNLI | MT-DNN-SMARTv0 | Accuracy | 85.7 | #1 |
| Natural Language Inference | QNLI | MT-DNN-SMART | Accuracy | 99.2% | #1 |
| Natural Language Inference | QNLI | ALICE | Accuracy | 99.2% | #1 |
| Natural Language Inference | QNLI | SMART-BERT | Dev Accuracy | 91.7 | #2 |
| Natural Language Inference | QNLI | SMARTRoBERTa | Dev Accuracy | 95.6 | #1 |
| Paraphrase Identification | Quora Question Pairs | ALICE | F1 | 90.7 | #1 |
| Paraphrase Identification | Quora Question Pairs | SMART-BERT | Dev Accuracy | 91.5 | #2 |
| Paraphrase Identification | Quora Question Pairs | SMART-BERT | Dev F1 | 88.5 | #1 |
| Paraphrase Identification | Quora Question Pairs | FreeLB | Accuracy | 74.8 | #20 |
| Paraphrase Identification | Quora Question Pairs | FreeLB | Dev Accuracy | 92.6 | #1 |
| Natural Language Inference | RTE | SMART | Accuracy | 71.2% | #31 |
| Natural Language Inference | RTE | T5 | Accuracy | 92.5% | #5 |
| Natural Language Inference | RTE | SMART-BERT | Dev Accuracy | 71.2 | #2 |
| Natural Language Inference | RTE | SMARTRoBERTa | Dev Accuracy | 92.0 | #1 |
| Natural Language Inference | SciTail | MT-DNN-SMARTLARGEv0 | % Dev Accuracy | 96.6 | #1 |
| Natural Language Inference | SciTail | MT-DNN-SMARTLARGEv0 | % Test Accuracy | 95.2 | #1 |
| Natural Language Inference | SciTail | MT-DNN-SMART_100%ofTrainingData | Dev Accuracy | 96.1 | #1 |
| Natural Language Inference | SciTail | MT-DNN-SMART_0.1%ofTrainingData | Dev Accuracy | 82.3 | #4 |
| Natural Language Inference | SciTail | MT-DNN-SMART_1%ofTrainingData | Dev Accuracy | 88.6 | #3 |
| Natural Language Inference | SciTail | MT-DNN-SMART_10%ofTrainingData | Dev Accuracy | 91.3 | #2 |
| Natural Language Inference | SNLI | MT-DNN-SMART_1%ofTrainingData | Dev Accuracy | 86 | #3 |
| Natural Language Inference | SNLI | MT-DNN-SMART_100%ofTrainingData | Dev Accuracy | 91.6 | #1 |
| Natural Language Inference | SNLI | MT-DNN-SMARTLARGEv0 | % Test Accuracy | 91.7 | #7 |
| Natural Language Inference | SNLI | MT-DNN-SMARTLARGEv0 | % Dev Accuracy | 92.6 | #1 |
| Natural Language Inference | SNLI | MT-DNN-SMART_10%ofTrainingData | Dev Accuracy | 88.7 | #2 |
| Natural Language Inference | SNLI | MT-DNN-SMART_0.1%ofTrainingData | Dev Accuracy | 82.7 | #4 |
| Sentiment Analysis | SST-2 Binary classification | SMART-MT-DNN | Dev Accuracy | 96.1 | #2 |
| Sentiment Analysis | SST-2 Binary classification | MT-DNN | Accuracy | 93.6 | #35 |
| Sentiment Analysis | SST-2 Binary classification | SMART+BERT-BASE | Accuracy | 93 | #40 |
| Sentiment Analysis | SST-2 Binary classification | SMARTRoBERTa | Dev Accuracy | 96.9 | #1 |
| Sentiment Analysis | SST-2 Binary classification | SMART-BERT | Dev Accuracy | 93.0 | #3 |
| Sentiment Analysis | SST-2 Binary classification | MT-DNN-SMART | Accuracy | 97.5 | #1 |
| Semantic Textual Similarity | STS Benchmark | MT-DNN-SMART | Pearson Correlation | 0.929 | #1 |
| Semantic Textual Similarity | STS Benchmark | MT-DNN-SMART | Spearman Correlation | 0.925 | #2 |
| Semantic Textual Similarity | STS Benchmark | SMARTRoBERTa | Dev Spearman Correlation | 92.6 | #1 |
| Semantic Textual Similarity | STS Benchmark | SMARTRoBERTa | Dev Pearson Correlation | 92.8 | #1 |
| Semantic Textual Similarity | STS Benchmark | SMART-BERT | Dev Spearman Correlation | 89.4 | #2 |
| Semantic Textual Similarity | STS Benchmark | SMART-BERT | Dev Pearson Correlation | 90.0 | #2 |
| Natural Language Inference | WNLI | T5 | Accuracy | 93.2% | #2 |