Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding

20 Apr 2019 · Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks. Although ensemble learning can improve model performance, serving an ensemble of large DNNs such as MT-DNN can be prohibitively expensive. Here we apply the knowledge distillation method (Hinton et al., 2015) in the multi-task learning setting. For each task, we train an ensemble of different MT-DNNs (teacher) that outperforms any single model, and then train a single MT-DNN (student) via multi-task learning to distill knowledge from these ensemble teachers. We show that the distilled MT-DNN significantly outperforms the original MT-DNN on 7 out of 9 GLUE tasks, pushing the GLUE benchmark (single model) to 83.7% (a 1.5% absolute improvement, based on the GLUE leaderboard at https://gluebenchmark.com/leaderboard as of April 1, 2019). The code and pre-trained models will be made publicly available at https://github.com/namisan/mt-dnn.
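To make the distillation step concrete, below is a minimal sketch of a per-task distillation loss in the spirit of Hinton et al. (2015): the teacher ensemble's averaged class probabilities serve as soft targets for the student, optionally mixed with the usual hard-label loss. This is an illustrative assumption written in PyTorch, not the authors' released implementation; the function name, the list of teacher logits, and the alpha weighting are all hypothetical.

```python
# Sketch of a soft-target distillation loss for one classification task.
# All names here are illustrative; see https://github.com/namisan/mt-dnn for the authors' code.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, hard_labels=None, alpha=0.5):
    """Soft cross-entropy against the averaged teacher distribution,
    optionally mixed with the standard hard-label cross-entropy."""
    # Average the teachers' class probabilities to form the ensemble's soft targets.
    teacher_probs = torch.stack(
        [F.softmax(t, dim=-1) for t in teacher_logits_list], dim=0
    ).mean(dim=0)

    # Soft cross-entropy: -sum_c q_c * log p_c, averaged over the batch.
    soft_loss = -(teacher_probs * F.log_softmax(student_logits, dim=-1)).sum(dim=-1).mean()

    if hard_labels is None:
        return soft_loss
    hard_loss = F.cross_entropy(student_logits, hard_labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

In multi-task training of the student, a loss of this form would be used for tasks that have a teacher ensemble, while tasks without teachers fall back to their ordinary task-specific loss; the exact weighting between soft and hard objectives is an assumption here.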

Task                         Dataset                       Model            Metric      Metric Value   Global Rank
Natural Language Inference   MultiNLI                      MT-DNN-ensemble  Matched     87.9           # 17
Natural Language Inference   MultiNLI                      MT-DNN-ensemble  Mismatched  87.4           # 11
Semantic Textual Similarity  SentEval                      MT-DNN-ensemble  MRPC        92.7/90.3      # 1
Semantic Textual Similarity  SentEval                      MT-DNN-ensemble  SICK-R      -              # 3
Semantic Textual Similarity  SentEval                      MT-DNN-ensemble  SICK-E      -              # 3
Semantic Textual Similarity  SentEval                      MT-DNN-ensemble  STS         91.1/90.7*     # 1
Sentiment Analysis           SST-2 Binary classification   MT-DNN-ensemble  Accuracy    96.5           # 14