Sequence-Level Knowledge Distillation

EMNLP 2016. Yoon Kim, Alexander M. Rush

Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However, to reach competitive performance, NMT models need to be exceedingly large...

Evaluation results from the paper

Task                 Dataset                 Model                         Metric      Value  Global rank
Machine Translation  IWSLT2015 Thai-English  Seq-KD + Seq-Inter + Word-KD  BLEU score  14.2   #1
Machine Translation  WMT2014 English-German  Seq-KD + Seq-Inter + Word-KD  BLEU score  18.5   #19