no code implementations • 26 Nov 2022 • Zixiang Ding, Guoqing Jiang, Shuai Zhang, Lin Guo, Wei Lin
In this paper, we propose Stochastic Knowledge Distillation (SKD) to obtain a compact BERT-style language model, dubbed SKDBERT.
no code implementations • ICLR 2020 • Jinlong Liu, Guoqing Jiang, Yunzhi Bai, Ting Chen, Huayan Wang
As deep neural networks (DNNs) achieve tremendous success across many application domains, researchers have explored many aspects of why they generalize well.