1 code implementation • NAACL 2021 • Dongkuan Xu, Ian E. H. Yen, Jinxi Zhao, Zhibin Xiao
In particular, common wisdom in CNN pruning holds that sparse pruning compresses a model more effectively than reducing the number of channels and layers (Elsen et al., 2020; Zhu and Gupta, 2017), whereas existing work on sparse pruning of BERT yields results inferior to those of small-dense counterparts such as TinyBERT (Jiao et al., 2020).