no code implementations • 12 Oct 2020 • Mingzhi Zheng, Dinghan Shen, Yelong Shen, Weizhu Chen, Lin Xiao
We prove, from a theoretical perspective, that the gradients derived from this new masking scheme have smaller variance and can lead to more efficient self-supervised training.
Ranked #1 on Sentence Classification on ACL-ARC
2 code implementations • 29 Sep 2020 • Dinghan Shen, Mingzhi Zheng, Yelong Shen, Yanru Qu, Weizhu Chen
Adversarial training has been shown to be effective at endowing learned representations with stronger generalization ability.
Ranked #8 on Machine Translation on IWSLT2014 German-English