1 code implementation • 28 Nov 2021 • Akihiro Nakano, Shi Chen, Kazuyuki Demachi
We theoretically prove that both losses help the model learn more efficiently, and that the cross-task consistency loss aligns better with the straightforward predictions.
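To make the comparison concrete, the sketch below shows what a cross-task consistency loss typically looks like in PyTorch: a prediction for one task routed through another task's output is penalized for disagreeing with the direct ("straightforward") prediction. The module names (f1, f2, xtc_map) and the use of L1 distances are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of a cross-task consistency loss (assumed setup:
# two task heads plus a learned cross-task mapping; names are hypothetical).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossTaskConsistency(nn.Module):
    def __init__(self, f1: nn.Module, f2: nn.Module, xtc_map: nn.Module):
        super().__init__()
        self.f1 = f1            # head for task 1 (e.g. depth)
        self.f2 = f2            # head for task 2 (e.g. surface normals)
        self.xtc_map = xtc_map  # maps task-1 output into task-2's space

    def forward(self, x: torch.Tensor, y2_target: torch.Tensor) -> torch.Tensor:
        y2_direct = self.f2(x)                # "straightforward" prediction of task 2
        y2_via_t1 = self.xtc_map(self.f1(x))  # task-2 prediction routed through task 1
        task_loss = F.l1_loss(y2_direct, y2_target)
        # Consistency term: the routed prediction should agree with the direct one.
        consistency = F.l1_loss(y2_via_t1, y2_direct)
        return task_loss + consistency
```

Under this formulation, minimizing the consistency term pulls the cross-task path toward the direct prediction, which is the sense in which "alignment with the straightforward predictions" can be measured.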