21 Apr 2020 • Tanja Bunk, Daksh Varshneya, Vladimir Vlasov, Alan Nichol
Large-scale pre-trained language models have shown impressive results on language understanding benchmarks like GLUE and SuperGLUE, improving considerably over other pre-training methods like distributed representations (GloVe) and purely supervised approaches.
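The gap between the two pre-training styles can be illustrated with a minimal sketch: GloVe-style distributed representations assign one fixed vector per word type, so a word like "bank" gets the same embedding regardless of context, whereas contextual pre-trained language models produce context-dependent vectors. The vocabulary and vectors below are toy assumptions, not real GloVe weights.

```python
import numpy as np

# Toy static embedding table (hypothetical vectors, NOT real GloVe weights):
# one vector per word type, fixed at pre-training time.
rng = np.random.default_rng(0)
static_table = {w: rng.normal(size=4) for w in ["the", "river", "bank", "deposit"]}

def embed_static(tokens):
    # GloVe-style lookup: the surrounding context is ignored entirely.
    return np.stack([static_table[t] for t in tokens])

# "bank" in two very different contexts...
v_river = embed_static(["river", "bank"])[1]
v_money = embed_static(["bank", "deposit"])[0]

# ...yields the identical vector, so the two senses cannot be distinguished.
same = bool(np.allclose(v_river, v_money))
print(same)
```

A contextual model such as BERT would instead compute each token's representation from the whole input sequence, which is what lets it disambiguate such cases on benchmarks like GLUE and SuperGLUE.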