2 code implementations • ICML 2020 • Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, Kensen Shi
We fine-tune CuBERT on our benchmark tasks and compare the resulting models to different variants of Word2Vec token embeddings, to BiLSTM and Transformer models, and to published state-of-the-art models, showing that CuBERT outperforms them all, even with shorter training and fewer labeled examples.
no code implementations • 25 Sep 2019 • Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, Kensen Shi
A major advancement in natural-language understanding has been the use of pre-trained token embeddings; BERT and subsequent work have further shown that pre-trained contextual embeddings can be extremely powerful and can be fine-tuned effectively for a variety of downstream supervised tasks.
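Since the abstract centers on fine-tuning pre-trained contextual embeddings for a downstream supervised task, here is a minimal sketch of that workflow using the Hugging Face transformers library. The bert-base-uncased checkpoint and the toy code-classification labels are illustrative assumptions standing in for the released CuBERT model and the paper's benchmark data, not the authors' actual setup.

```python
# Sketch: fine-tune a pre-trained contextual embedding model (BERT-style)
# on a small labeled downstream task. Checkpoint and labels are assumptions,
# not the actual CuBERT release or benchmark.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-uncased"  # assumed stand-in for a CuBERT-like model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy labeled examples standing in for a downstream supervised task.
texts = ["def add(a, b): return a + b", "def add(a, b): return a - b"]
labels = torch.tensor([0, 1])  # e.g. 0 = correct, 1 = buggy

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    optimizer.zero_grad()
    # Passing labels makes the model compute the classification loss internally.
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
```

In practice the fine-tuning would run over the full benchmark datasets with proper batching and evaluation; the sketch only shows the shape of the fine-tuning step the abstract refers to.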