Assess whether a sentence is grammatical or ungrammatical.
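In practice this task (as in CoLA-style acceptability benchmarks) is usually framed as binary sentence classification. A minimal sketch, assuming the Hugging Face `transformers` library and a placeholder CoLA-style acceptability model; the papers listed below may use different setups:

```python
# Minimal sketch: binary acceptability classification with a fine-tuned encoder.
# The model name is a placeholder for any CoLA-style acceptability classifier.
from transformers import pipeline

classifier = pipeline("text-classification", model="textattack/bert-base-uncased-CoLA")

for sentence in ["The book was read.", "Was book the read."]:
    pred = classifier(sentence)[0]
    print(f"{sentence!r} -> {pred['label']} ({pred['score']:.3f})")
```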
Large pre-trained language models (LMs) have demonstrated remarkable ability as few-shot learners.
Pre-trained language models have demonstrated a unique ability to capture implicit language features.
We show that only the final fourth of the layers needs to be fine-tuned to achieve 90% of the original quality.
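A hedged sketch of that kind of partial fine-tuning, assuming a BERT-style encoder from `transformers`: freeze everything except the top quarter of layers and the task head (the exact split and head handling in the paper may differ):

```python
# Sketch: fine-tune only the top quarter of encoder layers (plus the task head).
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

num_layers = model.config.num_hidden_layers      # 12 for bert-base
cutoff = num_layers - num_layers // 4            # layers below this index stay frozen

# Freeze the whole encoder, then re-enable the last quarter of layers.
for param in model.bert.parameters():
    param.requires_grad = False
for layer in model.bert.encoder.layer[cutoff:]:
    for param in layer.parameters():
        param.requires_grad = True

# The pooler feeds the classification head, so keep it trainable as well.
for param in model.bert.pooler.parameters():
    param.requires_grad = True
# model.classifier sits outside model.bert and therefore remains trainable.
```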
Knowledge distillation can effectively transfer knowledge from BERT, a deep language representation model, to traditional, shallow word embedding-based neural networks, helping them approach or exceed the quality of other heavyweight language representation models.
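A minimal sketch of the standard distillation objective commonly used for this (a temperature-softened KL term plus the usual hard-label loss); the weighting and exact formulation in the cited work may differ:

```python
# Sketch of a standard knowledge-distillation loss for classification.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: match the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # conventional rescaling for the temperature
    # Hard targets: cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Example: a BERT teacher and a shallow word-embedding student, both with 2 classes.
teacher_logits = torch.randn(8, 2)
student_logits = torch.randn(8, 2, requires_grad=True)
labels = torch.randint(0, 2, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```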
Recently, the pre-trained language model BERT (and its robustly optimized version, RoBERTa) has attracted considerable attention in natural language understanding (NLU) and has achieved state-of-the-art accuracy on various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity, and question answering.
We use this analysis set to investigate the grammatical knowledge of three pretrained encoders: BERT (Devlin et al., 2018), GPT (Radford et al., 2018), and the BiLSTM baseline from Warstadt et al. We find that these models have a strong command of complex or non-canonical argument structures like ditransitives (Sue gave Dan a book) and passives (The book was read).
For these models, we propose an extension that simulates a full rating distribution (instead of average ratings) and allows generating individual ratings.
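One hedged way to read this extension: predict a distribution over discrete rating bins rather than a single mean, then sample individual ratings from it. The paper's actual parameterization may differ; the sketch below only illustrates the idea and assumes a 1-7 rating scale:

```python
# Sketch: model a full rating distribution and draw individual ratings from it.
# The 1-7 scale and the stand-in output head are both assumptions.
import torch

n_bins = 7
logits = torch.randn(1, n_bins)                # placeholder for a model's rating head
dist = torch.distributions.Categorical(logits=logits)

ratings = dist.sample((100,)).squeeze(-1) + 1  # 100 simulated individual ratings in 1..7
print("mean rating:", ratings.float().mean().item())
print("histogram:", torch.bincount(ratings, minlength=n_bins + 1)[1:].tolist())
```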