701 papers with code • 32 benchmarks • 60 datasets
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.
Ranked #1 on Question Answering on CoQA
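A minimal sketch of using such a pretrained bidirectional encoder via the Hugging Face transformers library (the library and the bert-base-uncased checkpoint are assumptions here, not part of the paper itself):

```python
# Minimal sketch: encode a sentence with a pretrained BERT model.
# Assumes the Hugging Face `transformers` library and the public
# `bert-base-uncased` checkpoint are available.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT reads text bidirectionally.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per input token, conditioned on both directions.
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```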
Instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not.
Ranked #7 on Question Answering on Quora Question Pairs
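A hedged sketch of the replaced-token-detection objective described above: some positions are corrupted with generator samples, and a discriminator is trained with per-token binary cross-entropy to flag which tokens were replaced. The generator and discriminator below are random stand-ins, and all names and sizes are illustrative rather than from the paper's code:

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of replaced-token detection (sizes hypothetical).
vocab_size, seq_len, batch = 100, 8, 4
tokens = torch.randint(vocab_size, (batch, seq_len))

# 1) Corrupt: mask some positions and let a (stand-in) generator sample
#    replacement tokens for them.
mask = torch.rand(batch, seq_len) < 0.15
generator_samples = torch.randint(vocab_size, (batch, seq_len))  # stand-in
corrupted = torch.where(mask, generator_samples, tokens)

# 2) Label each token: 1 if it differs from the original, else 0.
is_replaced = (corrupted != tokens).float()

# 3) A (stand-in) discriminator scores every token; train it with
#    per-token binary cross-entropy against the replaced/original labels.
logits = torch.randn(batch, seq_len, requires_grad=True)  # stand-in scores
loss = F.binary_cross_entropy_with_logits(logits, is_replaced)
loss.backward()
```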
Adversarial training provides a means of regularizing supervised learning algorithms, while virtual adversarial training extends supervised learning algorithms to the semi-supervised setting.
Ranked #13 on Sentiment Analysis on IMDb
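A sketch of adversarial training applied to word embeddings, in the spirit of the regularizer described above: perturb the embeddings along the loss gradient within a small L2 ball, then train on the perturbed input as well. The tiny linear classifier and the epsilon value are illustrative stand-ins, not the paper's model:

```python
import torch
import torch.nn.functional as F

# Sketch of adversarial training on word embeddings; the linear "model"
# and epsilon are hypothetical stand-ins.
torch.manual_seed(0)
emb = torch.randn(4, 10, 32)            # batch of embedded token sequences
labels = torch.randint(2, (4,))
W = torch.randn(32, 2, requires_grad=True)

def forward(e):
    return e.mean(dim=1) @ W            # stand-in classifier over embeddings

# 1) Clean loss and its gradient w.r.t. the embeddings.
emb.requires_grad_(True)
clean_loss = F.cross_entropy(forward(emb), labels)
grad, = torch.autograd.grad(clean_loss, emb, retain_graph=True)

# 2) Adversarial perturbation: move the embeddings along the loss
#    gradient, normalized to a small L2 ball of radius epsilon.
epsilon = 1.0
r_adv = epsilon * grad / (grad.norm(dim=(1, 2), keepdim=True) + 1e-12)

# 3) Train on clean + adversarial loss (regularizes the classifier).
adv_loss = F.cross_entropy(forward((emb + r_adv).detach()), labels)
(clean_loss + adv_loss).backward()
```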
A convolutional neural network (CNN) is a neural network that can make use of the internal structure of data, such as the 2D structure of image data.
Ranked #18 on Sentiment Analysis on IMDb
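A minimal sketch of the same idea applied to text: a 1D convolution over the token axis exploits word order the way 2D convolutions exploit image structure. All layer sizes below are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn

# Sketch of a 1D CNN text classifier; sizes are hypothetical.
class TextCNN(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=50, n_filters=16, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # A kernel of width 3 spans 3-token windows, analogous to
        # small 2D patches on images.
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids).transpose(1, 2)        # (batch, emb, seq)
        x = torch.relu(self.conv(x)).max(dim=2).values   # max-pool over time
        return self.fc(x)

logits = TextCNN()(torch.randint(1000, (4, 20)))  # 4 sequences of 20 tokens
print(logits.shape)  # (4, 2)
```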
Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks.
Ranked #1 on Named Entity Recognition on CoNLL 2003 NER dev
As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models on the edge and/or under constrained computational training or inference budgets remains challenging.
Ranked #9 on Semantic Textual Similarity on MRPC
With the capability of modeling bidirectional contexts, denoising-autoencoding-based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling.
Ranked #1 on Sentiment Analysis on Yelp Binary Classification
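A toy sketch contrasting the two pretraining objectives this abstract compares: denoising autoencoding (mask tokens, predict the originals from context on both sides) versus autoregressive language modeling (predict each token from its left context only). The logits tensor is a random stand-in for a model's output, and all sizes are illustrative:

```python
import torch
import torch.nn.functional as F

# Contrast of pretraining objectives; "model output" is a random stand-in.
vocab, batch, seq = 50, 2, 6
tokens = torch.randint(vocab, (batch, seq))
logits = torch.randn(batch, seq, vocab)   # stand-in model output

# Denoising autoencoding (BERT-style): mask positions and predict the
# originals there, conditioning on context from BOTH directions.
mask = torch.rand(batch, seq) < 0.15
mask[:, 0] = True                         # ensure at least one masked slot
dae_loss = F.cross_entropy(logits[mask], tokens[mask])

# Autoregressive LM: predict each token from its LEFT context only.
ar_loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab), tokens[:, 1:].reshape(-1)
)

print(f"denoising autoencoding loss: {dae_loss.item():.3f}")
print(f"autoregressive loss: {ar_loss.item():.3f}")
```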
Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training.