However, it requires that both sentences be fed into the network together, which causes massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT.
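The arithmetic behind that figure is easy to verify. Below is a minimal Python check of the pair count; the throughput number is simply derived from the quoted 65 hours, not a claim about any particular hardware:

```python
from math import comb

n = 10_000                      # sentences in the collection
pairs = comb(n, 2)              # each unordered pair needs one cross-encoder forward pass
print(pairs)                    # 49995000, i.e. ~50 million

hours = 65
print(f"{pairs / (hours * 3600):.0f} pairs/s")  # ~214 pairs/s implied by the 65-hour figure
```

This quadratic blow-up is exactly what bi-encoder approaches avoid: embedding each sentence once (10,000 forward passes) and then comparing the cached vectors is cheap by comparison.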
Deep Neural Networks (DNNs) have been widely employed in industry to address various Natural Language Processing (NLP) tasks.
Research in natural language processing proceeds, in part, by demonstrating that new models achieve superior performance (e.g., accuracy) on held-out test data, compared to previous results.
Existing work, including ELMo and BERT, has revealed the importance of pre-training for NLP tasks.
Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive.
For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not).
This approach achieves state-of-the-art results for document summarization on CNN/Daily Mail (using extra training data).
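Here is a minimal PyTorch sketch of that two-optimizer fine-tuning schedule, with a toy Transformer standing in for the actual BERT-based model so the snippet runs on its own; the architecture, learning rates, and data below are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

# Toy encoder-decoder summarizer. In the real setting the encoder is pretrained
# (e.g., BERT) and the decoder is randomly initialized; here both are small
# Transformer stacks purely so the sketch is self-contained.
class Summarizer(nn.Module):
    def __init__(self, vocab=1000, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d, nhead=4, batch_first=True), num_layers=2)
        self.out = nn.Linear(d, vocab)

    def forward(self, src, tgt):
        mem = self.encoder(self.emb(src))
        t = tgt.size(1)
        causal = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        return self.out(self.decoder(self.emb(tgt), mem, tgt_mask=causal))

model = Summarizer()

# The core idea: separate optimizers, so the pretrained encoder is fine-tuned
# gently (small learning rate) while the unpretrained decoder trains faster
# (large learning rate). These rates are placeholders, not the paper's values.
opt_enc = torch.optim.Adam(model.encoder.parameters(), lr=2e-5)
opt_dec = torch.optim.Adam(
    [p for name, p in model.named_parameters() if not name.startswith("encoder.")],
    lr=1e-4)

src = torch.randint(0, 1000, (2, 16))   # dummy source token ids
tgt = torch.randint(0, 1000, (2, 8))    # dummy target (summary) token ids
logits = model(src, tgt[:, :-1])        # teacher forcing: predict the next token
loss = nn.functional.cross_entropy(logits.reshape(-1, 1000), tgt[:, 1:].reshape(-1))

opt_enc.zero_grad(); opt_dec.zero_grad()
loss.backward()
opt_enc.step(); opt_dec.step()          # each parameter group follows its own optimizer
```

The paper additionally gives the two optimizers different warmup schedules; the essential point is that the encoder's and decoder's updates are decoupled, so the randomly initialized decoder cannot destabilize the pretrained encoder early in fine-tuning.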