Here, we show that robust overfitting should be viewed as the early part of an epoch-wise double descent -- the robust test error will start to decrease again after training the model for a considerable number of epochs.
In this paper, we propose collaborative adversarial training, which coordinates virtual adversarial training (VAT) and adversarial training (AT) at different levels to improve data utilization.
Multi-head attention plays a crucial role in the recent success of Transformer models, leading to consistent performance improvements over conventional attention in various applications.
Training a conventional neural tagger on silver labels usually faces the risk of overfitting to phrase surface names.
Specifically, we first propose a strategy to measure data quality based on the learning behaviors of the data during adversarial training, and find that low-quality data may not be useful and may even be detrimental to adversarial robustness.
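The exact quality metric is not spelled out in this excerpt; as a rough illustration, the minimal PyTorch sketch below scores each example by its average adversarial loss across epochs, treating examples that stay hard as potentially low quality. The toy model, data, FGSM step, and the "quality = negative mean adversarial loss" rule are hypothetical stand-ins, not the paper's procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy setup (stand-ins for the real model/data; purely illustrative).
torch.manual_seed(0)
X = torch.rand(128, 20)            # 128 examples, 20 features
y = torch.randint(0, 2, (128,))    # binary labels
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def fgsm(x, labels, eps=0.05):
    """One-step perturbation used both for training and for probing example difficulty."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), labels)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).detach()

adv_loss_sum = torch.zeros(len(X))  # accumulated per-example adversarial loss
epochs = 20
for _ in range(epochs):
    x_adv = fgsm(X, y)
    # adversarial training step
    opt.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    opt.step()
    # record each example's adversarial loss this epoch (one "learning behavior" signal)
    with torch.no_grad():
        adv_loss_sum += F.cross_entropy(model(x_adv), y, reduction="none")

quality = -adv_loss_sum / epochs          # lower average adversarial loss -> higher quality
low_quality_idx = quality.argsort()[:10]  # candidates to down-weight or drop
```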
Due to the excessive cost of large-scale language model pre-training, considerable efforts have been made to train BERT progressively -- starting from an inferior but low-cost model and gradually growing it to increase the computational complexity.
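The growth operator is not described in this excerpt; one published scheme for progressive training is to double the encoder depth by stacking copies of the already-trained layers. The sketch below assumes HuggingFace's BertConfig/BertModel and a hypothetical grow_depth helper, and illustrates the general idea rather than the paper's exact procedure.

```python
import copy
from transformers import BertConfig, BertModel

def grow_depth(small: BertModel) -> BertModel:
    """Double the encoder depth by stacking copies of the trained layers."""
    old_cfg = small.config
    new_cfg = copy.deepcopy(old_cfg)
    new_cfg.num_hidden_layers = old_cfg.num_hidden_layers * 2
    big = BertModel(new_cfg)
    big.embeddings.load_state_dict(small.embeddings.state_dict())
    for i, layer in enumerate(big.encoder.layer):
        # layer i of the grown model reuses layer (i mod old_depth) of the small model
        layer.load_state_dict(small.encoder.layer[i % old_cfg.num_hidden_layers].state_dict())
    big.pooler.load_state_dict(small.pooler.state_dict())
    return big

small = BertModel(BertConfig(num_hidden_layers=3, hidden_size=256,
                             num_attention_heads=4, intermediate_size=1024))
# ... pre-train `small` cheaply, then continue pre-training the grown model ...
big = grow_depth(small)
```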
We explore the application of very deep Transformer models for Neural Machine Translation (NMT).
While typical named entity recognition (NER) models require the training set to be annotated with all target types, each available dataset may cover only a part of them.
Its performance is largely influenced by the annotation quality and quantity in supervised learning scenarios, and obtaining ground truth labels is often costly.
Therefore, we manually correct these label mistakes and form a cleaner test set.
In this paper, we present a facet-aware evaluation setup for better assessment of the information coverage in extracted summaries.
Our model neither requires conversion from character sequences to word sequences nor assumes that a tokenizer can correctly detect all word boundaries.
The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.
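As a concrete reference point, linear warmup is commonly implemented as a schedule that ramps the learning rate from zero to its base value over the first few thousand steps. The PyTorch sketch below is a generic illustration of that heuristic (warmup_steps and the warmup_then_constant rule are illustrative choices, not taken from the paper).

```python
import torch

model = torch.nn.Linear(10, 2)                      # stand-in model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

warmup_steps = 1000
def warmup_then_constant(step):
    # Linearly ramp the LR from 0 to its base value, then hold it constant.
    return min(1.0, (step + 1) / warmup_steps)

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=warmup_then_constant)

for step in range(5):
    opt.step()      # (loss.backward() would precede this in real training)
    sched.step()
    print(step, sched.get_last_lr())
```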
We design a set of word frequency-based reliability signals to indicate the quality of each word embedding.
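The specific signals are not listed in this excerpt; the sketch below shows one plausible signal of this kind, mapping corpus frequency to a [0, 1] reliability weight so that rare words receive lower weight. The toy corpus, threshold, and reliability function are hypothetical.

```python
from collections import Counter

corpus = ["the cat sat on the mat", "the dog barked"]   # toy corpus
counts = Counter(tok for sent in corpus for tok in sent.split())
total = sum(counts.values())

def reliability(word, threshold=1e-3):
    """Map corpus frequency to a [0, 1] weight: rare words -> less reliable embeddings."""
    freq = counts.get(word, 0) / total
    return freq / (freq + threshold)

for w in ["the", "dog", "zebra"]:
    print(w, round(reliability(w), 3))
```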
In this paper, we study what limits the performance of DS-trained neural models, conduct thorough analyses, and identify a factor that greatly influences performance: shifted label distribution.
Distant supervision leverages knowledge bases to automatically label instances, thus allowing us to train a relation extractor without human annotations.
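For readers unfamiliar with this setup, the core assumption can be illustrated with a toy example: any sentence mentioning both entities of a knowledge-base fact is labeled with that fact's relation. The KB, sentences, and distant_label helper below are made up for illustration and are not the paper's pipeline.

```python
# Toy knowledge base: (head, tail) -> relation
kb = {("Paris", "France"): "capital_of",
      ("Einstein", "Germany"): "born_in"}

sentences = [
    "Paris is the largest city of France .",
    "Einstein moved from Germany to the United States .",
    "France borders Germany .",
]

def distant_label(sentence):
    """Label a sentence with a KB relation if both entities of a KB fact appear in it."""
    tokens = set(sentence.split())
    for (head, tail), rel in kb.items():
        if head in tokens and tail in tokens:
            return head, tail, rel
    return None

# Sentences matching no KB fact are simply left unlabeled.
training_instances = [(s, *lab) for s in sentences if (lab := distant_label(s))]
print(training_instances)
```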
Recent advances in deep neural models allow us to build reliable named entity recognition (NER) systems without handcrafting features.
Many efforts have been made to facilitate natural language processing tasks with pre-trained language models (LMs), bringing significant improvements to various applications.
We study the task of expert finding in heterogeneous bibliographical networks based on two aspects: textual content analysis and authority ranking.
Unlike most existing embedding methods, which are task-agnostic, our approach simultaneously solves for the underlying node representations and the optimal clustering assignments in an end-to-end manner.
In this study, we develop a novel neural framework to extract abundant knowledge hidden in raw texts to empower the sequence labeling task.
These annotations, referred to as heterogeneous supervision, often conflict with each other, which brings a new challenge to the original relation extraction task: how to infer the true label from noisy labels for a given instance.