To verify this assumption, we introduce a novel method, XPASC (eXPlainability-Association SCore), for measuring the generalization of a model trained with a weakly supervised dataset.
We circumvent the need for large amounts of labeled data by using unlabeled data for pretraining a language model.
The knowledge is captured in labeling functions, which detect certain regularities or patterns in the training samples and assign the corresponding labels for training.
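A labeling function of this kind can be sketched as follows (a minimal, hypothetical example; the pattern, label, and function name are illustrative, not taken from the source):

```python
import re

# A labeling function inspects a training sample for a surface pattern
# and either emits a label or abstains (here, None means abstain).
ABSTAIN = None
SPAM = 1

def lf_contains_url(text):
    """Label a sample as SPAM if it contains a URL, else abstain."""
    return SPAM if re.search(r"https?://\S+", text) else ABSTAIN

samples = ["visit http://example.com now", "hello there"]
labels = [lf_contains_url(t) for t in samples]
# labels -> [1, None]
```

Many such functions are typically applied to each sample, and their (possibly conflicting, possibly abstaining) votes are aggregated into training labels.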
We propose new methods for in-domain and cross-domain Named Entity Recognition (NER) on historical data for Dutch and French.
We propose an architecture that trains an out-of-domain model on a large newswire corpus and transfers those weights by using them as a prior for a model trained on the target domain (a dataset of German tweets), for which very few annotations are available.
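One common way to realize such a prior, sketched below under assumed details not given in the source, is to penalize the target-domain parameters for deviating from the source-domain weights (an L2 / Gaussian prior centered on the transferred weights; the function and parameter names are illustrative):

```python
import numpy as np

def transfer_loss(target_loss, params, source_params, tau=0.1):
    """Target-domain loss plus an L2 prior centered on source weights.

    params / source_params: lists of parameter arrays of matching shapes.
    tau: prior strength; larger values keep the model closer to the
    out-of-domain weights, which helps when target annotations are scarce.
    """
    prior = sum(np.sum((p - q) ** 2)
                for p, q in zip(params, source_params))
    return target_loss + tau * prior

# With identical weights the prior vanishes; deviation is penalized.
same = transfer_loss(1.0, [np.zeros(2)], [np.zeros(2)])   # -> 1.0
moved = transfer_loss(1.0, [np.ones(2)], [np.zeros(2)])   # -> 1.2
```

In practice the target model is also initialized from the source weights, so training starts at the mode of the prior.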