We perform a systematic set of experiments using two neural constituency parsers to examine how different parsers behave in combination with different BERT models with varying source and target genres in English and Swedish.
In order to determine towhat degree the data imbalance between two domains and the domain differences affect results, we also carry out an experiment with two imbalanced in-domain treebanks and show that loss weighting also improves performance in an in-domain setting.
Based on our findings, we can conclude that a multilingual optimization of classifiers is not possible even in settings where comparable data sets are used.
Parsing Chinese critically depends on correct word segmentation for the parser since incorrect segmentation inevitably causes incorrect parses.
We investigate parsing replicability across 7 languages (and 8 treebanks), showing that choices concerning the use of grammatical functions in parsing or evaluation, the influence of the rare word threshold, as well as choices in test sentences and evaluation script options have considerable and often unexpected effects on parsing accuracies.