1 code implementation • 25 Mar 2024 • Olia Toporkov, Rodrigo Agerri
We experiment with seven languages of different morphological complexity, namely, English, Spanish, Basque, Russian, Czech, Turkish and Polish, using multilingual and language-specific pre-trained masked language encoder-only models as a backbone to build our lemmatizers.
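Contextual lemmatizers built on token encoders are commonly trained to predict a word-to-lemma transformation rule rather than the lemma string itself. A minimal sketch of such a rule in pure Python (the suffix-rule format and helper names are illustrative assumptions, not the authors' implementation):

```python
def suffix_rule(form: str, lemma: str) -> tuple[int, str]:
    """Derive a suffix rule (chars to strip, suffix to append) that
    rewrites the inflected form into its lemma.
    Illustrative sketch, not the paper's method."""
    # Length of the longest common prefix of form and lemma.
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    # Strip everything after the shared prefix, then append the lemma's tail.
    return (len(form) - i, lemma[i:])

def apply_rule(form: str, rule: tuple[int, str]) -> str:
    """Apply a (strip, append) rule to an inflected form."""
    strip, append = rule
    return (form[: len(form) - strip] if strip else form) + append

# Spanish example: "corrieron" (they ran) -> lemma "correr" (to run)
rule = suffix_rule("corrieron", "correr")   # (5, "er")
print(apply_rule("corrieron", rule))        # correr
```

A classifier over such rules keeps the output space small and shared across words, which is one reason this formulation works across morphologically rich languages.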
no code implementations • 1 Feb 2023 • Olia Toporkov, Rodrigo Agerri
Since the lemma of an inflected word can typically be derived from its morphosyntactic category, it has become common practice to include fine-grained morphosyntactic information when training contextual lemmatizers, without examining whether doing so is actually optimal for downstream performance.