Lemmatization

41 papers with code • 0 benchmarks • 0 datasets

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, it is important to be able to normalize words into their base forms, for example to better support search engines and linguistic studies. The main difficulties in lemmatization arise from encountering previously unseen words at inference time and from disambiguating surface forms that can be inflected variants of several different base forms depending on the context.

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
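
The two difficulties named in the description above (ambiguous surface forms and unseen words) can be illustrated with a toy lemmatizer. This is only a minimal sketch and is not taken from any of the papers listed on this page: the lexicon, the suffix rules, and the function name `lemmatize` are all hypothetical, and the part-of-speech tag is assumed to be supplied by some upstream tagger.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: (surface form, POS tag) -> lemma.
# The POS tag stands in for "context" and resolves ambiguous forms.
LEXICON: Dict[Tuple[str, str], str] = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
    ("mice", "NOUN"): "mouse",
}

# Hypothetical fallback rules for unseen words: (suffix, replacement).
# Rules are tried in order; the first match wins.
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("ies", "y"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]


def lemmatize(word: str, pos: str) -> str:
    """Return a lemma for `word`, given its part-of-speech tag `pos`."""
    form = word.lower()

    # 1. Known form: the (form, POS) pair selects among competing lemmas.
    lemma = LEXICON.get((form, pos))
    if lemma is not None:
        return lemma

    # 2. Unseen form: fall back to naive suffix stripping, keeping a
    #    stem of at least three characters.
    for suffix, replacement in SUFFIX_RULES:
        if form.endswith(suffix) and len(form) > len(suffix) + 2:
            return form[: -len(suffix)] + replacement

    # 3. Nothing applies: return the lowercased form unchanged.
    return form
```

In this sketch, `lemmatize("saw", "VERB")` returns `"see"` while `lemmatize("saw", "NOUN")` returns `"saw"`, so the tag does the disambiguation. An unseen form such as `"making"` comes out as `"mak"`, which shows how quickly hand-written fallback rules break down on words outside the lexicon.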

Greatest papers with code

Stanza: A Python Natural Language Processing Toolkit for Many Human Languages

stanfordnlp/stanza ACL 2020

We introduce Stanza, an open-source Python natural language processing toolkit supporting 66 human languages.

Coreference Resolution Dependency Parsing +4

Top2Vec: Distributed Representations of Topics

ddangelov/Top2Vec 19 Aug 2020

Distributed representations of documents and words have gained popularity due to their ability to capture semantics of words and documents.

Lemmatization Semantic Similarity +2

NLP-Cube: End-to-End Raw Text Processing With Neural Networks

adobe/NLP-Cube CoNLL 2018

We introduce NLP-Cube: an end-to-end Natural Language Processing framework, evaluated in CoNLL's "Multilingual Parsing from Raw Text to Universal Dependencies 2018" Shared Task.

Lemmatization Tokenization

Cross-Lingual Lemmatization and Morphology Tagging with Two-Stage Multilingual BERT Fine-Tuning

hyperparticle/udify WS 2019

We present our CHARLES-SAARLAND system for the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology, in task 2, Morphological Analysis and Lemmatization in Context.

Lemmatization Morphological Analysis

Neural Morphology Dataset and Models for Multiple Languages, from the Large to the Endangered

mikahama/uralicNLP 26 May 2021

We train neural models for morphological analysis, generation and lemmatization for morphologically rich languages.

Lemmatization Morphological Analysis

Revisiting NMT for Normalization of Early English Letters

mikahama/natas WS 2019

This paper studies the use of NMT (neural machine translation) as a normalization method for an early English letter corpus.

Lemmatization Machine Translation