Lemmatization

61 papers with code • 0 benchmarks • 3 datasets

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, normalizing words to their base forms is important for applications such as search engines and linguistic studies. The main difficulties in lemmatization are handling previously unseen words at inference time and disambiguating surface forms that, depending on context, can be inflected variants of several different base forms.

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
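
The two difficulties named above, ambiguous surface forms and unseen words, can be illustrated with a toy sketch. The lexicon entries, the suffix rules, and the `lemmatize` function are hypothetical illustrations, not the sequence-to-sequence method of the cited paper, and the sketch assumes a part-of-speech tag is already available as context.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: (surface form, POS tag) -> lemma.
# The POS tag stands in for "context" when a surface form is ambiguous.
LEXICON: Dict[Tuple[str, str], str] = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
    ("better", "ADJ"): "good",
}

# Hypothetical fallback rules for unseen words: (suffix, replacement),
# tried in order. Real systems learn such transformations from data.
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("ies", "y"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]


def lemmatize(token: str, pos: str) -> str:
    """Return a lemma for `token`, using `pos` to resolve ambiguity."""
    word = token.lower()
    # 1. Known (form, POS) pair: the lexicon resolves the ambiguity.
    if (word, pos) in LEXICON:
        return LEXICON[(word, pos)]
    # 2. Unseen word: strip the first matching suffix, keeping a stem
    #    of at least three characters.
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    # 3. No rule applies: return the lowercased form unchanged.
    return word
```

Here `lemmatize("saw", "VERB")` returns `see` while `lemmatize("saw", "NOUN")` returns `saw`, and an unseen form such as `walked` falls through to the suffix rules. Hand-written rules like these break on many forms (for instance, `parsing` would become `pars`), which illustrates why unseen words are hard.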

Benchmarks

These leaderboards are used to track progress in Lemmatization

No evaluation results yet.

Libraries

Use these libraries to find Lemmatization models and implementations

huspacy/huspacy

3 papers

147

Datasets

Latest papers with no code

H2-Golden-Retriever: Methodology and Tool for an Evidence-Based Hydrogen Research Grantsmanship

no code yet • 16 Nov 2022

The Knowledge Graph module was used for the generation of meaningful entities and their relationships, trends and patterns in relevant H2 papers, thanks to an ontology of the hydrogen production domain.

Development of a rule-based lemmatization algorithm through Finite State Machine for Uzbek language

no code yet • 28 Oct 2022

The lemmatization algorithm draws on general rules and part-of-speech data for the Uzbek language, an inventory and classification of affixes, affix removal by a finite state machine for each class, and identification of the word's lemma.

Arabic Word-level Readability Visualization for Assisted Text Simplification

no code yet • 19 Oct 2022

This demo paper presents a Google Docs add-on for automatic Arabic word-level readability visualization.

Social Media Personal Event Notifier Using NLP and Machine Learning

no code yet • 10 Oct 2022

Social media apps have become very promising and omnipresent in daily life.

Context based lemmatizer for Polish language

no code yet • 23 Jul 2022

In computational linguistics, lemmatisation is the algorithmic process of determining the lemma of a word based on its intended meaning.

TArC: Tunisian Arabish Corpus First complete release

no code yet • 11 Jul 2022

In this paper we present the final result of a project on Tunisian Arabic encoded in Arabizi, the Latin-based writing system for digital conversations.

The 2021 Urdu Fake News Detection Task using Supervised Machine Learning and Feature Combinations

no code yet • 6 Apr 2022

Our submitted results ranked fifth in the competition.

Abusive and Threatening Language Detection in Urdu using Supervised Machine Learning and Feature Combinations

no code yet • 6 Apr 2022

This paper reports a non-exhaustive list of experiments that allowed us to reach the submitted results.

Supervised and Unsupervised Categorization of an Imbalanced Italian Crime News Dataset

no code yet • Lecture Notes in Business Information Processing 2022

The scope of this paper is to explore the use of word embeddings for Italian crime news text categorization.

POS tagging, lemmatization and dependency parsing of West Frisian

no code yet • LREC 2022

POS tags were assigned to words by using a Dutch POS tagger that was applied to a literal word-by-word translation, or to sentences of a Dutch parallel text.
