Finite State Machine Pattern-Root Arabic Morphological Generator, Analyzer and Diacritizer

LREC 2020 · Maha Alkhairy, Afshan Jafri, David Smith

We describe and evaluate the Finite-State Arabic Morphologizer (FSAM), a concatenative (prefix-stem-suffix) and templatic (root-pattern) morphologizer that generates and analyzes undiacritized Modern Standard Arabic (MSA) words, and diacritizes them. Our bidirectional, unified-architecture finite state machine (FSM) is based on morphotactic MSA grammatical rules. The FSM models the root-pattern structure related to semantics and syntax, making it readily scalable, unlike the stem tabulations in prevailing systems. We evaluate the coverage and accuracy of our model, where coverage is the percentage of words in Tashkeela (a large corpus) that can be analyzed. Accuracy is computed against a gold standard, comprising words and properties, created from the intersection of the UD PADT treebank and Tashkeela. Coverage of analysis (extraction of root and properties from a word) is 82%. Accuracy results are: root computed from a word (92%), word generation from a root (100%), non-root properties of a word (97%), and diacritization (84%). FSAM's non-root results match or surpass MADAMIRA's; root results are not compared because publicly available morphologizers are concatenative.
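To make the templatic and concatenative layers concrete, the following is a minimal sketch in Python. It is not FSAM's implementation, which is a finite state machine; it only illustrates the idea. The roots, patterns, prefixes, and suffixes are a hypothetical toy lexicon in Latin transliteration, and the function names `interdigitate`, `generate`, `analyze`, and `coverage` are assumptions introduced for this example. Analysis is done by brute-force inversion of generation, which loosely mirrors the bidirectionality the abstract describes, and coverage follows the abstract's definition as the share of words that receive at least one analysis.

```python
from itertools import product

# Hypothetical toy lexicon (Latin transliteration); not FSAM's actual data.
ROOTS = ["ktb", "drs"]
# Digits 1-3 mark the slots filled by the first, second, and third root consonant.
PATTERNS = {
    "1a2a3": "perfective verb stem",
    "1aa2i3": "active participle",
    "ma12uu3": "passive participle",
}
PREFIXES = ["", "wa", "al"]
SUFFIXES = ["", "a", "un"]


def interdigitate(root, pattern):
    """Templatic step: fill the numbered slots of a pattern with root consonants."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in pattern)


def generate(root, pattern, prefix="", suffix=""):
    """Concatenative step: prefix + stem + suffix, where the stem is root + pattern."""
    return prefix + interdigitate(root, pattern) + suffix


def analyze(word):
    """Invert generation by exhaustive search over the toy lexicon.

    Returns every (prefix, root, pattern, suffix) combination that yields `word`.
    """
    return [
        (prefix, root, pattern, suffix)
        for prefix, root, pattern, suffix in product(PREFIXES, ROOTS, PATTERNS, SUFFIXES)
        if generate(root, pattern, prefix, suffix) == word
    ]


def coverage(words):
    """Fraction of words that receive at least one analysis."""
    return sum(1 for w in words if analyze(w)) / len(words)


if __name__ == "__main__":
    print(generate("ktb", "ma12uu3", prefix="al"))
    print(analyze("darasa"))
    print(coverage(["kaatib", "xyz"]))
```

Adding a root or a pattern in this sketch extends the set of generable words without listing any stems, which is the scalability argument made above for modelling root-pattern structure directly rather than tabulating stems.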
