TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Arabic Text Diacritization	Tashkeela	MC	Diacritic Error Rate	0.0339	# 5
Arabic Text Diacritization	Tashkeela	MC	Word Error Rate (WER)	0.0994	# 5

Multi-components System for Automatic Arabic Diacritization

8 Apr 2020 · Hamza Abbad, Shengwu Xiong

In this paper, we propose an approach to the automatic restoration of Arabic diacritics that comprises three components stacked in a pipeline: a deep learning model, which is a multi-layer recurrent neural network with LSTM and Dense layers; a character-level rule-based corrector, which applies deterministic operations to prevent some errors; and a word-level statistical corrector, which uses context and distance information to fix some diacritization issues. The approach is novel in that it combines methods of different types and adds edit-distance-based corrections. We used a large public dataset of raw diacritized Arabic text (Tashkeela) to train and test our system after cleaning and normalizing it. On a newly released benchmark test set, our system outperformed all the tested systems, achieving a DER of 3.39% and a WER of 9.94% when taking all Arabic letters into account, and a DER of 2.61% and a WER of 5.83% when ignoring the diacritization of the last letter of every word.
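
The two metrics quoted in the abstract, each with and without the last letter of every word, can be illustrated with a minimal sketch. This is not the benchmark's official scoring script. It assumes that diacritics are the Unicode marks U+064B to U+0652, that every non-diacritic character counts as a letter, and that the reference and the prediction share the same words and base letters. The names `split_letters` and `error_rates` are hypothetical.

```python
# Assumption: the eight common Arabic diacritics, fathatan (U+064B) to sukun (U+0652).
DIACRITICS = frozenset(chr(code) for code in range(0x064B, 0x0653))


def split_letters(word):
    """Return a list of (letter, marks) pairs for one word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding letter
                letter, marks = pairs[-1]
                pairs[-1] = (letter, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def error_rates(reference, prediction, ignore_last=False):
    """Return (DER, WER) for two diacritized versions of the same text."""
    ref_words = reference.split()
    pred_words = prediction.split()
    if len(ref_words) != len(pred_words):
        raise ValueError("texts must contain the same number of words")

    letters = letter_errors = word_errors = 0
    for ref_word, pred_word in zip(ref_words, pred_words):
        ref_pairs = split_letters(ref_word)
        pred_pairs = split_letters(pred_word)
        if [l for l, _ in ref_pairs] != [l for l, _ in pred_pairs]:
            raise ValueError("words must share the same base letters")
        if ignore_last:  # skip the case ending on the final letter
            ref_pairs, pred_pairs = ref_pairs[:-1], pred_pairs[:-1]

        # Compare marks as sorted sequences so that mark order does not matter.
        wrong = sum(
            sorted(r_marks) != sorted(p_marks)
            for (_, r_marks), (_, p_marks) in zip(ref_pairs, pred_pairs)
        )
        letters += len(ref_pairs)
        letter_errors += wrong
        if wrong:
            word_errors += 1

    der = letter_errors / letters if letters else 0.0
    wer = word_errors / len(ref_words) if ref_words else 0.0
    return der, wer


# Two words, eight letters; the prediction gets only the final mark of word one wrong.
ref = "\u0643\u064E\u062A\u064E\u0628\u064E \u0627\u0644\u0648\u064E\u0644\u064E\u062F\u064F"
pred = "\u0643\u064E\u062A\u064E\u0628\u064F \u0627\u0644\u0648\u064E\u0644\u064E\u062F\u064F"
print(error_rates(ref, pred))                    # (0.125, 0.5)
print(error_rates(ref, pred, ignore_last=True))  # (0.0, 0.0)
```

In this toy example the only mistake falls on a word-final letter, so both rates drop to zero once last letters are ignored. The same effect explains why the abstract's second pair of figures is lower than the first.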
