QurAna: Corpus of the Quran annotated with Pronominal Anaphora
This paper presents QurAna: a large corpus created from the original Quranic text, where personal pronouns are tagged with their antecedence. These antecedents are maintained as an ontological list of concepts, which have proved helpful for information retrieval tasks. QurAna is characterized by: (a) comparatively large number of pronouns tagged with antecedent information (over 24,500 pronouns), and (b) maintenance of an ontological concept list out of these antecedents. We have shown useful applications of this corpus. This corpus is first of its kind considering classical Arabic text, which could be used for interesting applications for Modern Standard Arabic as well. This corpus would benefit researchers in obtaining empirical and rules in building new anaphora resolution approaches. Also, such corpus would be used to train, optimize and evaluate existing approaches.
PDF Abstract