no code implementations • WS 2019 • Majdi Sawalha, Faisal Al-Shargi, Abdallah AlShdaifat, Sane Yagi, Mohammad A. Qudah
To compile a modern dictionary that catalogues the words in currency, and to study linguistic patterns in the contemporary language, it is necessary to have a corpus of authentic texts that reflect current usage of the language.
no code implementations • LREC 2014 • Claire Brierley, Majdi Sawalha, Eric Atwell
In this paper, we focus on the prosodic effect of qalqalah or "vibration" applied to a subset of Arabic consonants under certain constraints during correct Qur'anic recitation or tajwīd, using our Boundary-Annotated Qur'an dataset of 77430 words (Brierley et al 2012; Sawalha et al 2014).
no code implementations • LREC 2012 • Majdi Sawalha, Claire Brierley, Eric Atwell
We train and test two probabilistic taggers for Arabic phrase break prediction on a purpose-built, gold standard, boundary-annotated and PoS-tagged Qur'an corpus of 77430 words and 8230 sentences.
no code implementations • LREC 2012 • Claire Brierley, Majdi Sawalha, Eric Atwell
We take a novel approach to phrase break prediction for Arabic, deriving our prosodic annotation scheme from Tajwīd (recitation) mark-up in the Qur'an which we then interpret as additional text-based data for computational analysis.