no code implementations • ACL 2020 • Alex Erdmann, er, Tom Kenter, Markus Becker, Christian Schallhart
Lexica distinguishing all morphologically related forms of each lexeme are crucial to many language technologies, yet building them is expensive.
1 code implementation • LREC 2020 • Ossama Obeid, Nasser Zalmout, Salam Khalifa, Dima Taji, Mai Oudah, Bashar Alhafni, Go Inoue, Fadhl Eryani, Alex Erdmann, er, Nizar Habash
We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python.
no code implementations • WS 2019 • Alex Erdmann, er, Salam Khalifa, Mai Oudah, Nizar Habash, Houda Bouamor
We present de-lexical segmentation, a linguistically motivated alternative to greedy or other unsupervised methods, requiring only minimal language specific input.
2 code implementations • NAACL 2019 • Alex Erdmann, er, David Joseph Wrisley, Benjamin Allen, Christopher Brown, Sophie Cohen-Bod{\'e}n{\`e}s, Micha Elsner, Yukun Feng, Brian Joseph, B{\'e}atrice Joyeux-Prunel, Marie-Catherine de Marneffe
Scholars in inter-disciplinary fields like the Digital Humanities are increasingly interested in semantic annotation of specialized corpora.
no code implementations • WS 2018 • Alex Erdmann, er, Nizar Habash
Morphologically rich languages are challenging for natural language processing tasks due to data sparsity.
no code implementations • ACL 2018 • Alex Erdmann, er, Nasser Zalmout, Nizar Habash
Arabic dialects lack large corpora and are noisy, being linguistically disparate with no standardized spelling.
no code implementations • NAACL 2018 • Nasser Zalmout, Alex Erdmann, er, Nizar Habash
User-generated text tends to be noisy with many lexical and orthographic inconsistencies, making natural language processing (NLP) tasks more challenging.
no code implementations • LREC 2018 • Nizar Habash, Fadhl Eryani, Salam Khalifa, Owen Rambow, Dana Abdulrahim, Alex Erdmann, er, Reem Faraj, Wajdi Zaghouani, Houda Bouamor, Nasser Zalmout, Sara Hassan, Faisal Al-Shargi, Sakhar Alkhereyf, Basma Abdulkareem, Esk, Ramy er, Mohammad Salameh, Hind Saddiki
no code implementations • WS 2016 • Alex Erdmann, er, Christopher Brown, Brian Joseph, Mark Janse, Petra Ajaka, Micha Elsner, Marie-Catherine de Marneffe
Although spanning thousands of years and genres as diverse as liturgy, historiography, lyric and other forms of prose and poetry, the body of Latin texts is still relatively sparse compared to English.