In the context of Sentence Simplification, this is particularly challenging: the task inherently requires replacing complex words with simpler ones that share the same meaning.
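To make the lexical side of the task concrete, here is a minimal toy sketch (not the paper's method): substituting complex words using a small hand-written synonym mapping, where real systems would instead rely on embeddings or paraphrase resources. The word list below is invented for illustration.

```python
# Toy lexical simplification: swap complex words for simpler
# near-synonyms from a hand-written mapping (illustrative only;
# real simplification systems learn such substitutions).
SIMPLER = {
    "commence": "start",
    "utilize": "use",
    "terminate": "end",
}

def simplify(sentence: str) -> str:
    """Replace each word that has a simpler near-synonym; keep the rest."""
    return " ".join(SIMPLER.get(w.lower(), w) for w in sentence.split())

print(simplify("we will commence and utilize the tool"))
# → "we will start and use the tool"
```

Even this toy version illustrates the core difficulty the snippet points to: the replacement must preserve meaning, which a static word list cannot guarantee in all contexts.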
no code implementations • 22 Mar 2021 • Isaac Caswell, Julia Kreutzer, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Javier Ortiz Suárez, Iroro Orife, Kelechi Ogueji, Rubungo Andre Niyongabo, Toan Q. Nguyen, Mathias Müller, André Müller, Shamsuddeen Hassan Muhammad, Nanda Muhammad, Ayanda Mnyakeni, Jamshidbek Mirzakhalov, Tapiwanashe Matangira, Colin Leong, Nze Lawson, Sneha Kudugunta, Yacine Jernite, Mathias Jenny, Orhan Firat, Bonaventure F. P. Dossou, Sakhile Dlamini, Nisansa de Silva, Sakine Çabuk Ballı, Stella Biderman, Alessia Battisti, Ahmed Baruwa, Ankur Bapna, Pallavi Baljekar, Israel Abebe Azime, Ayodele Awokoya, Duygu Ataman, Orevaoghene Ahia, Oghenefego Ahia, Sweta Agrawal, Mofetoluwa Adeyemi
With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages.
Such transfer emerges by fine-tuning on a task of interest in one language and evaluating on a distinct language not seen during fine-tuning.
Focusing on the latter, we show that this failure to transfer is largely related to the impact of the script used to write such languages.
Coupled with the availability of large scale datasets, deep learning architectures have enabled rapid progress on the Question Answering task.
They match or improve on the current state of the art in tagging and parsing for all five languages.
The French TreeBank developed at the University Paris 7 is the main source of morphosyntactic and syntactic annotations for French.
Progress in sentence simplification has been hindered by a lack of labeled parallel simplification data, particularly in languages other than English.
Ranked #1 on Text Simplification on ASSET
Furthermore, we motivate the need for developing better methods for automatic evaluation using ASSET, since we show that current popular metrics may not be suitable when multiple simplification transformations are performed.
Neural language models trained with a predictive or masked objective have proven successful at capturing short and long distance syntactic dependencies.
We show that the use of web crawled data is preferable to the use of Wikipedia data.
Ranked #1 on Dependency Parsing on French GSD
Text simplification aims at making a text easier to read and understand by simplifying grammar and structure while keeping the underlying information identical.
Ranked #2 on Text Simplification on ASSET
We show that n-gram-based MT metrics such as BLEU and METEOR correlate the most with human judgment of grammaticality and meaning preservation, whereas simplicity is best evaluated by basic length-based metrics.
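The contrast drawn above can be illustrated with a small pure-Python sketch (not the paper's evaluation code): a crude n-gram precision in the spirit of BLEU next to a basic length-based metric. The example sentences are invented for illustration.

```python
from collections import Counter

def ngram_precision(candidate: str, reference: str, n: int = 2) -> float:
    """Fraction of candidate n-grams that also appear in the reference:
    a rough stand-in for one component of an n-gram metric like BLEU."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

def compression_ratio(candidate: str, source: str) -> float:
    """Basic length-based metric: output length relative to the source."""
    return len(candidate.split()) / len(source.split())

source = "the committee deliberated at length before reaching a verdict"
simplified = "the committee talked before reaching a verdict"
reference = "the committee talked a long time before reaching a verdict"

print(round(ngram_precision(simplified, reference), 2))  # overlap with reference
print(round(compression_ratio(simplified, source), 2))   # shorter than source
```

The first number rewards overlap with a human reference (grammaticality, meaning preservation), while the second only measures shortening, which is closer to how simplicity is captured by length-based metrics.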
Morphosyntactic lexicons and word vector representations have both proven useful for improving the accuracy of statistical part-of-speech taggers.