no code implementations • CLTW (LREC) 2022 • Mahmoud El-Haj, Ignatius Ezeani, Jonathan Morris, Dawn Knight
As part of the effort to increase the availability of Welsh digital technology, this paper introduces the first dataset and results comparing human and metric-based evaluation of Welsh summarisation, which we provide freely for research purposes to help advance work on Welsh summarisation.
no code implementations • LEGAL (LREC) 2022 • Jeremie Clos, Emma McClaughlin, Pepita Barnard, Elena Nichele, Dawn Knight, Derek McAuley, Svenja Adolphs
The days of large amorphous corpora collected with armies of Web crawlers and stored indefinitely are, or should be, coming to an end.
1 code implementation • LREC 2022 • Ignatius Ezeani, Mahmoud El-Haj, Jonathan Morris, Dawn Knight
Welsh is an official language in Wales and is spoken by an estimated 884,300 people (29.2% of the population of Wales).
no code implementations • 12 Oct 2020 • Dawn Knight, Steve Morris, Tess Fitzpatrick, Paul Rayson, Irena Spasić, Enlli Môn Thomas
This report provides an overview of the CorCenCC project and the online corpus resource that was developed as a result of work on the project.
no code implementations • WS 2019 • Ignatius Ezeani, Scott Piao, Steven Neale, Paul Rayson, Dawn Knight
While the application of word embedding models to downstream Natural Language Processing (NLP) tasks has been shown to be successful, the benefits for low-resource languages are somewhat limited due to a lack of adequate data for training the models.
1 code implementation • LREC 2016 • Scott Piao, Paul Rayson, Dawn Archer, Francesca Bianchi, Carmen Dayrell, Mahmoud El-Haj, Ricardo-María Jiménez, Dawn Knight, Michal Křen, Laura Löfberg, Rao Muhammad Adeel Nawab, Jawad Shafi, Phoey Lee Teh, Olga Mudraya
Lexical coverage is an important factor concerning the quality of the lexicons and the performance of the corpus annotation tools, and in this experiment we focus on evaluating the lexical coverage achieved by the multilingual lexicons and semantic annotation tools based on them.