1 code implementation • LREC 2022 • Chiamaka Chukwuneke, Ignatius Ezeani, Paul Rayson, Mahmoud El-Haj
Our results show that, although the IgboNER task benefited hugely from fine-tuning a large transformer model, fine-tuning a transformer model built from scratch with comparatively little Igbo text data seems to yield quite decent results for the IgboNER task.
no code implementations • OSACT (LREC) 2022 • Mahmoud El-Haj, Elvis de Souza, Nouran Khallaf, Paul Rayson, Nizar Habash
This paper presents AraSAS, the first open-source Arabic semantic analysis tagging system.
1 code implementation • 18 Jul 2024 • Jamie Glen, Lifeng Han, Paul Rayson, Goran Nenadic
To further facilitate the possibility of automatic coding practice, we explore some solutions in a local computer setting; in addition, we explore the function of explainability for transparency of AI models.
no code implementations • 2 May 2024 • Chris Chinenye Emezue, Ifeoma Okoh, Chinedu Mbonu, Chiamaka Chukwuneke, Daisy Lal, Ignatius Ezeani, Paul Rayson, Ijemma Onwuzulike, Chukwuma Okeke, Gerald Nweya, Bright Ogbonna, Chukwuebuka Oraegbunam, Esther Chidinma Awo-Ndubuisi, Akudo Amarachukwu Osuagwu, Obioha Nmezi
The Igbo language is facing a risk of becoming endangered, as indicated by a 2025 UNESCO study.
1 code implementation • NAACL (CLPsych) 2021 • Glorianna Jagfeld, Fiona Lobban, Paul Rayson, Steven H. Jones
Recently, research on mental health conditions using public online data, including Reddit, has surged in NLP and health research but has not reported user characteristics, which are important to judge generalisability of findings.
2 code implementations • 22 Mar 2021 • David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D'souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi, Verrah Otiende, Iroro Orife, Davis David, Samba Ngom, Tosin Adewumi, Paul Rayson, Mofetoluwa Adeyemi, Gerald Muriuki, Emmanuel Anebi, Chiamaka Chukwuneke, Nkiruka Odu, Eric Peter Wairagala, Samuel Oyerinde, Clemencia Siro, Tobius Saul Bateesa, Temilola Oloyede, Yvonne Wambui, Victor Akinode, Deborah Nabagereka, Maurice Katusiime, Ayodele Awokoya, Mouhamadane MBOUP, Dibora Gebreyohannes, Henok Tilaye, Kelechi Nwaike, Degaga Wolde, Abdoulaye Faye, Blessing Sibanda, Orevaoghene Ahia, Bonaventure F. P. Dossou, Kelechi Ogueji, Thierno Ibrahima DIOP, Abdoulaye Diallo, Adewale Akinfaderin, Tendai Marengereke, Salomey Osei
We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders.
no code implementations • 5 Feb 2021 • Henry B. Moss, David S. Leslie, Javier Gonzalez, Paul Rayson
This paper describes a general-purpose extension of max-value entropy search, a popular approach for Bayesian Optimisation (BO).
no code implementations • 12 Oct 2020 • Dawn Knight, Steve Morris, Tess Fitzpatrick, Paul Rayson, Irena Spasić, Enlli Môn Thomas
This report provides an overview of the CorCenCC project and the online corpus resource that was developed as a result of work on the project.
1 code implementation • NeurIPS 2020 • Henry B. Moss, Daniel Beck, Javier Gonzalez, David S. Leslie, Paul Rayson
This article develops a Bayesian optimization (BO) method which acts directly over raw strings, proposing the first uses of string kernels and genetic algorithms within BO loops.
no code implementations • 2 Jul 2020 • Henry B. Moss, David S. Leslie, Paul Rayson
Deployments of Bayesian Optimization (BO) for functions with stochastic evaluations, such as parameter tuning via cross validation and simulation optimization, typically optimize an average of a fixed set of noisy realizations of the objective function.
no code implementations • ACL 2020 • Lama Alsudias, Paul Rayson
We find that Machine Learning classifiers are able to correctly identify the rumour related tweets with 84% accuracy.
no code implementations • 22 Jun 2020 • Henry B. Moss, David S. Leslie, Paul Rayson
MUMBO is scalable and efficient, allowing multi-task Bayesian optimization to be deployed in problems with rich parameter and fidelity spaces.
no code implementations • LREC 2020 • Matthew Coole, Paul Rayson, John Mariani
Creating, curating and maintaining modern political corpora is becoming an ever more involved task.
no code implementations • LREC 2020 • Matthew Coole, Paul Rayson, John Mariani
LexiDB is a tool for storing, managing and querying corpus data.
no code implementations • LREC 2020 • Mahmoud El-Haj, Nathan Rutherford, Matthew Coole, Ignatius Ezeani, Sheryl Prentice, Nancy Ide, Jo Knight, Scott Piao, John Mariani, Paul Rayson, Keith Suderman
The corpus database is distributed to permit fast indexing, and provides a simple web front-end with corpus linguistics methods for sub-corpus comparison and retrieval.
no code implementations • LREC 2020 • Lama Alsudias, Paul Rayson
Building ontologies is a crucial part of the semantic web endeavour.
no code implementations • 1 Apr 2020 • Ignatius Ezeani, Paul Rayson, Ikechukwu Onyenwe, Chinedu Uchechukwu, Mark Hepple
Although researchers and practitioners are pushing the boundaries and enhancing the capacities of NLP tools and methods, works on African languages are lagging.
no code implementations • WS 2019 • Ignatius Ezeani, Scott Piao, Steven Neale, Paul Rayson, Dawn Knight
While the application of word embedding models to downstream Natural Language Processing (NLP) tasks has been shown to be successful, the benefits for low-resource languages are somewhat limited due to lack of adequate data for training the models.
1 code implementation • ACL 2019 • Henry B. Moss, Andrew Moore, David S. Leslie, Paul Rayson
We present FIESTA, a model selection approach that significantly reduces the computational resources required to reliably identify state-of-the-art performance from large collections of candidate models.
no code implementations • 28 Mar 2019 • Mahmoud El-Haj, Paul Rayson, Martin Walker, Steven Young, Vasiliki Simaki
We critically assess mainstream accounting and finance research applying methods from computational linguistics (CL) to study financial discourse.
1 code implementation • COLING 2018 • Henry Moss, David Leslie, Paul Rayson
K-fold cross validation (CV) is a popular method for estimating the true performance of machine learning models, allowing model selection and parameter tuning.
1 code implementation • 19 Jun 2018 • Henry B. Moss, David S. Leslie, Paul Rayson
K-fold cross validation (CV) is a popular method for estimating the true performance of machine learning models, allowing model selection and parameter tuning.
1 code implementation • COLING 2018 • Andrew Moore, Paul Rayson
Lack of repeatability and generalisability are two significant threats to continuing scientific development in Natural Language Processing.
1 code implementation • SEMEVAL 2017 • Andrew Moore, Paul Rayson
This paper describes our participation in Task 5 track 2 of SemEval 2017 to predict the sentiment of financial news headlines for a specific company on a continuous scale between -1 and 1.
no code implementations • WS 2017 • Mahmoud El-Haj, Paul Rayson, Scott Piao, Stephen Wattam
Creating high-quality, wide-coverage multilingual semantic lexicons to support knowledge-based approaches is a challenging, time-consuming manual task.
no code implementations • LREC 2016 • Mahmoud El-Haj, Paul Rayson, Steve Young, Andrew Moore, Martin Walker, Thomas Schleicher, Vasiliki Athanasakou
Previous studies have only applied manual content analysis on a small scale to reveal such a bias in the narrative section of annual financial reports.
no code implementations • LREC 2016 • Muhammad Sharjeel, Paul Rayson, Rao Muhammad Adeel Nawab
Paraphrase plagiarism is a significant and widespread problem and research shows that it is hard to detect.
1 code implementation • LREC 2016 • Scott Piao, Paul Rayson, Dawn Archer, Francesca Bianchi, Carmen Dayrell, Mahmoud El-Haj, Ricardo-María Jiménez, Dawn Knight, Michal Křen, Laura Löfberg, Rao Muhammad Adeel Nawab, Jawad Shafi, Phoey Lee Teh, Olga Mudraya
Lexical coverage is an important factor concerning the quality of the lexicons and the performance of the corpus annotation tools, and in this experiment we focus on evaluating the lexical coverage achieved by the multilingual lexicons and semantic annotation tools based on them.
no code implementations • LREC 2016 • Mahmoud El-Haj, Paul Rayson
The Arabic sentences were written without diacritics, so in order to count the number of syllables we added the diacritics using an open-source tool called Mishkal.
no code implementations • LREC 2014 • Stephen Wattam, Paul Rayson, Marc Alexander, Jean Anderson
This is contrasted with a description of the cluster on which it was to run, and specific limitations are discussed such as the overhead of using SAN-based storage.
no code implementations • LREC 2014 • Mahmoud El-Haj, Paul Rayson, Steve Young, Martin Walker
In this paper we present the evaluation of our automatic methods for detecting and extracting document structure in annual financial reports.
no code implementations • LREC 2012 • Stephen Wattam, Paul Rayson, Damon Berridge
Increases in the use of web data for corpus-building, coupled with the use of specialist, single-use corpora, make for an increasing reliance on language that changes quickly, affecting the long-term validity of studies based on these methods.