Search Results for author: Paul Rayson

Found 40 papers, 11 papers with code

IgboBERT Models: Building and Training Transformer Models for the Igbo Language

1 code implementation LREC 2022 Chiamaka Chukwuneke, Ignatius Ezeani, Paul Rayson, Mahmoud El-Haj

Our results show that, although the IgboNER task benefited hugely from fine-tuning a large transformer model, fine-tuning a transformer model built from scratch with comparatively little Igbo text data seems to yield quite decent results for the IgboNER task.

Language Modelling named-entity-recognition +2

CoFiF Plus: A French Financial Narrative Summarisation Corpus

no code implementations LREC 2022 Nadhem Zmandar, Tobias Daudert, Sina Ahmadi, Mahmoud El-Haj, Paul Rayson

Natural Language Processing is increasingly being applied in the finance and business industry to analyse the text of many different types of financial documents.

Understanding who uses Reddit: Profiling individuals with a self-reported bipolar disorder diagnosis

1 code implementation NAACL (CLPsych) 2021 Glorianna Jagfeld, Fiona Lobban, Paul Rayson, Steven H. Jones

Recently, research on mental health conditions using public online data, including Reddit, has surged in NLP and health research but has not reported user characteristics, which are important to judge generalisability of findings.

MasakhaNER: Named Entity Recognition for African Languages

2 code implementations 22 Mar 2021 David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D'souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi, Verrah Otiende, Iroro Orife, Davis David, Samba Ngom, Tosin Adewumi, Paul Rayson, Mofetoluwa Adeyemi, Gerald Muriuki, Emmanuel Anebi, Chiamaka Chukwuneke, Nkiruka Odu, Eric Peter Wairagala, Samuel Oyerinde, Clemencia Siro, Tobius Saul Bateesa, Temilola Oloyede, Yvonne Wambui, Victor Akinode, Deborah Nabagereka, Maurice Katusiime, Ayodele Awokoya, Mouhamadane MBOUP, Dibora Gebreyohannes, Henok Tilaye, Kelechi Nwaike, Degaga Wolde, Abdoulaye Faye, Blessing Sibanda, Orevaoghene Ahia, Bonaventure F. P. Dossou, Kelechi Ogueji, Thierno Ibrahima DIOP, Abdoulaye Diallo, Adewale Akinfaderin, Tendai Marengereke, Salomey Osei

We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders.

named-entity-recognition Named Entity Recognition +2

GIBBON: General-purpose Information-Based Bayesian OptimisatioN

no code implementations 5 Feb 2021 Henry B. Moss, David S. Leslie, Javier Gonzalez, Paul Rayson

This paper describes a general-purpose extension of max-value entropy search, a popular approach for Bayesian Optimisation (BO).

Bayesian Optimisation Point Processes

The National Corpus of Contemporary Welsh: Project Report | Y Corpws Cenedlaethol Cymraeg Cyfoes: Adroddiad y Prosiect

no code implementations 12 Oct 2020 Dawn Knight, Steve Morris, Tess Fitzpatrick, Paul Rayson, Irena Spasić, Enlli Môn Thomas

This report provides an overview of the CorCenCC project and the online corpus resource that was developed as a result of work on the project.

BOSS: Bayesian Optimization over String Spaces

1 code implementation NeurIPS 2020 Henry B. Moss, Daniel Beck, Javier Gonzalez, David S. Leslie, Paul Rayson

This article develops a Bayesian optimization (BO) method which acts directly over raw strings, proposing the first uses of string kernels and genetic algorithms within BO loops.

Bayesian Optimization

BOSH: Bayesian Optimization by Sampling Hierarchically

no code implementations 2 Jul 2020 Henry B. Moss, David S. Leslie, Paul Rayson

Deployments of Bayesian Optimization (BO) for functions with stochastic evaluations, such as parameter tuning via cross validation and simulation optimization, typically optimize an average of a fixed set of noisy realizations of the objective function.

Bayesian Optimization reinforcement-learning +1

MUMBO: MUlti-task Max-value Bayesian Optimization

no code implementations 22 Jun 2020 Henry B. Moss, David S. Leslie, Paul Rayson

MUMBO is scalable and efficient, allowing multi-task Bayesian optimization to be deployed in problems with rich parameter and fidelity spaces.

Bayesian Optimization

Infrastructure for Semantic Annotation in the Genomics Domain

no code implementations LREC 2020 Mahmoud El-Haj, Nathan Rutherford, Matthew Coole, Ignatius Ezeani, Sheryl Prentice, Nancy Ide, Jo Knight, Scott Piao, John Mariani, Paul Rayson, Keith Suderman

The corpus database is distributed to permit fast indexing, and provides a simple web front-end with corpus linguistics methods for sub-corpus comparison and retrieval.


Igbo-English Machine Translation: An Evaluation Benchmark

no code implementations 1 Apr 2020 Ignatius Ezeani, Paul Rayson, Ikechukwu Onyenwe, Chinedu Uchechukwu, Mark Hepple

Although researchers and practitioners are pushing the boundaries and enhancing the capacities of NLP tools and methods, works on African languages are lagging.

Machine Translation Part-Of-Speech Tagging +1

Leveraging Pre-Trained Embeddings for Welsh Taggers

no code implementations WS 2019 Ignatius Ezeani, Scott Piao, Steven Neale, Paul Rayson, Dawn Knight

While the application of word embedding models to downstream Natural Language Processing (NLP) tasks has been shown to be successful, the benefits for low-resource languages are somewhat limited due to lack of adequate data for training the models.

FIESTA: Fast IdEntification of State-of-The-Art models using adaptive bandit algorithms

1 code implementation ACL 2019 Henry B. Moss, Andrew Moore, David S. Leslie, Paul Rayson

We present FIESTA, a model selection approach that significantly reduces the computational resources required to reliably identify state-of-the-art performance from large collections of candidate models.

Model Selection Sentiment Analysis

Using J-K-fold Cross Validation To Reduce Variance When Tuning NLP Models

1 code implementation COLING 2018 Henry Moss, David Leslie, Paul Rayson

K-fold cross validation (CV) is a popular method for estimating the true performance of machine learning models, allowing model selection and parameter tuning.

Document Classification General Classification +4

Using J-K fold Cross Validation to Reduce Variance When Tuning NLP Models

1 code implementation 19 Jun 2018 Henry B. Moss, David S. Leslie, Paul Rayson

K-fold cross validation (CV) is a popular method for estimating the true performance of machine learning models, allowing model selection and parameter tuning.

Document Classification General Classification +4

Lancaster A at SemEval-2017 Task 5: Evaluation metrics matter: predicting sentiment from financial news headlines

1 code implementation SEMEVAL 2017 Andrew Moore, Paul Rayson

This paper describes our participation in Task 5 track 2 of SemEval 2017 to predict the sentiment of financial news headlines for a specific company on a continuous scale between -1 and 1.


OSMAN ― A Novel Arabic Readability Metric

no code implementations LREC 2016 Mahmoud El-Haj, Paul Rayson

The Arabic sentences were written with the absence of diacritics and in order to count the number of syllables we added the diacritics in using an open source tool called Mishkal.

UPPC - Urdu Paraphrase Plagiarism Corpus

no code implementations LREC 2016 Muhammad Sharjeel, Paul Rayson, Rao Muhammad Adeel Nawab

Paraphrase plagiarism is a significant and widespread problem and research shows that it is hard to detect.

Learning Tone and Attribution for Financial Text Mining

no code implementations LREC 2016 Mahmoud El-Haj, Paul Rayson, Steve Young, Andrew Moore, Martin Walker, Thomas Schleicher, Vasiliki Athanasakou

Previous studies have only applied manual content analysis on a small scale to reveal such a bias in the narrative section of annual financial reports.

BIG-bench Machine Learning

Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages

1 code implementation LREC 2016 Scott Piao, Paul Rayson, Dawn Archer, Francesca Bianchi, Carmen Dayrell, Mahmoud El-Haj, Ricardo-María Jiménez, Dawn Knight, Michal Křen, Laura Löfberg, Rao Muhammad Adeel Nawab, Jawad Shafi, Phoey Lee Teh, Olga Mudraya

Lexical coverage is an important factor concerning the quality of the lexicons and the performance of the corpus annotation tools, and in this experiment we focus on evaluating the lexical coverage achieved by the multilingual lexicons and semantic annotation tools based on them.

Experiences with Parallelisation of an Existing NLP Pipeline: Tagging Hansard

no code implementations LREC 2014 Stephen Wattam, Paul Rayson, Marc Alexander, Jean Anderson

This is contrasted with a description of the cluster on which it was to run, and specific limitations are discussed such as the overhead of using SAN-based storage.

Detecting Document Structure in a Very Large Corpus of UK Financial Reports

no code implementations LREC 2014 Mahmoud El-Haj, Paul Rayson, Steve Young, Martin Walker

In this paper we present the evaluation of our automatic methods for detecting and extracting document structure in annual financial reports.

Text Generation

Document Attrition in Web Corpora: an Exploration

no code implementations LREC 2012 Stephen Wattam, Paul Rayson, Damon Berridge

Increases in the use of web data for corpus-building, coupled with the use of specialist, single-use corpora, make for an increasing reliance on language that changes quickly, affecting the long-term validity of studies based on these methods.
