Search Results for author: Paul Rayson

Found 36 papers, 11 papers with code

MasakhaNER: Named Entity Recognition for African Languages

2 code implementations • 22 Mar 2021 • David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D'souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi, Verrah Otiende, Iroro Orife, Davis David, Samba Ngom, Tosin Adewumi, Paul Rayson, Mofetoluwa Adeyemi, Gerald Muriuki, Emmanuel Anebi, Chiamaka Chukwuneke, Nkiruka Odu, Eric Peter Wairagala, Samuel Oyerinde, Clemencia Siro, Tobius Saul Bateesa, Temilola Oloyede, Yvonne Wambui, Victor Akinode, Deborah Nabagereka, Maurice Katusiime, Ayodele Awokoya, Mouhamadane MBOUP, Dibora Gebreyohannes, Henok Tilaye, Kelechi Nwaike, Degaga Wolde, Abdoulaye Faye, Blessing Sibanda, Orevaoghene Ahia, Bonaventure F. P. Dossou, Kelechi Ogueji, Thierno Ibrahima DIOP, Abdoulaye Diallo, Adewale Akinfaderin, Tendai Marengereke, Salomey Osei

We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders.

Named Entity Recognition +2

Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages

1 code implementation • LREC 2016 • Scott Piao, Paul Rayson, Dawn Archer, Francesca Bianchi, Carmen Dayrell, Mahmoud El-Haj, Ricardo-María Jiménez, Dawn Knight, Michal Křen, Laura Löfberg, Rao Muhammad Adeel Nawab, Jawad Shafi, Phoey Lee Teh, Olga Mudraya

Lexical coverage is an important factor concerning the quality of the lexicons and the performance of the corpus annotation tools, and in this experiment we focus on evaluating the lexical coverage achieved by the multilingual lexicons and semantic annotation tools based on them.

BOSS: Bayesian Optimization over String Spaces

1 code implementation • NeurIPS 2020 • Henry B. Moss, Daniel Beck, Javier Gonzalez, David S. Leslie, Paul Rayson

This article develops a Bayesian optimization (BO) method which acts directly over raw strings, proposing the first uses of string kernels and genetic algorithms within BO loops.

Bayesian Optimization

Bringing replication and reproduction together with generalisability in NLP: Three reproduction studies for Target Dependent Sentiment Analysis

1 code implementation • COLING 2018 • Andrew Moore, Paul Rayson

Lack of repeatability and generalisability are two significant threats to continuing scientific development in Natural Language Processing.

Sentiment Analysis

Lancaster A at SemEval-2017 Task 5: Evaluation metrics matter: predicting sentiment from financial news headlines

1 code implementation • SEMEVAL 2017 • Andrew Moore, Paul Rayson

This paper describes our participation in Task 5 track 2 of SemEval 2017 to predict the sentiment of financial news headlines for a specific company on a continuous scale between -1 and 1.

regression

FIESTA: Fast IdEntification of State-of-The-Art models using adaptive bandit algorithms

1 code implementation • ACL 2019 • Henry B. Moss, Andrew Moore, David S. Leslie, Paul Rayson

We present FIESTA, a model selection approach that significantly reduces the computational resources required to reliably identify state-of-the-art performance from large collections of candidate models.

Model Selection Sentiment Analysis

Using J-K fold Cross Validation to Reduce Variance When Tuning NLP Models

1 code implementation • 19 Jun 2018 • Henry B. Moss, David S. Leslie, Paul Rayson

K-fold cross validation (CV) is a popular method for estimating the true performance of machine learning models, allowing model selection and parameter tuning.

Document Classification General Classification +4

Using J-K-fold Cross Validation To Reduce Variance When Tuning NLP Models

1 code implementation • COLING 2018 • Henry Moss, David Leslie, Paul Rayson

K-fold cross validation (CV) is a popular method for estimating the true performance of machine learning models, allowing model selection and parameter tuning.

Document Classification General Classification +4

Understanding who uses Reddit: Profiling individuals with a self-reported bipolar disorder diagnosis

1 code implementation • NAACL (CLPsych) 2021 • Glorianna Jagfeld, Fiona Lobban, Paul Rayson, Steven H. Jones

Recently, research on mental health conditions using public online data, including Reddit, has surged in NLP and health research but has not reported user characteristics, which are important to judge generalisability of findings.

Creating and Validating Multilingual Semantic Representations for Six Languages: Expert versus Non-Expert Crowds

no code implementations • WS 2017 • Mahmoud El-Haj, Paul Rayson, Scott Piao, Stephen Wattam

Creating high-quality wide-coverage multilingual semantic lexicons to support knowledge-based approaches is a challenging time-consuming manual task.

Named Entity Recognition (NER) Sentiment Analysis +1

Towards a Welsh Semantic Annotation System

no code implementations • LREC 2018 • Scott Piao, Paul Rayson, Dawn Knight, Gareth Watkins

Arabic Dialect Identification in the Context of Bivalency and Code-Switching

no code implementations • LREC 2018 • Mahmoud El-Haj, Paul Rayson, Mariam Aboelezz

Dialect Identification Machine Translation

Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger

1 code implementation • LREC 2018 • Mahmoud El-Haj, Paul Rayson, Scott Piao, Jo Knight

Information Retrieval

Development of the Multilingual Semantic Annotation System

no code implementations • HLT 2015 • Paul Rayson, Scott Piao, Carmen Dayrell, Francesca Bianchi, Angela D'Egidio

Multilingual NLP

Using a Keyness Metric for Single and Multi Document Summarisation

no code implementations • WS 2013 • Mahmoud El-Haj, Paul Rayson

Document Summarization Multi-Document Summarization

Detecting Document Structure in a Very Large Corpus of UK Financial Reports

no code implementations • LREC 2014 • Mahmoud El-Haj, Paul Rayson, Steve Young, Martin Walker

In this paper we present the evaluation of our automatic methods for detecting and extracting document structure in annual financial reports.

Text Generation

Experiences with Parallelisation of an Existing NLP Pipeline: Tagging Hansard

no code implementations • LREC 2014 • Stephen Wattam, Paul Rayson, Marc Alexander, Jean Anderson

This is contrasted with a description of the cluster on which it was to run, and specific limitations are discussed such as the overhead of using SAN-based storage.

Document Attrition in Web Corpora: an Exploration

no code implementations • LREC 2012 • Stephen Wattam, Paul Rayson, Damon Berridge

Increases in the use of web data for corpus-building, coupled with the use of specialist, single-use corpora, make for an increasing reliance on language that changes quickly, affecting the long-term validity of studies based on these methods.

In Search of Meaning: Lessons, Resources and Next Steps for Computational Analysis of Financial Discourse

no code implementations • 28 Mar 2019 • Mahmoud El-Haj, Paul Rayson, Martin Walker, Steven Young, Vasiliki Simaki

We critically assess mainstream accounting and finance research applying methods from computational linguistics (CL) to study financial discourse.

Named Entity Recognition +2

OSMAN - A Novel Arabic Readability Metric

no code implementations • LREC 2016 • Mahmoud El-Haj, Paul Rayson

The Arabic sentences were written with the absence of diacritics and in order to count the number of syllables we added the diacritics in using an open source tool called Mishkal.

Learning Tone and Attribution for Financial Text Mining

no code implementations • LREC 2016 • Mahmoud El-Haj, Paul Rayson, Steve Young, Andrew Moore, Martin Walker, Thomas Schleicher, Vasiliki Athanasakou

Previous studies have only applied manual content analysis on a small scale to reveal such a bias in the narrative section of annual financial reports.

Attribute BIG-bench Machine Learning

UPPC - Urdu Paraphrase Plagiarism Corpus

no code implementations • LREC 2016 • Muhammad Sharjeel, Paul Rayson, Rao Muhammad Adeel Nawab

Paraphrase plagiarism is a significant and widespread problem and research shows that it is hard to detect.

Leveraging Pre-Trained Embeddings for Welsh Taggers

no code implementations • WS 2019 • Ignatius Ezeani, Scott Piao, Steven Neale, Paul Rayson, Dawn Knight

While the application of word embedding models to downstream Natural Language Processing (NLP) tasks has been shown to be successful, the benefits for low-resource languages are somewhat limited due to a lack of adequate data for training the models.

Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases

no code implementations • WS 2019 • Lama Alsudias, Paul Rayson

Igbo-English Machine Translation: An Evaluation Benchmark

no code implementations • 1 Apr 2020 • Ignatius Ezeani, Paul Rayson, Ikechukwu Onyenwe, Chinedu Uchechukwu, Mark Hepple

Although researchers and practitioners are pushing the boundaries and enhancing the capacities of NLP tools and methods, works on African languages are lagging.

Machine Translation Part-Of-Speech Tagging +1

Unfinished Business: Construction and Maintenance of a Semantically Tagged Historical Parliamentary Corpus, UK Hansard from 1803 to the present day

no code implementations • LREC 2020 • Matthew Coole, Paul Rayson, John Mariani

Creating, curating and maintaining modern political corpora is becoming an ever more involved task.

Part-Of-Speech Tagging

LexiDB: Patterns & Methods for Corpus Linguistic Database Management

no code implementations • LREC 2020 • Matthew Coole, Paul Rayson, John Mariani

LexiDB is a tool for storing, managing and querying corpus data.

Management Retrieval

Developing an Arabic Infectious Disease Ontology to Include Non-Standard Terminology

no code implementations • LREC 2020 • Lama Alsudias, Paul Rayson

Building ontologies is a crucial part of the semantic web endeavour.

Term Extraction

Infrastructure for Semantic Annotation in the Genomics Domain

no code implementations • LREC 2020 • Mahmoud El-Haj, Nathan Rutherford, Matthew Coole, Ignatius Ezeani, Sheryl Prentice, Nancy Ide, Jo Knight, Scott Piao, John Mariani, Paul Rayson, Keith Suderman

The corpus database is distributed to permit fast indexing, and provides a simple web front-end with corpus linguistics methods for sub-corpus comparison and retrieval.

Retrieval

MUMBO: MUlti-task Max-value Bayesian Optimization

no code implementations • 22 Jun 2020 • Henry B. Moss, David S. Leslie, Paul Rayson

MUMBO is scalable and efficient, allowing multi-task Bayesian optimization to be deployed in problems with rich parameter and fidelity spaces.

Bayesian Optimization

BOSH: Bayesian Optimization by Sampling Hierarchically

no code implementations • 2 Jul 2020 • Henry B. Moss, David S. Leslie, Paul Rayson

Deployments of Bayesian Optimization (BO) for functions with stochastic evaluations, such as parameter tuning via cross validation and simulation optimization, typically optimize an average of a fixed set of noisy realizations of the objective function.

Bayesian Optimization Reinforcement Learning +1

The National Corpus of Contemporary Welsh: Project Report | Y Corpws Cenedlaethol Cymraeg Cyfoes: Adroddiad y Prosiect

no code implementations • 12 Oct 2020 • Dawn Knight, Steve Morris, Tess Fitzpatrick, Paul Rayson, Irena Spasić, Enlli Môn Thomas

This report provides an overview of the CorCenCC project and the online corpus resource that was developed as a result of work on the project.

COVID-19 and Arabic Twitter: How can Arab World Governments and Public Health Organizations Learn from Social Media?

no code implementations • ACL 2020 • Lama Alsudias, Paul Rayson

We find that Machine Learning classifiers are able to correctly identify the rumour related tweets with 84% accuracy.

BIG-bench Machine Learning Rumour Detection +1

GIBBON: General-purpose Information-Based Bayesian OptimisatioN

no code implementations • 5 Feb 2021 • Henry B. Moss, David S. Leslie, Javier Gonzalez, Paul Rayson

This paper describes a general-purpose extension of max-value entropy search, a popular approach for Bayesian Optimisation (BO).

Bayesian Optimisation Point Processes

IgboBERT Models: Building and Training Transformer Models for the Igbo Language

1 code implementation • LREC 2022 • Chiamaka Chukwuneke, Ignatius Ezeani, Paul Rayson, Mahmoud El-Haj

Our results show that, although the IgboNER task benefited hugely from fine-tuning a large transformer model, fine-tuning a transformer model built from scratch with comparatively little Igbo text data seems to yield quite decent results for the IgboNER task.

Language Modelling Named Entity Recognition +2

AraSAS: The Open Source Arabic Semantic Tagger

no code implementations • OSACT (LREC) 2022 • Mahmoud El-Haj, Elvis de Souza, Nouran Khallaf, Paul Rayson, Nizar Habash

This paper presents (AraSAS) the first open-source Arabic semantic analysis tagging system.

TAG
