Search Results for author: Andrew Caines

Found 28 papers, 3 papers with code

Detecting Trending Terms in Cybersecurity Forum Discussions

no code implementations EMNLP (WNUT) 2020 Jack Hughes, Seth Aycock, Andrew Caines, Paula Buttery, Alice Hutchings

We present a lightweight method for identifying currently trending terms in relation to a known prior of terms, using a weighted log-odds ratio with an informative prior.

Information Retrieval Retrieval

Towards an open-domain chatbot for language practice

1 code implementation NAACL (BEA) 2022 Gladys Tyen, Mark Brenchley, Andrew Caines, Paula Buttery

State-of-the-art chatbots for English are now able to hold conversations on virtually any topic (e.g. Adiwardana et al., 2020; Roller et al., 2021).

Dialogue Generation Re-Ranking

ALEN App: Argumentative Writing Support To Foster English Language Learning

no code implementations NAACL (BEA) 2022 Thiemo Wambsganss, Andrew Caines, Paula Buttery

We present an approach which automatically detects claim-premise structures and provides visual feedback to the learner to prompt them to repair any broken argumentation structures. To investigate whether our persuasive feedback on language learners’ essay writing tasks engages and supports them in learning better English, we designed the ALEN app (Argumentation for Learning English).

The Specificity and Helpfulness of Peer-to-Peer Feedback in Higher Education

no code implementations NAACL (BEA) 2022 Roman Rietsche, Andrew Caines, Cornelius Schramm, Dominik Pfütze, Paula Buttery

This peer-to-peer feedback has become increasingly important, whether in MOOCs to provide feedback to thousands of students or in large-scale classes at universities.

Sentence Specificity +1

Efficient Unsupervised NMT for Related Languages with Cross-Lingual Language Models and Fidelity Objectives

no code implementations EACL (VarDial) 2021 Rami Aly, Andrew Caines, Paula Buttery

The most successful approach to Neural Machine Translation (NMT) when only monolingual training data is available, called unsupervised machine translation, is based on back-translation where noisy translations are generated to turn the task into a supervised one.

Denoising Language Modelling +3

On the application of Large Language Models for language teaching and assessment technology

no code implementations 17 Jul 2023 Andrew Caines, Luca Benedetto, Shiva Taslimipoor, Christopher Davis, Yuan Gao, Oeistein Andersen, Zheng Yuan, Mark Elliott, Russell Moore, Christopher Bryant, Marek Rei, Helen Yannakoudakis, Andrew Mullooly, Diane Nicholls, Paula Buttery

The recent release of very large language models such as PaLM and GPT-4 has made an unprecedented impact in the popular media and public consciousness, giving rise to a mixture of excitement and fear as to their capabilities and potential uses, and shining a light on natural language processing research which had not previously received so much attention.

Grammatical Error Correction Misinformation +1

Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers

no code implementations 14 Mar 2023 Kamil Bujel, Andrew Caines, Helen Yannakoudakis, Marek Rei

Long-sequence transformers are designed to improve the representation of longer texts by language models and their performance on downstream document-level tasks.

Document Classification Language Modelling +3

Probing for targeted syntactic knowledge through grammatical error detection

1 code implementation 28 Oct 2022 Christopher Davis, Christopher Bryant, Andrew Caines, Marek Rei, Paula Buttery

Targeted studies testing knowledge of subject-verb agreement (SVA) indicate that pre-trained language models encode syntactic information.

Grammatical Error Detection

Towards a parallel corpus of Portuguese and the Bantu language Emakhuwa of Mozambique

no code implementations 12 Apr 2021 Felermino D. M. A. Ali, Andrew Caines, Jaimito L. A. Malavi

Major advancement in the performance of machine translation models has been made possible in part thanks to the availability of large-scale parallel corpora.

Machine Translation Sentence +1

Grammatical error detection in transcriptions of spoken English

no code implementations COLING 2020 Andrew Caines, Christian Bentz, Kate Knill, Marek Rei, Paula Buttery

We describe the collection of transcription corrections and grammatical error annotations for the CrowdED Corpus of spoken English monologues on business topics.

Grammatical Error Detection

The Teacher-Student Chatroom Corpus

no code implementations NLP4CALL (COLING) 2020 Andrew Caines, Helen Yannakoudakis, Helena Edmondson, Helen Allen, Pascual Pérez-Paredes, Bill Byrne, Paula Buttery

The Teacher-Student Chatroom Corpus (TSCC) is a collection of written conversations captured during one-to-one lessons between teachers and learners of English.

Descriptive

An Expectation Maximisation Algorithm for Automated Cognate Detection

no code implementations CONLL 2020 Roddy MacSween, Andrew Caines

In historical linguistics, cognate detection is the task of determining whether sets of words have common etymological roots.

REPROLANG 2020: Automatic Proficiency Scoring of Czech, English, German, Italian, and Spanish Learner Essays

no code implementations LREC 2020 Andrew Caines, Paula Buttery

We report on our attempts to reproduce the work described in Vajjala & Rama 2018, 'Experiments with universal CEFR classification', as part of REPROLANG 2020: this involves feature-based and neural approaches to essay scoring in Czech, German and Italian.

Adaptive Forgetting Curves for Spaced Repetition Language Learning

no code implementations 23 Apr 2020 Ahmed Zaidi, Andrew Caines, Russell Moore, Paula Buttery, Andrew Rice

The forgetting curve has been extensively explored by psychologists, educationalists and cognitive scientists alike.

Aggressive language in an online hacking forum

no code implementations WS 2018 Andrew Caines, Sergio Pastrana, Alice Hutchings, Paula Buttery

We probe the heterogeneity in levels of abusive language in different sections of the Internet, using an annotated corpus of Wikipedia page edit comments to train a binary classifier for abuse detection.

Abuse Detection Abusive Language +1

A Text Normalisation System for Non-Standard English Words

no code implementations WS 2017 Emma Flint, Elliot Ford, Olivia Thomas, Andrew Caines, Paula Buttery

This paper investigates the problem of text normalisation; specifically, the normalisation of non-standard words (NSWs) in English.

Automatic Speech Recognition (ASR)

Parsing transcripts of speech

no code implementations WS 2017 Andrew Caines, Michael McCarthy, Paula Buttery

We present an analysis of parser performance on speech data, comparing word type and token frequency distributions with written data, and evaluating parse accuracy by length of input string.

Automated speech-unit delimitation in spoken learner English

no code implementations COLING 2016 Russell Moore, Andrew Caines, Calbert Graham, Paula Buttery

In order to apply computational linguistic analyses and pass information to downstream applications, transcriptions of speech obtained via automatic speech recognition (ASR) need to be divided into smaller meaningful units, in a task we refer to as 'speech-unit (SU) delimitation'.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Crowdsourcing a Multi-lingual Speech Corpus: Recording, Transcription and Annotation of the CrowdIS Corpora

no code implementations LREC 2016 Andrew Caines, Christian Bentz, Calbert Graham, Tim Polzehl, Paula Buttery

We announce the release of the CROWDED CORPUS: a pair of speech corpora collected via crowdsourcing, containing a native speaker corpus of English (CROWDED_ENGLISH), and a corpus of German/English bilinguals (CROWDED_BILINGUAL).

Sentence valid

Reclassifying subcategorization frames for experimental analysis and stimulus generation

no code implementations LREC 2012 Paula Buttery, Andrew Caines

The premise was not only to compare the results of two quite different methods for our own interest, but also to enable other researchers to choose whichever reclassification better suited their purpose (one being grounded purely in theoretical linguistics and the other in practical language engineering).

Language Acquisition

Annotating progressive aspect constructions in the spoken section of the British National Corpus

no code implementations LREC 2012 Andrew Caines, Paula Buttery

We present a set of stand-off annotations for the ninety thousand sentences in the spoken section of the British National Corpus (BNC) which feature a progressive aspect verb group.

Sentence
