Search Results for author: Andrew Caines

Found 28 papers, 3 papers with code

Detecting Trending Terms in Cybersecurity Forum Discussions

no code implementations • EMNLP (WNUT) 2020 • Jack Hughes, Seth Aycock, Andrew Caines, Paula Buttery, Alice Hutchings

We present a lightweight method for identifying currently trending terms in relation to a known prior of terms, using a weighted log-odds ratio with an informative prior.
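
As an illustration of the technique named in this snippet, the following is a minimal sketch of one common formulation of a weighted log-odds ratio with an informative prior: a z-scored difference in log-odds, smoothed by Dirichlet pseudo-counts. It is an assumption that this resembles the paper's method. The function name `weighted_log_odds`, the `prior_scale` parameter, the 0.01 smoothing floor, and the toy counts are all hypothetical.

```python
import math
from collections import Counter


def weighted_log_odds(current, background, prior, prior_scale=1.0):
    """Return a z-scored log-odds ratio per term (current vs. background).

    Assumes the vocabulary holds at least two terms, so that every
    denominator below stays strictly positive.
    """
    vocab = set(current) | set(background) | set(prior)
    # Pseudo-counts from the prior, plus a small floor so no term has zero mass.
    alpha = {w: prior_scale * prior.get(w, 0) + 0.01 for w in vocab}
    alpha_0 = sum(alpha.values())
    n_cur = sum(current.values())
    n_bg = sum(background.values())

    scores = {}
    for w in vocab:
        a = current.get(w, 0) + alpha[w]      # smoothed count, current window
        b = background.get(w, 0) + alpha[w]   # smoothed count, background
        log_odds_cur = math.log(a / (n_cur + alpha_0 - a))
        log_odds_bg = math.log(b / (n_bg + alpha_0 - b))
        variance = 1.0 / a + 1.0 / b          # approximate variance of the difference
        scores[w] = (log_odds_cur - log_odds_bg) / math.sqrt(variance)
    return scores


# Toy counts (hypothetical): a recent window versus an earlier background period.
current = Counter({"ransomware": 30, "exploit": 10, "password": 10})
background = Counter({"ransomware": 5, "exploit": 20, "password": 25})
prior = current + background  # pooled counts serve as the informative prior

scores = weighted_log_odds(current, background, prior)
top_term = max(scores, key=scores.get)
```

Terms with a large positive score are over-represented in the current window relative to the background. Dividing by the estimated standard deviation damps the scores of rare terms, whose raw log-odds are otherwise unstable.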

Information Retrieval Retrieval

Towards an open-domain chatbot for language practice

1 code implementation • NAACL (BEA) 2022 • Gladys Tyen, Mark Brenchley, Andrew Caines, Paula Buttery

State-of-the-art chatbots for English are now able to hold conversations on virtually any topic (e.g. Adiwardana et al., 2020; Roller et al., 2021).

Dialogue Generation Re-Ranking

ALEN App: Argumentative Writing Support To Foster English Language Learning

no code implementations • NAACL (BEA) 2022 • Thiemo Wambsganss, Andrew Caines, Paula Buttery

We present an approach which automatically detects claim-premise structures and provides visual feedback to the learner to prompt them to repair any broken argumentation structures. To investigate whether our persuasive feedback on language learners’ essay writing tasks engages and supports them in learning better English, we designed the ALEN app (Argumentation for Learning English).

The Specificity and Helpfulness of Peer-to-Peer Feedback in Higher Education

no code implementations • NAACL (BEA) 2022 • Roman Rietsche, Andrew Caines, Cornelius Schramm, Dominik Pfütze, Paula Buttery

This peer-to-peer feedback has become increasingly important whether in MOOCs to provide feedback to thousands of students or in large-scale classes at universities.

Sentence Specificity +1

Efficient Unsupervised NMT for Related Languages with Cross-Lingual Language Models and Fidelity Objectives

no code implementations • EACL (VarDial) 2021 • Rami Aly, Andrew Caines, Paula Buttery

The most successful approach to Neural Machine Translation (NMT) when only monolingual training data is available, called unsupervised machine translation, is based on back-translation where noisy translations are generated to turn the task into a supervised one.

Denoising Language Modelling +3

Prompting open-source and commercial language models for grammatical error correction of English learner text

no code implementations • 15 Jan 2024 • Christopher Davis, Andrew Caines, Øistein Andersen, Shiva Taslimipoor, Helen Yannakoudakis, Zheng Yuan, Christopher Bryant, Marek Rei, Paula Buttery

Thanks to recent advances in generative AI, we are able to prompt large language models (LLMs) to produce texts which are fluent and grammatical.

Grammatical Error Correction

CLIMB: Curriculum Learning for Infant-inspired Model Building

no code implementations • 15 Nov 2023 • Richard Diehl Martinez, Zebulon Goriely, Hope McGovern, Christopher Davis, Andrew Caines, Paula Buttery, Lisa Beinborn

We describe our team's contribution to the STRICT-SMALL track of the BabyLM Challenge.

Language Modelling Masked Language Modeling

On the application of Large Language Models for language teaching and assessment technology

no code implementations • 17 Jul 2023 • Andrew Caines, Luca Benedetto, Shiva Taslimipoor, Christopher Davis, Yuan Gao, Oeistein Andersen, Zheng Yuan, Mark Elliott, Russell Moore, Christopher Bryant, Marek Rei, Helen Yannakoudakis, Andrew Mullooly, Diane Nicholls, Paula Buttery

The recent release of very large language models such as PaLM and GPT-4 has made an unprecedented impact in the popular media and public consciousness, giving rise to a mixture of excitement and fear as to their capabilities and potential uses, and shining a light on natural language processing research which had not previously received so much attention.

Grammatical Error Correction Misinformation +1

Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers

no code implementations • 14 Mar 2023 • Kamil Bujel, Andrew Caines, Helen Yannakoudakis, Marek Rei

Long-sequence transformers are designed to improve the representation of longer texts by language models and their performance on downstream document-level tasks.

Document Classification Language Modelling +3

Probing for targeted syntactic knowledge through grammatical error detection

1 code implementation • 28 Oct 2022 • Christopher Davis, Christopher Bryant, Andrew Caines, Marek Rei, Paula Buttery

Targeted studies testing knowledge of subject-verb agreement (SVA) indicate that pre-trained language models encode syntactic information.

Grammatical Error Detection

Towards a parallel corpus of Portuguese and the Bantu language Emakhuwa of Mozambique

no code implementations • 12 Apr 2021 • Felermino D. M. A. Ali, Andrew Caines, Jaimito L. A. Malavi

Major advancement in the performance of machine translation models has been made possible in part thanks to the availability of large-scale parallel corpora.

Machine Translation Sentence +1

Grammatical error detection in transcriptions of spoken English

no code implementations • COLING 2020 • Andrew Caines, Christian Bentz, Kate Knill, Marek Rei, Paula Buttery

We describe the collection of transcription corrections and grammatical error annotations for the CrowdED Corpus of spoken English monologues on business topics.

Grammatical Error Detection

The Teacher-Student Chatroom Corpus

no code implementations • NLP4CALL (COLING) 2020 • Andrew Caines, Helen Yannakoudakis, Helena Edmondson, Helen Allen, Pascual Pérez-Paredes, Bill Byrne, Paula Buttery

The Teacher-Student Chatroom Corpus (TSCC) is a collection of written conversations captured during one-to-one lessons between teachers and learners of English.

Descriptive

An Expectation Maximisation Algorithm for Automated Cognate Detection

no code implementations • CONLL 2020 • Roddy MacSween, Andrew Caines

In historical linguistics, cognate detection is the task of determining whether sets of words have common etymological roots.

Investigating the effect of auxiliary objectives for the automated grading of learner English speech transcriptions

no code implementations • ACL 2020 • Hannah Craighead, Andrew Caines, Paula Buttery, Helen Yannakoudakis

We address the task of automatically grading the language proficiency of spontaneous speech based on textual features from automatic speech recognition transcripts.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

REPROLANG 2020: Automatic Proficiency Scoring of Czech, English, German, Italian, and Spanish Learner Essays

no code implementations • LREC 2020 • Andrew Caines, Paula Buttery

We report on our attempts to reproduce the work described in Vajjala & Rama 2018, 'Experiments with universal CEFR classification', as part of REPROLANG 2020: this involves feature-based and neural approaches to essay scoring in Czech, German and Italian.

Adaptive Forgetting Curves for Spaced Repetition Language Learning

no code implementations • 23 Apr 2020 • Ahmed Zaidi, Andrew Caines, Russell Moore, Paula Buttery, Andrew Rice

The forgetting curve has been extensively explored by psychologists, educationalists and cognitive scientists alike.

CAMsterdam at SemEval-2019 Task 6: Neural and graph-based feature extraction for the identification of offensive tweets

no code implementations • SEMEVAL 2019 • Guy Aglionby, Chris Davis, Pushkar Mishra, Andrew Caines, Helen Yannakoudakis, Marek Rei, Ekaterina Shutova, Paula Buttery

We describe the CAMsterdam team entry to the SemEval-2019 Shared Task 6 on offensive language identification in Twitter data.

General Classification Language Identification +2

Aggressive language in an online hacking forum

no code implementations • WS 2018 • Andrew Caines, Sergio Pastrana, Alice Hutchings, Paula Buttery

We probe the heterogeneity in levels of abusive language in different sections of the Internet, using an annotated corpus of Wikipedia page edit comments to train a binary classifier for abuse detection.

Abuse Detection Abusive Language +1

A Text Normalisation System for Non-Standard English Words

no code implementations • WS 2017 • Emma Flint, Elliot Ford, Olivia Thomas, Andrew Caines, Paula Buttery

This paper investigates the problem of text normalisation; specifically, the normalisation of non-standard words (NSWs) in English.

Automatic Speech Recognition (ASR)

Parsing transcripts of speech

no code implementations • WS 2017 • Andrew Caines, Michael McCarthy, Paula Buttery

We present an analysis of parser performance on speech data, comparing word type and token frequency distributions with written data, and evaluating parse accuracy by length of input string.

Collecting fluency corrections for spoken learner English

no code implementations • WS 2017 • Andrew Caines, Emma Flint, Paula Buttery

We present a crowdsourced collection of error annotations for transcriptions of spoken learner English.

Grammatical Error Detection Machine Translation

Automated speech-unit delimitation in spoken learner English

no code implementations • COLING 2016 • Russell Moore, Andrew Caines, Calbert Graham, Paula Buttery

In order to apply computational linguistic analyses and pass information to downstream applications, transcriptions of speech obtained via automatic speech recognition (ASR) need to be divided into smaller meaningful units, in a task we refer to as 'speech-unit (SU) delimitation'.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Crowdsourcing a Multi-lingual Speech Corpus: Recording, Transcription and Annotation of the CrowdIS Corpora

no code implementations • LREC 2016 • Andrew Caines, Christian Bentz, Calbert Graham, Tim Polzehl, Paula Buttery

We announce the release of the CROWDED CORPUS: a pair of speech corpora collected via crowdsourcing, containing a native speaker corpus of English (CROWDED_ENGLISH), and a corpus of German/English bilinguals (CROWDED_BILINGUAL).

Sentence valid

Predicting Author Age from Weibo Microblog Posts

1 code implementation • LREC 2016 • Wanru Zhang, Andrew Caines, Dimitrios Alikaniotis, Paula Buttery

The effect of disfluencies and learner errors on the parsing of spoken learner language

no code implementations • WS 2014 • Andrew Caines, Paula Buttery

Language Acquisition

Reclassifying subcategorization frames for experimental analysis and stimulus generation

no code implementations • LREC 2012 • Paula Buttery, Andrew Caines

The premise was not only to compare the results of two quite different methods for our own interest, but also to enable other researchers to choose whichever reclassification better suited their purpose (one being grounded purely in theoretical linguistics and the other in practical language engineering).

Language Acquisition

Annotating progressive aspect constructions in the spoken section of the British National Corpus

no code implementations • LREC 2012 • Andrew Caines, Paula Buttery

We present a set of stand-off annotations for the ninety thousand sentences in the spoken section of the British National Corpus (BNC) which feature a progressive aspect verb group.

Sentence
