Search Results for author: Edward Gow-Smith

Found 8 papers, 4 papers with code

Word Boundary Information Isn't Useful for Encoder Language Models

no code implementations • 15 Jan 2024 • Edward Gow-Smith, Dylan Phelps, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio

As such, removing these symbols has been shown to have a beneficial effect on the processing of morphologically complex words for transformer encoders in the pretrain-finetune paradigm.

NER, Sentence

Sheffield's Submission to the AmericasNLP Shared Task on Machine Translation into Indigenous Languages

1 code implementation • 16 Jun 2023 • Edward Gow-Smith, Danae Sánchez Villegas

In this paper we describe the University of Sheffield's submission to the AmericasNLP 2023 Shared Task on Machine Translation into Indigenous Languages which comprises the translation from Spanish to eleven indigenous languages.

Machine Translation, Translation

NAVER LABS Europe's Multilingual Speech Translation Systems for the IWSLT 2023 Low-Resource Track

no code implementations • 13 Jun 2023 • Edward Gow-Smith, Alexandre Berard, Marcely Zanon Boito, Ioan Calapodescu

This paper presents NAVER LABS Europe's systems for Tamasheq-French and Quechua-Spanish speech translation in the IWSLT 2023 Low-Resource track.

Translation

Sample Efficient Approaches for Idiomaticity Detection

no code implementations • LREC (MWE) 2022 • Dylan Phelps, Xuan-Rui Fan, Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio

In particular we study the impact of Pattern Exploit Training (PET), a few-shot method of classification, and BERTRAM, an efficient method of creating contextual embeddings, on the task of idiomaticity detection.

SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding

1 code implementation • SemEval (NAACL) 2022 • Harish Tayyar Madabushi, Edward Gow-Smith, Marcos Garcia, Carolina Scarton, Marco Idiart, Aline Villavicencio

This paper presents the shared task on Multilingual Idiomaticity Detection and Sentence Embedding, which consists of two subtasks: (a) a binary classification task aimed at identifying whether a sentence contains an idiomatic expression, and (b) a task based on semantic text similarity which requires the model to adequately represent potentially idiomatic expressions in context.

Binary Classification, Sentence

Improving Tokenisation by Alternative Treatment of Spaces

1 code implementation • 8 Apr 2022 • Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio

We find that our modified algorithms lead to improved performance on downstream NLP tasks that involve handling complex words, whilst having no detrimental effect on performance in general natural language understanding tasks.

Natural Language Understanding

AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models

1 code implementation • Findings (EMNLP) 2021 • Harish Tayyar Madabushi, Edward Gow-Smith, Carolina Scarton, Aline Villavicencio

Despite their success in a variety of NLP tasks, pre-trained language models, due to their heavy reliance on compositionality, fail in effectively capturing the meanings of multiword expressions (MWEs), especially idioms.

Language Modelling
