Search Results for author: Paul Cook

Found 52 papers, 2 papers with code

Leveraging distributed representations and lexico-syntactic fixedness for token-level prediction of the idiomaticity of English verb-noun combinations

no code implementations ACL 2018 Milton King, Paul Cook

In this paper we propose and evaluate models for classifying VNC usages as idiomatic or literal, based on a variety of approaches to forming distributed representations.

Machine Translation Sentence Embeddings +1

UNBNLP at SemEval-2018 Task 10: Evaluating unsupervised approaches to capturing discriminative attributes

no code implementations SEMEVAL 2018 Milton King, Ali Hakimi Parizi, Paul Cook

In this paper we present three unsupervised models for capturing discriminative attributes based on information from word embeddings, WordNet, and sentence-level word co-occurrence frequency.

Semantic Textual Similarity Sentence +1
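
As a rough illustration of how an embedding-based unsupervised model for this task could work, the sketch below scores an attribute as discriminative for one word over another by comparing cosine similarities; the toy vectors and the threshold are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of an embedding-based discriminative-attribute heuristic.
# The vectors below are toy values; a real system would load pretrained word embeddings.
import numpy as np

toy_embeddings = {
    "banana": np.array([0.9, 0.1, 0.0]),
    "apple":  np.array([0.4, 0.6, 0.1]),
    "yellow": np.array([0.8, 0.2, 0.1]),
}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_discriminative(word1, word2, attribute, threshold=0.1):
    """Label the attribute as discriminative for word1 (vs. word2) if it is
    sufficiently more similar to word1 than to word2 in embedding space."""
    diff = (cosine(toy_embeddings[attribute], toy_embeddings[word1])
            - cosine(toy_embeddings[attribute], toy_embeddings[word2]))
    return diff > threshold

print(is_discriminative("banana", "apple", "yellow"))  # True with these toy vectors
```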

Deep Learning Models For Multiword Expression Identification

no code implementations SEMEVAL 2017 Waseem Gharbieh, Virendrakumar Bhavsar, Paul Cook

Multiword expressions (MWEs) are lexical items that can be decomposed into multiple component words, but have properties that are unpredictable with respect to their component words.

Information Retrieval Machine Translation +4

Do Character-Level Neural Network Language Models Capture Knowledge of Multiword Expression Compositionality?

no code implementations COLING 2018 Ali Hakimi Parizi, Paul Cook

In this paper, we propose the first model for multiword expression (MWE) compositionality prediction based on character-level neural network language models.

Machine Translation

Supervised and unsupervised approaches to measuring usage similarity

no code implementations WS 2017 Milton King, Paul Cook

Usage similarity (USim) is an approach to determining word meaning in context that does not rely on a sense inventory.

LEMMA Word Sense Induction

Determining the Multiword Expression Inventory of a Surprise Language

no code implementations COLING 2016 Bahar Salehi, Paul Cook, Timothy Baldwin

Much previous research on multiword expressions (MWEs) has focused on the token- and type-level tasks of MWE identification and extraction, respectively.

Machine Translation

UNBNLP at SemEval-2019 Task 5 and 6: Using Language Models to Detect Hate Speech and Offensive Language

no code implementations SEMEVAL 2019 Ali Hakimi Parizi, Milton King, Paul Cook

In this paper we apply a range of approaches to language modeling, including word-level n-gram and neural language models and character-level neural language models, to the problem of detecting hate speech and offensive language.

General Classification Language Modelling +2

Evaluating a Topic Modelling Approach to Measuring Corpus Similarity

no code implementations LREC 2016 Richard Fothergill, Paul Cook, Timothy Baldwin

Web corpora are often constructed automatically, and their contents are therefore often not well understood.

Classifying Out-of-vocabulary Terms in a Domain-Specific Social Media Corpus

no code implementations LREC 2016 SoHyun Park, Afsaneh Fazly, Annie Lee, Brandon Seibel, Wenjie Zi, Paul Cook

We then propose a supervised approach to classify out-of-vocabulary terms according to these categories, drawing on features based on word embeddings, and linguistic knowledge of common properties of out-of-vocabulary terms.

General Classification Word Embeddings

Evaluating Approaches to Personalizing Language Models

no code implementations LREC 2020 Milton King, Paul Cook

In this work, we consider the problem of personalizing language models, that is, building language models that are tailored to the writing style of an individual.

Language Modelling

Evaluating Sub-word Embeddings in Cross-lingual Models

no code implementations LREC 2020 Ali Hakimi Parizi, Paul Cook

This is particularly problematic for low-resource and morphologically-rich languages, which often have relatively high OOV rates.

Bilingual Lexicon Induction Cross-Lingual Word Embeddings +1

Evaluating a Multi-sense Definition Generation Model for Multiple Languages

no code implementations 12 Jun 2020 Arman Kabiri, Paul Cook

Most prior work on definition modeling has not accounted for polysemy, or has done so by considering definition modeling for a target word in a given context.

Word Embeddings

Joint Training for Learning Cross-lingual Embeddings with Sub-word Information without Parallel Corpora

no code implementations Joint Conference on Lexical and Computational Semantics 2020 Ali Hakimi Parizi, Paul Cook

In this paper, we propose a novel method for learning cross-lingual word embeddings, that incorporates sub-word information during training, and is able to learn high-quality embeddings from modest amounts of monolingual data and a bilingual lexicon.

Bilingual Lexicon Induction Classification +4

UNBNLP at SemEval-2021 Task 1: Predicting lexical complexity with masked language models and character-level encoders

no code implementations SEMEVAL 2021 Milton King, Ali Hakimi Parizi, Samin Fakharian, Paul Cook

In this paper, we present three supervised systems for English lexical complexity prediction of single and multiword expressions for SemEval-2021 Task 1.

Lexical Complexity Prediction

Contextualized Embeddings Encode Monolingual and Cross-lingual Knowledge of Idiomaticity

1 code implementation ACL (MWE) 2021 Samin Fakharian, Paul Cook

We consider monolingual experiments for English and Russian, and show that the proposed model outperforms previous approaches, including when the model is tested on instances of potentially idiomatic expression (PIE) types that were not observed during training.

Now, It’s Personal: The Need for Personalized Word Sense Disambiguation

no code implementations RANLP 2021 Milton King, Paul Cook

We propose a novel WSD dataset and show that personalizing a WSD system with knowledge of an author’s sense distributions or predominant senses can greatly increase its performance.

LEMMA Word Sense Disambiguation

Leveraging a Bilingual Dictionary to Learn Wolastoqey Word Representations

no code implementations LREC 2022 Diego Bear, Paul Cook

As there exist no large corpora of running text for Wolastoqey, in this paper, we leverage a bilingual dictionary to learn Wolastoqey word embeddings by encoding their corresponding English definitions into vector representations using pretrained English word and sequence representation models.

Information Retrieval Learning Word Embeddings +4
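
The definition-encoding idea described in the entry above can be sketched roughly as follows; the dictionary entries, the choice of sentence encoder, and the model name are illustrative assumptions rather than details from the paper.

```python
# Rough sketch: derive word vectors for dictionary headwords by encoding their
# English definitions with a pretrained sentence encoder.
# Requires the sentence-transformers package; the model name is an assumption.
from sentence_transformers import SentenceTransformer

# Toy bilingual-dictionary entries: headword -> English definition (placeholders).
dictionary = {
    "headword_1": "a small freshwater fish",
    "headword_2": "to paddle a canoe upstream",
}

encoder = SentenceTransformer("all-MiniLM-L6-v2")
definitions = list(dictionary.values())
vectors = encoder.encode(definitions)  # one vector per definition

# Map each headword to the embedding of its English definition.
word_vectors = dict(zip(dictionary.keys(), vectors))
print({word: vec.shape for word, vec in word_vectors.items()})
```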

Evaluating Unsupervised Approaches to Morphological Segmentation for Wolastoqey

no code implementations SIGUL (LREC) 2022 Diego Bear, Paul Cook

Finite-state approaches to morphological analysis have been shown to improve the performance of natural language processing systems for polysynthetic languages, in which words are generally composed of many morphemes, for tasks such as language modelling (Schwartz et al., 2020).

Language Modelling Morphological Analysis
