Search Results for author: John P. McCrae

Found 40 papers, 9 papers with code

Bilingual Lexicon Induction across Orthographically-distinct Under-Resourced Dravidian Languages

no code implementations VarDial (COLING) 2020 Bharathi Raja Chakravarthi, Navaneethan Rajasekaran, Mihael Arcan, Kevin McGuinness, Noel E. O’Connor, John P. McCrae

Bilingual lexicons are a vital tool for under-resourced languages and recent state-of-the-art approaches to this leverage pretrained monolingual word embeddings using supervised or semi-supervised approaches.

Bilingual Lexicon Induction Word Embeddings

CogALex-VI Shared Task: Bidirectional Transformer based Identification of Semantic Relations

no code implementations COLING (CogALex) 2020 Saurav Karmakar, John P. McCrae

This paper presents a bidirectional transformer based approach for recognising semantic relationships between a pair of words as proposed by the CogALex VI shared task in 2020.

Cross-lingual Sentence Embedding using Multi-Task Learning

no code implementations EMNLP 2021 Koustava Goswami, Sourav Dutta, Haytham Assem, Theodorus Fransen, John P. McCrae

We demonstrate the efficacy of an unsupervised as well as a weakly supervised variant of our framework on STS, BUCC and Tatoeba benchmark tasks.

Multi-Task Learning Semantic Similarity +4

Towards a Crowd-Sourced WordNet for Colloquial English

no code implementations GWC 2018 John P. McCrae, Ian Wood, Amanda Hicks

Princeton WordNet is one of the most widely-used resources for natural language processing, but is updated only infrequently and cannot keep up with the fast-changing usage of the English language on social media platforms such as Twitter.

Improving Wordnets for Under-Resourced Languages Using Machine Translation

no code implementations GWC 2018 Bharathi Raja Chakravarthi, Mihael Arcan, John P. McCrae

In addition to that, we carried out a manual evaluation of the translations for the Tamil language, where we demonstrate that our approach can aid in improving wordnet resources for under-resourced Dravidian languages.

Machine Translation Translation

Mapping WordNet Instances to Wikipedia

no code implementations GWC 2018 John P. McCrae

Lexical resources differ from encyclopaedic resources and represent two distinct types of resource covering general language and named entities respectively.

English WordNet 2019 – An Open-Source WordNet for English

1 code implementation GWC 2019 John P. McCrae, Alexandre Rademaker, Francis Bond, Ewa Rudnicka, Christiane Fellbaum

We describe the release of a new wordnet for English based on the Princeton WordNet, but now developed under an open-source model.

ULD-NUIG at Social Media Mining for Health Applications (#SMM4H) Shared Task 2021

no code implementations NAACL (SMM4H) 2021 Atul Kr. Ojha, Priya Rani, Koustava Goswami, Bharathi Raja Chakravarthi, John P. McCrae

Social media platforms such as Twitter and Facebook have been utilised for various research studies, from the cohort-level discussion to community-driven approaches to address the challenges in utilizing social media data for health, clinical and biomedical information.

Named Entity Recognition

Towards a Linking between WordNet and Wikidata

no code implementations EACL (GWC) 2021 John P. McCrae, David Cillessen

WordNet is the most widely used lexical resource for English, while Wikidata is one of the largest knowledge graphs of entities and concepts available.

Knowledge Graphs

The GlobalWordNet Formats: Updates for 2020

1 code implementation EACL (GWC) 2021 John P. McCrae, Michael Wayne Goodman, Francis Bond, Alexandre Rademaker, Ewa Rudnicka, Luis Morgado Da Costa

The Global Wordnet Formats have been introduced to enable wordnets to have a common representation that can be integrated through the Global WordNet Grid.

Unsupervised Deep Language and Dialect Identification for Short Texts

no code implementations COLING 2020 Koustava Goswami, Rajdeep Sarkar, Bharathi Raja Chakravarthi, Theodorus Fransen, John P. McCrae

Automatic Language Identification (LI) or Dialect Identification (DI) of short texts of closely related languages or dialects is one of the primary steps in many natural language processing pipelines.

Dialect Identification Sentence Embeddings

A Survey of Orthographic Information in Machine Translation

no code implementations 4 Aug 2020 Bharathi Raja Chakravarthi, Priya Rani, Mihael Arcan, John P. McCrae

It introduces under-resourced languages in terms of machine translation and how orthographic information can be utilised to improve machine translation.

Bilingual Lexicon Induction Translation

Classification Benchmarks for Under-resourced Bengali Language based on Multichannel Convolutional-LSTM Network

1 code implementation 11 Apr 2020 Md. Rezaul Karim, Bharathi Raja Chakravarthi, John P. McCrae, Michael Cochez

Evaluations against several baseline embedding models, e.g., Word2Vec and GloVe yield up to 92.30%, 82.25%, and 90.45% F1-scores in case of document classification, sentiment analysis, and hate speech detection, respectively during 5-fold cross-validation tests.

Classification Document Classification +4

Temporal Analysis of Entity Relatedness and its Evolution using Wikipedia and DBpedia

no code implementations 12 Dec 2018 Narumol Prangnawarat, John P. McCrae, Conor Hayes

We then show that integrating multiple time frames in our methods can give a better overall similarity demonstrating that temporal evolution can have an important effect on entity relatedness.

Semantic Similarity Semantic Textual Similarity

Constructing an Annotated Corpus of Verbal MWEs for English

no code implementations COLING 2018 Abigail Walsh, Claire Bonial, Kristina Geeraert, John P. McCrae, Nathan Schneider, Clarissa Somers

This paper describes the construction and annotation of a corpus of verbal MWEs for English, as part of the PARSEME Shared Task 1.1 on automatic identification of verbal MWEs.

Word Alignment
