Interpreting patient case descriptions has emerged as a challenging problem for biomedical NLP, where the aim is typically to predict diagnoses, recommend treatments, or answer questions about cases more generally.
This paper describes the experiments run for SemEval-2022 Task 2, subtask A, covering the zero-shot and one-shot settings for idiomaticity detection.
This paper presents an overview of Task 4 at SemEval-2022, which was focused on detecting Patronizing and Condescending Language (PCL) towards vulnerable communities.
Despite its relevance, the maturity of NLP for social media pales in comparison with general-purpose models, metrics and benchmarks.
We introduce RAGAs (Retrieval Augmented Generation Assessment), a framework for reference-free evaluation of Retrieval Augmented Generation (RAG) pipelines.
A fundamental challenge in the current NLP context, dominated by language models, comes from the inflexibility of current architectures to 'learn' new information.
In this paper we present TweetNLP, an integrated platform for Natural Language Processing (NLP) in social media.
Recognizing and categorizing lexical collocations in context is useful for language learning, dictionary compilation and downstream NLP.
In particular, the analysis is centered on Twitter and disinformation for three European languages: English, French and Spanish.
Pre-trained language models such as ClinicalBERT have achieved impressive results on tasks such as medical Natural Language Inference.
While the success of pre-trained language models has largely eliminated the need for high-quality static word vectors in many NLP applications, such vectors continue to play an important role in tasks where words need to be modelled in the absence of linguistic context.
In this paper, we introduce a new annotated dataset which is aimed at supporting the development of NLP models to identify and categorize language that is patronizing or condescending towards vulnerable communities (e.g., refugees, homeless people, poor families).
We present the results of the CAPITEL-EVAL shared task, held in the context of the IberLEF 2020 competition series.
The experimental landscape in natural language processing for social media is too fragmented.
Unfortunately, meaningful regions can be difficult to estimate, especially since we often have few examples of individuals that belong to a given category.
While monolingual word embeddings encode information about words in the context of a particular language, cross-lingual embeddings define a multilingual space where word embeddings from two or more languages are integrated together.
Cross-lingual word embeddings are vector representations of words in different languages where words with similar meaning are represented by similar vectors, regardless of the language.
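The property described above can be sketched with a toy example (the vectors and words below are hypothetical, purely for illustration): in a shared cross-lingual space, a translation pair such as English "dog" and Spanish "perro" should be closer to each other than to unrelated words.

```python
import math

# Hypothetical 3-d embeddings in a shared cross-lingual space.
emb = {
    "dog":   [0.90, 0.10, 0.30],
    "perro": [0.85, 0.15, 0.35],  # Spanish translation of "dog"
    "table": [0.10, 0.80, 0.20],
}

def cosine(a, b):
    """Cosine similarity: the standard closeness measure for embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Translation pairs end up with similar vectors regardless of language:
print(cosine(emb["dog"], emb["perro"]) > cosine(emb["dog"], emb["table"]))  # True
```

In practice such a space is obtained by mapping independently trained monolingual embeddings into a common space, or by training jointly on multilingual data; similarity queries then work across languages exactly as in the monolingual case.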
While word embeddings have been shown to implicitly encode various forms of attributional knowledge, the extent to which they capture relational information is far more limited.
This paper summarizes our contribution to the Hyperpartisan News Detection task in SemEval 2019.
Cross-lingual embeddings represent the meaning of words from different languages in the same vector space.
Human language has evolved towards newer forms of communication such as social media, where emojis (i.e., ideograms bearing a visual meaning) play a key role.
Cross-lingual word embeddings are becoming increasingly important in multilingual NLP.
For example, by examining clusters of relation vectors, we observe that relational similarities can be identified at a more abstract level than with traditional word vector differences.
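The traditional word-vector-difference view mentioned above can be sketched as follows (the embeddings and word pairs are hypothetical, chosen only to illustrate the idea): the relation between two words is modeled as the difference of their vectors, so analogous pairs should yield similar difference vectors.

```python
import math

# Hypothetical 4-d embeddings, purely illustrative.
emb = {
    "paris":  [0.8, 0.2, 0.6, 0.1],
    "france": [0.7, 0.2, 0.1, 0.1],
    "rome":   [0.6, 0.4, 0.7, 0.2],
    "italy":  [0.5, 0.4, 0.2, 0.2],
    "dog":    [0.1, 0.9, 0.3, 0.8],
}

def diff(w1, w2):
    """Relation vector as the difference of two word vectors."""
    return [a - b for a, b in zip(emb[w1], emb[w2])]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

capital_of_1 = diff("paris", "france")  # capital-of relation, instance 1
capital_of_2 = diff("rome", "italy")    # capital-of relation, instance 2
unrelated    = diff("dog", "france")    # no shared relation

# Analogous pairs produce more similar difference vectors than unrelated ones:
print(cosine(capital_of_1, capital_of_2) > cosine(capital_of_1, unrelated))  # True
```

Clustering such relation vectors, as the paper suggests, can then surface relational similarities at a more abstract level than comparing individual pair differences directly.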
Today, a massive amount of musical knowledge is stored in written form, with testimonies dated as far back as several centuries ago.
Incorporating linguistic, world and common sense knowledge into AI/NLP systems is currently an important research area, with several open problems and challenges.
This paper describes the SemEval 2018 Shared Task on Hypernym Discovery.
This paper describes the results of the first Shared Task on Multilingual Emoji Prediction, organized as part of SemEval 2018.
Automatically identifying definitional knowledge in text corpora (Definition Extraction or DE) is an important task with direct applications in, among others, Automatic Glossary Generation, Taxonomy Learning, Question Answering and Semantic Search.
Videogame streaming platforms have become a paramount example of noisy user-generated text.
WordNet is probably the best known lexical resource in Natural Language Processing.