no code implementations • EMNLP (MRL) 2021 • Kiamehr Rezaee, Daniel Loureiro, Jose Camacho-Collados, Mohammad Taher Pilehvar
In this paper we analyze the extent to which contextualized sense embeddings, i.e., sense embeddings that are computed based on contextualized word embeddings, are transferable across languages. To this end, we compiled a unified cross-lingual benchmark for Word Sense Disambiguation.
no code implementations • COLING (CogALex) 2020 • Mireia Roig Mirapeix, Luis Espinosa Anke, Jose Camacho-Collados
Textual definitions constitute a fundamental source of knowledge when seeking the meaning of words, and they are the cornerstone of lexical resources like glossaries, dictionaries, encyclopedias or thesauri.
no code implementations • SemEval (NAACL) 2022 • Joanne Boisson, Jose Camacho-Collados, Luis Espinosa-Anke
This paper describes the experiments run for SemEval-2022 Task 2, subtask A, zero-shot and one-shot settings for idiomaticity detection.
1 code implementation • EMNLP 2021 • Asahi Ushio, Jose Camacho-Collados, Steven Schockaert
Among others, this makes it possible to distill high-quality word vectors from pre-trained language models.
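The distillation idea mentioned above — a static word vector obtained by averaging a word's contextualized vectors over many sentences — can be sketched as follows. Note that `encode` here is a hypothetical stand-in for a real contextual encoder (e.g. a BERT-style model); only the averaging logic is meaningful:

```python
import zlib
import numpy as np

def encode(tokens):
    """Hypothetical contextual encoder: one 8-d vector per token.
    A real implementation would call a pre-trained masked LM."""
    rng = np.random.default_rng(zlib.crc32(" ".join(tokens).encode()))
    return rng.normal(size=(len(tokens), 8))

def distill_word_vector(word, corpus):
    """Static vector for `word` = mean of its contextualized vectors."""
    vecs = []
    for sentence in corpus:
        tokens = sentence.lower().split()
        emb = encode(tokens)
        vecs.extend(emb[i] for i, t in enumerate(tokens) if t == word)
    return np.mean(np.stack(vecs), axis=0)

corpus = ["the bank approved the loan", "she sat on the river bank"]
vector = distill_word_vector("bank", corpus)
```

The key design point is that the contextual model sees each occurrence in its sentence, so the averaged vector reflects usage rather than a single dictionary sense.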
no code implementations • 26 Mar 2024 • Aleksandra Edwards, Jose Camacho-Collados
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
no code implementations • 1 Nov 2023 • Joanne Boisson, Luis Espinosa-Anke, Jose Camacho-Collados
Metaphor identification aims at understanding whether a given expression is used figuratively in context.
no code implementations • 23 Oct 2023 • Dimosthenis Antypas, Asahi Ushio, Francesco Barbieri, Leonardo Neves, Kiamehr Rezaee, Luis Espinosa-Anke, Jiaxin Pei, Jose Camacho-Collados
Despite its relevance, the maturity of NLP for social media pales in comparison with general-purpose models, metrics and benchmarks.
no code implementations • 19 Oct 2023 • Yi Zhou, Jose Camacho-Collados, Danushka Bollegala
Various types of social biases have been reported with pretrained Masked Language Models (MLMs) in prior work.
1 code implementation • 30 Sep 2023 • Asahi Ushio, Jose Camacho-Collados, Steven Schockaert
In particular, we show that masked language models such as RoBERTa can be straightforwardly fine-tuned for this purpose, using only a small amount of training data.
1 code implementation • 31 Aug 2023 • Nayeon Lee, Chani Jung, Junho Myung, Jiho Jin, Jose Camacho-Collados, Juho Kim, Alice Oh
To address this, we introduce CREHate, a CRoss-cultural English Hate speech dataset.
no code implementations • 4 Aug 2023 • Daniel Loureiro, Kiamehr Rezaee, Talayeh Riahi, Francesco Barbieri, Leonardo Neves, Luis Espinosa Anke, Jose Camacho-Collados
This paper introduces a large collection of time series data derived from Twitter, postprocessed using word embedding techniques, as well as specialized fine-tuned language models.
no code implementations • 4 Jul 2023 • Dimosthenis Antypas, Jose Camacho-Collados
The automatic detection of hate speech online is an active research area in NLP.
1 code implementation • 27 May 2023 • Asahi Ushio, Fernando Alva-Manchego, Jose Camacho-Collados
Generating questions along with associated answers from a text has applications in several domains, such as creating reading comprehension tests for students, or improving document search by providing auxiliary questions and answers based on the query.
1 code implementation • 26 May 2023 • Asahi Ushio, Fernando Alva-Manchego, Jose Camacho-Collados
This task has a variety of applications, such as data augmentation for question answering (QA) models, information retrieval and education.
1 code implementation • 24 May 2023 • Asahi Ushio, Yi Zhou, Jose Camacho-Collados
Multilingual language models (LMs) have become a powerful tool in NLP, especially for non-English languages.
1 code implementation • 8 Oct 2022 • Asahi Ushio, Fernando Alva-Manchego, Jose Camacho-Collados
It includes general-purpose datasets such as SQuAD for English, datasets from ten domains and two styles, as well as datasets in eight different languages.
1 code implementation • 7 Oct 2022 • Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
Recent progress in language model pre-training has led to important improvements in Named Entity Recognition (NER).
no code implementations • COLING 2022 • Dimosthenis Antypas, Asahi Ushio, Jose Camacho-Collados, Leonardo Neves, Vítor Silva, Francesco Barbieri
Social media platforms host discussions about a wide variety of topics that arise every day.
1 code implementation • COLING 2022 • Daniel Loureiro, Aminette D'Souza, Areej Nasser Muhajab, Isabella A. White, Gabriel Wong, Luis Espinosa Anke, Leonardo Neves, Francesco Barbieri, Jose Camacho-Collados
To bridge this gap, we present TempoWiC, a new benchmark especially aimed at accelerating research in social media-based meaning shift.
1 code implementation • EACL 2021 • Asahi Ushio, Jose Camacho-Collados
In this paper, we present T-NER (Transformer-based Named Entity Recognition), a Python library for NER LM finetuning.
Ranked #4 on Named Entity Recognition (NER) on WNUT 2017
1 code implementation • 29 Jun 2022 • Jose Camacho-Collados, Kiamehr Rezaee, Talayeh Riahi, Asahi Ushio, Daniel Loureiro, Dimosthenis Antypas, Joanne Boisson, Luis Espinosa-Anke, Fangyu Liu, Eugenio Martínez-Cámara, Gonzalo Medina, Thomas Buhrmann, Leonardo Neves, Francesco Barbieri
In this paper we present TweetNLP, an integrated platform for Natural Language Processing (NLP) in social media.
1 code implementation • *SEM (NAACL) 2022 • Mark Anderson, Jose Camacho-Collados
The increase in performance in NLP due to the prevalence of distributional models and deep learning has brought with it a reciprocal decrease in interpretability.
2 code implementations • ACL 2022 • Daniel Loureiro, Francesco Barbieri, Leonardo Neves, Luis Espinosa Anke, Jose Camacho-Collados
Despite its importance, the time variable has been largely neglected in the NLP and language model literature.
1 code implementation • 1 Feb 2022 • Dimosthenis Antypas, Alun Preece, Jose Camacho-Collados
Social media has become extremely influential when it comes to policy making in modern societies, especially in the western world, where platforms such as Twitter allow users to follow politicians, thus making citizens more involved in political discussion.
no code implementations • 17 Nov 2021 • Aleksandra Edwards, Asahi Ushio, Jose Camacho-Collados, Hélène de Ribaupierre, Alun Preece
Data augmentation techniques are widely used for enhancing the performance of machine learning models by tackling class imbalance issues and data sparsity.
1 code implementation • 21 Sep 2021 • Asahi Ushio, Jose Camacho-Collados, Steven Schockaert
Among others, this makes it possible to distill high-quality word vectors from pre-trained language models.
1 code implementation • 6 Aug 2021 • David Tuxworth, Dimosthenis Antypas, Luis Espinosa-Anke, Jose Camacho-Collados, Alun Preece, David Rogers
In particular, the analysis is centered on Twitter and disinformation for three European languages: English, French and Spanish.
no code implementations • ACL 2021 • Dimosthenis Antypas, Jose Camacho-Collados, Alun Preece, David Rogers
Social media is often used by individuals and organisations as a platform to spread misinformation.
1 code implementation • 26 May 2021 • Daniel Loureiro, Alípio Mário Jorge, Jose Camacho-Collados
Prior work has shown that these contextual representations can be used to accurately represent large sense inventories as sense embeddings, to the extent that a distance-based solution to Word Sense Disambiguation (WSD) tasks outperforms models trained specifically for the task.
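A minimal sketch of such a distance-based solution to WSD — pick the sense whose embedding is nearest to the target word's contextualized vector — with toy sense embeddings standing in for ones computed from a real sense inventory:

```python
import numpy as np

def disambiguate(context_vec, sense_embeddings):
    """Pick the sense whose embedding has the highest cosine similarity
    with the target word's contextualized vector."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(sense_embeddings, key=lambda s: cos(context_vec, sense_embeddings[s]))

# Toy sense inventory; real sense embeddings would be derived from a
# resource such as sense-annotated corpora.
senses = {
    "bank%finance": np.array([1.0, 0.1, 0.0]),
    "bank%river":   np.array([0.0, 0.2, 1.0]),
}
context = np.array([0.9, 0.0, 0.1])  # vector of "bank" in a finance-like sentence
predicted = disambiguate(context, senses)  # → "bank%finance"
```

Because disambiguation reduces to nearest-neighbour search, no task-specific classifier needs to be trained.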
1 code implementation • ACL 2021 • Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, Jose Camacho-Collados
Analogies play a central role in human commonsense reasoning.
1 code implementation • LREC 2022 • Francesco Barbieri, Luis Espinosa Anke, Jose Camacho-Collados
Language models are ubiquitous in current NLP, and their multilingual capacity has recently attracted considerable attention.
Ranked #2 on Sentiment Analysis on TweetEval
1 code implementation • EMNLP 2021 • Asahi Ushio, Federico Liberatore, Jose Camacho-Collados
Term weighting schemes are widely used in Natural Language Processing and Information Retrieval.
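As a concrete reference point for the family of schemes discussed, here is a small TF-IDF sketch (with sklearn-style smoothed idf); the weighting studied in the paper builds on this general idea:

```python
import math
from collections import Counter

def tfidf(docs):
    """Smoothed TF-IDF weights for a list of tokenized documents."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return weights

docs = [["the", "cat", "sat"], ["the", "dog", "ran"]]
w = tfidf(docs)  # "cat" outweighs the ubiquitous "the" in the first document
```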
no code implementations • COLING 2020 • Jose Camacho-Collados, Mohammad Taher Pilehvar
Embeddings have been one of the most important topics of interest in NLP for the past decade.
no code implementations • COLING 2020 • Aleksandra Edwards, Jose Camacho-Collados, Hélène de Ribaupierre, Alun Preece
Pre-trained language models provide the foundations for state-of-the-art performance across a wide range of natural language processing tasks, including text classification.
no code implementations • CONLL 2020 • Hsiao-Yu Chiang, Jose Camacho-Collados, Zachary Pardos
In this paper, we investigate the hypothesis that examples of a lexical relation in a corpus are fundamental to a neural word embedding's ability to complete analogies involving the relation.
no code implementations • 27 Oct 2020 • Aleksandra Edwards, David Rogers, Jose Camacho-Collados, Hélène de Ribaupierre, Alun Preece
The task of text and sentence classification is associated with the need for large amounts of labelled training data.
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Francesco Barbieri, Jose Camacho-Collados, Leonardo Neves, Luis Espinosa-Anke
The experimental landscape in natural language processing for social media is too fragmented.
Ranked #3 on Sentiment Analysis on TweetEval
1 code implementation • EMNLP 2020 • Alessandro Raganato, Tommaso Pasini, Jose Camacho-Collados, Mohammad Taher Pilehvar
The ability to correctly model distinct meanings of a word is crucial for the effectiveness of semantic representation techniques.
1 code implementation • CL (ACL) 2021 • Daniel Loureiro, Kiamehr Rezaee, Mohammad Taher Pilehvar, Jose Camacho-Collados
We also perform an in-depth comparison of the two main language model based WSD strategies, i.e., fine-tuning and feature extraction, finding that the latter approach is more robust with respect to sense bias and can better exploit limited available training data.
1 code implementation • EACL 2021 • Anna Breit, Artem Revenko, Kiamehr Rezaee, Mohammad Taher Pilehvar, Jose Camacho-Collados
More specifically, we introduce a framework for Target Sense Verification of Words in Context which grounds its uniqueness in the formulation as a binary classification task thus being independent of external sense inventories, and the coverage of various domains.
Ranked #1 on Entity Linking on WiC-TSV (Task 3 Accuracy: all metric)
1 code implementation • EMNLP 2020 • Daniel Loureiro, Jose Camacho-Collados
State-of-the-art methods for Word Sense Disambiguation (WSD) combine two different features: the power of pre-trained language models and a propagation method to extend the coverage of such models.
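The propagation step can be illustrated with a rough sketch: senses that lack an embedding inherit the mean of their neighbours' embeddings in a sense graph (e.g. WordNet relations). The graph and sense keys below are toy placeholders:

```python
import numpy as np

def propagate(known_vecs, neighbors, iterations=2):
    """Extend coverage: a sense with no embedding gets the mean of its
    neighbors' known embeddings, repeated for a few iterations so that
    vectors spread further through the graph."""
    vecs = dict(known_vecs)
    for _ in range(iterations):
        for sense, nbrs in neighbors.items():
            if sense not in vecs:
                found = [vecs[n] for n in nbrs if n in vecs]
                if found:
                    vecs[sense] = np.mean(np.stack(found), axis=0)
    return vecs

known = {"cat.n.01": np.array([1.0, 0.0]), "dog.n.01": np.array([0.0, 1.0])}
graph = {"pet.n.01": ["cat.n.01", "dog.n.01"]}  # hypernym linked to both
all_vecs = propagate(known, graph)  # "pet.n.01" becomes the mean [0.5, 0.5]
```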
no code implementations • 3 Dec 2019 • Zied Bouraoui, Jose Camacho-Collados, Luis Espinosa-Anke, Steven Schockaert
Unfortunately, meaningful regions can be difficult to estimate, especially since we often have few examples of individuals that belong to a given category.
no code implementations • 28 Nov 2019 • Zied Bouraoui, Jose Camacho-Collados, Steven Schockaert
Starting from a few seed instances of a given relation, we first use a large text corpus to find sentences that are likely to express this relation.
no code implementations • 16 Oct 2019 • Yerai Doval, Jose Camacho-Collados, Luis Espinosa-Anke, Steven Schockaert
While monolingual word embeddings encode information about words in the context of a particular language, cross-lingual embeddings define a multilingual space where word embeddings from two or more languages are integrated together.
no code implementations • LREC 2020 • Yerai Doval, Jose Camacho-Collados, Luis Espinosa-Anke, Steven Schockaert
Cross-lingual word embeddings are vector representations of words in different languages where words with similar meaning are represented by similar vectors, regardless of the language.
1 code implementation • ACL 2019 • Jose Camacho-Collados, Luis Espinosa-Anke, Steven Schockaert
While word embeddings have been shown to implicitly encode various forms of attributional knowledge, the extent to which they capture relational information is far more limited.
no code implementations • SEMEVAL 2019 • Carlos Perelló, David Tomás, Alberto Garcia-Garcia, Jose Garcia-Rodriguez, Jose Camacho-Collados
This paper describes the system developed at the University of Alicante (UA) for the SemEval 2019 Task 5: Shared Task on Multilingual Detection of Hate.
1 code implementation • 17 May 2019 • Jose Camacho-Collados, Yerai Doval, Eugenio Martínez-Cámara, Luis Espinosa-Anke, Francesco Barbieri, Steven Schockaert
Cross-lingual embeddings represent the meaning of words from different languages in the same vector space.
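A standard way to obtain such a shared space — one of several techniques in this line of work — is to learn an orthogonal mapping from source- to target-language vectors over a seed dictionary (the Procrustes solution). A minimal sketch with a synthetic "translation":

```python
import numpy as np

def procrustes_map(X, Y):
    """Orthogonal W minimizing ||XW - Y||_F, solved in closed form via SVD.
    Rows of X and Y are aligned translation pairs (seed dictionary)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy demo: recover a hidden rotation from aligned vector pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                   # source-language vectors
R, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # hidden orthogonal "translation"
Y = X @ R                                      # target-language vectors
W = procrustes_map(X, Y)                       # W recovers R
```

Constraining W to be orthogonal preserves distances and angles within the source space, which is why this mapping is popular for aligning monolingual embeddings.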
no code implementations • EMNLP 2018 • Francesco Barbieri, Luis Espinosa-Anke, Jose Camacho-Collados, Steven Schockaert, Horacio Saggion
Human language has evolved towards newer forms of communication such as social media, where emojis (i.e., ideograms bearing a visual meaning) play a key role.
no code implementations • NAACL 2019 • Mohammad Taher Pilehvar, Jose Camacho-Collados
By design, word embeddings are unable to model the dynamic nature of words' semantics, i.e., the property of words to correspond to potentially different meanings.
Ranked #14 on Word Sense Disambiguation on Words in Context
1 code implementation • EMNLP 2018 • Yerai Doval, Jose Camacho-Collados, Luis Espinosa-Anke, Steven Schockaert
Cross-lingual word embeddings are becoming increasingly important in multilingual NLP.
1 code implementation • NAACL 2018 • Jose Camacho-Collados, Luis Espinosa-Anke, Mohammad Taher Pilehvar
Incorporating linguistic, world and common sense knowledge into AI/NLP systems is currently an important research area, with several open problems and challenges.
no code implementations • SEMEVAL 2018 • Jose Camacho-Collados, Claudio Delli Bovi, Luis Espinosa-Anke, Sergio Oramas, Tommaso Pasini, Enrico Santus, Vered Shwartz, Roberto Navigli, Horacio Saggion
This paper describes the SemEval 2018 Shared Task on Hypernym Discovery.
no code implementations • SEMEVAL 2018 • Francesco Barbieri, Jose Camacho-Collados, Francesco Ronzano, Luis Espinosa-Anke, Miguel Ballesteros, Valerio Basile, Viviana Patti, Horacio Saggion
This paper describes the results of the first Shared Task on Multilingual Emoji Prediction, organized as part of SemEval 2018.
1 code implementation • SEMEVAL 2018 • Francesco Barbieri, Jose Camacho-Collados
Our analyses reveal that some stereotypes related to the skin color and gender seem to be reflected on the use of these modifiers.
no code implementations • 10 May 2018 • Jose Camacho-Collados, Mohammad Taher Pilehvar
Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications.
no code implementations • LREC 2020 • Tommaso Pasini, Jose Camacho-Collados
Large sense-annotated datasets are increasingly necessary for training deep supervised systems in Word Sense Disambiguation.
1 code implementation • ACL 2017 • Mohammad Taher Pilehvar, Jose Camacho-Collados, Roberto Navigli, Nigel Collier
Lexical ambiguity can impede NLP systems from accurate understanding of semantics.
no code implementations • SEMEVAL 2017 • Jose Camacho-Collados, Mohammad Taher Pilehvar, Nigel Collier, Roberto Navigli
This paper introduces a new task on Multilingual and Cross-lingual Semantic Word Similarity which measures the semantic similarity of word pairs within and across five languages: English, Farsi, German, Italian and Spanish.
3 code implementations • WS 2018 • Jose Camacho-Collados, Mohammad Taher Pilehvar
In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier.
Ranked #10 on Text Classification on Ohsumed
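The preprocessing decisions studied above can be sketched as a small configurable pipeline; the `multiword_groups` convention (underscore-joined tokens) is an illustrative choice, not the paper's exact setup:

```python
def preprocess(text, lowercase=True, multiword_groups=()):
    """Tokenize on whitespace, optionally lowercase, and greedily merge
    known multiword expressions into single underscore-joined tokens."""
    tokens = text.split()
    if lowercase:
        tokens = [t.lower() for t in tokens]
    merged, i = [], 0
    while i < len(tokens):
        for mw in multiword_groups:
            parts = mw.split("_")
            if tokens[i:i + len(parts)] == parts:
                merged.append(mw)          # e.g. "new york" -> "new_york"
                i += len(parts)
                break
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = preprocess("New York is large", multiword_groups=("new_york",))
# → ["new_york", "is", "large"]
```

Each flag (lowercasing, multiword grouping, etc.) changes the vocabulary the downstream classifier sees, which is exactly the effect the paper measures.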
no code implementations • ACL 2017 • Claudio Delli Bovi, Jose Camacho-Collados, Alessandro Raganato, Roberto Navigli
Parallel corpora are widely used in a variety of Natural Language Processing tasks, from Machine Translation to cross-lingual Word Sense Disambiguation, where parallel sentences can be exploited to automatically generate high-quality sense annotations on a large scale.
no code implementations • EACL 2017 • Alessandro Raganato, Jose Camacho-Collados, Roberto Navigli
In this paper we develop a unified evaluation framework and analyze the performance of various Word Sense Disambiguation systems in a fair setup.
Ranked #4 on Word Sense Disambiguation on Knowledge-based:
no code implementations • EACL 2017 • Jose Camacho-Collados, Roberto Navigli
In this paper we present BabelDomains, a unified resource which provides lexical items with information about domains of knowledge.
no code implementations • 12 Mar 2017 • Jose Camacho-Collados
The study of taxonomies and hypernymy relations has been extensive in the Natural Language Processing (NLP) literature.
no code implementations • CONLL 2017 • Massimiliano Mancini, Jose Camacho-Collados, Ignacio Iacobacci, Roberto Navigli
Word embeddings are widely used in Natural Language Processing, mainly due to their success in capturing semantic information from massive corpora.
no code implementations • WS 2016 • Alessandro Raganato, Jose Camacho-Collados, Antonio Raganato, Yunseo Joung
The increasing amount of multilingual text collections available in different domains makes their automatic processing essential for the development of a given field.
no code implementations • COLING 2016 • Luis Espinosa-Anke, Jose Camacho-Collados, Sara Rodríguez-Fernández, Horacio Saggion, Leo Wanner
WordNet is probably the best known lexical resource in Natural Language Processing.