1 code implementation • Findings (EMNLP) 2021 • Hossein Amirkhani, Mohammad Taher Pilehvar
A common core assumption of these techniques is that the main model handles biased instances similarly to the biased model, in that it will resort to biases whenever available.
no code implementations • EMNLP (BlackboxNLP) 2021 • Mohsen Fayyaz, Ehsan Aghazadeh, Ali Modarressi, Hosein Mohebbi, Mohammad Taher Pilehvar
Most of the recent works on probing representations have focused on BERT, with the presumption that the findings might carry over to other models.
no code implementations • EACL (WASSA) 2021 • Costanza Conforti, Jakob Berndt, Mohammad Taher Pilehvar, Chryssi Giannitsarou, Flavio Toxvaerd, Nigel Collier
Cross-target generalization is a known problem in stance detection (SD), where systems tend to perform poorly when exposed to targets unseen during training.
no code implementations • EACL (Hackashop) 2021 • Costanza Conforti, Jakob Berndt, Marco Basaldella, Mohammad Taher Pilehvar, Chryssi Giannitsarou, Flavio Toxvaerd, Nigel Collier
Cross-target generalization constitutes an important issue for news Stance Detection (SD).
1 code implementation • NAACL (ACL) 2022 • Romina Etezadi, Mohammad Karrabi, Najmeh Zare, Mohamad Bagher Sajadi, Mohammad Taher Pilehvar
We introduce DadmaTools, an open-source Python Natural Language Processing toolkit for the Persian language.
no code implementations • ACL 2022 • Mohsen Tabasi, Kiamehr Rezaee, Mohammad Taher Pilehvar
As a recent development in few-shot learning, prompt-based techniques have demonstrated promising potential in a variety of natural language processing tasks.
1 code implementation • ACL 2022 • Costanza Conforti, Jakob Berndt, Mohammad Taher Pilehvar, Chryssi Giannitsarou, Flavio Toxvaerd, Nigel Collier
Research in stance detection has so far focused on models which leverage purely textual input.
no code implementations • EMNLP (MRL) 2021 • Kiamehr Rezaee, Daniel Loureiro, Jose Camacho-Collados, Mohammad Taher Pilehvar
In this paper we analyze the extent to which contextualized sense embeddings, i.e., sense embeddings that are computed based on contextualized word embeddings, are transferable across languages. To this end, we compiled a unified cross-lingual benchmark for Word Sense Disambiguation.
1 code implementation • 22 Oct 2023 • Mahdi Zakizadeh, Kaveh Eskandari Miandoab, Mohammad Taher Pilehvar
We use DiFair as a benchmark for a number of widely-used pretrained language models and debiasing techniques.
1 code implementation • 5 Jun 2023 • Ali Modarressi, Mohsen Fayyaz, Ehsan Aghazadeh, Yadollah Yaghoobzadeh, Mohammad Taher Pilehvar
An emerging solution for explaining Transformer-based models is to use vector-based analysis on how the representations are formed.
1 code implementation • 6 Feb 2023 • Ali Modarressi, Hossein Amirkhani, Mohammad Taher Pilehvar
A popular workaround is to train a robust model by re-weighting training examples based on a secondary biased model.
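The re-weighting idea described above can be sketched as follows (a minimal illustration only; the 1 − p_bias weighting scheme and all function names are assumptions, not taken from the paper):

```python
import math

def reweight(bias_probs, labels):
    """Down-weight examples that a shallow biased model already gets right.

    bias_probs: per-example class probabilities from the biased model.
    labels: gold class indices.
    Returns one weight per example, 1 - p_bias(gold): examples the biased
    model predicts confidently contribute little to the main model's loss.
    """
    return [1.0 - p[y] for p, y in zip(bias_probs, labels)]

def weighted_nll(main_probs, labels, weights):
    # Weighted negative log-likelihood used to train the main (robust) model.
    losses = [-w * math.log(p[y]) for p, y, w in zip(main_probs, labels, weights)]
    return sum(losses) / len(losses)
```

With this scheme, an example the biased model gets right with probability 0.9 receives weight 0.1, while an example it is unsure about (0.5) receives weight 0.5.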
no code implementations • 1 Feb 2023 • Mohammad Akbar-Tajari, Sara Rajaee, Mohammad Taher Pilehvar
Parameter-efficient fine-tuning approaches have recently garnered a lot of attention.
no code implementations • 10 Nov 2022 • Mohsen Fayyaz, Ehsan Aghazadeh, Ali Modarressi, Mohammad Taher Pilehvar, Yadollah Yaghoobzadeh, Samira Ebrahimi Kahou
In this work, we employ these two metrics for the first time in NLP.
1 code implementation • 7 Nov 2022 • Sara Rajaee, Yadollah Yaghoobzadeh, Mohammad Taher Pilehvar
It has been shown that NLI models are usually biased with respect to the word-overlap between premise and hypothesis; they take this feature as a primary cue for predicting the entailment label.
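The word-overlap cue itself is straightforward to compute; a minimal sketch (the exact overlap measure the paper uses may differ):

```python
def word_overlap(premise, hypothesis):
    # Fraction of hypothesis words that also appear in the premise -- the
    # shallow cue NLI models have been shown to lean on when predicting
    # the entailment label.
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / len(h) if h else 0.0
```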
1 code implementation • NAACL 2022 • Ali Modarressi, Mohsen Fayyaz, Yadollah Yaghoobzadeh, Mohammad Taher Pilehvar
There has been a growing interest in interpreting the underlying dynamics of Transformers.
1 code implementation • Findings (ACL) 2022 • Houman Mehrafarin, Sara Rajaee, Mohammad Taher Pilehvar
The analysis also reveals that larger training data mainly affects higher layers, and that the extent of this change is a factor of the number of iterations updating the model during fine-tuning rather than the diversity of the training samples.
1 code implementation • ACL 2022 • Ali Modarressi, Hosein Mohebbi, Mohammad Taher Pilehvar
To determine the importance of each token representation, we train a Contribution Predictor for each layer using a gradient-based saliency method.
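Gradient-based saliency of this general kind can be illustrated with a numerical-gradient toy (an approximation for exposition only; the paper backpropagates through the actual model, and all names here are assumptions):

```python
def token_saliency(score_fn, embeddings, eps=1e-4):
    """Numerical gradient-based saliency: perturb each dimension of each
    token embedding, measure the change in the model score, and use the
    gradient norm of a token as its importance. `score_fn` maps a list of
    embedding vectors (lists of floats) to a scalar score."""
    base = score_fn(embeddings)
    saliencies = []
    for i, vec in enumerate(embeddings):
        sq = 0.0
        for d in range(len(vec)):
            bumped = [v[:] for v in embeddings]
            bumped[i][d] += eps
            grad = (score_fn(bumped) - base) / eps
            sq += grad * grad
        saliencies.append(sq ** 0.5)  # L2 norm of the token's gradient
    return saliencies
```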
1 code implementation • Findings (ACL) 2022 • Sara Rajaee, Mohammad Taher Pilehvar
However, we observe no such dimensions in the multilingual BERT.
no code implementations • Findings (EMNLP) 2021 • Sara Rajaee, Mohammad Taher Pilehvar
It is widely accepted that fine-tuning pre-trained language models usually brings about performance improvements in downstream tasks.
1 code implementation • Joint Conference on Lexical and Computational Semantics 2021 • Majid Zarharan, Mahsa Ghaderan, Amin Pourdabiri, Zahra Sayedi, Behrouz Minaei-Bidgoli, Sauleh Eetemadi, Mohammad Taher Pilehvar
Training and evaluation of automatic fact extraction and verification techniques require large amounts of annotated data which might not be available for low-resource languages.
1 code implementation • ACL 2021 • Sara Rajaee, Mohammad Taher Pilehvar
Based on this observation, we propose a local cluster-based method to address the degeneration issue in contextual embedding spaces.
1 code implementation • EMNLP 2021 • Hosein Mohebbi, Ali Modarressi, Mohammad Taher Pilehvar
Several studies have been carried out on revealing linguistic features captured by BERT.
no code implementations • COLING 2020 • Jose Camacho-Collados, Mohammad Taher Pilehvar
Embeddings have been one of the most important topics of interest in NLP for the past decade.
no code implementations • SEMEVAL 2020 • Carlos Santos Armendariz, Matthew Purver, Senja Pollak, Nikola Ljubešić, Matej Ulčar, Ivan Vulić, Mohammad Taher Pilehvar
This paper presents the Graded Word Similarity in Context (GWSC) task which asked participants to predict the effects of context on human perception of similarity in English, Croatian, Slovene and Finnish.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Costanza Conforti, Jakob Berndt, Mohammad Taher Pilehvar, Chryssi Giannitsarou, Flavio Toxvaerd, Nigel Collier
We present a new challenging news dataset that targets both stance detection (SD) and fine-grained evidence retrieval (ER).
1 code implementation • EMNLP 2020 • Alessandro Raganato, Tommaso Pasini, Jose Camacho-Collados, Mohammad Taher Pilehvar
The ability to correctly model distinct meanings of a word is crucial for the effectiveness of semantic representation techniques.
1 code implementation • CL (ACL) 2021 • Daniel Loureiro, Kiamehr Rezaee, Mohammad Taher Pilehvar, Jose Camacho-Collados
We also perform an in-depth comparison of the two main language model based WSD strategies, i.e., fine-tuning and feature extraction, finding that the latter approach is more robust with respect to sense bias and can better exploit limited available training data.
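The feature-extraction strategy is commonly realized as nearest-neighbor classification over averaged sense embeddings; a minimal sketch under that assumption (the centroid averaging and cosine 1-NN rule are illustrative, not the paper's exact setup):

```python
def sense_centroids(examples):
    """Average the contextual embeddings of all training occurrences of
    each sense. `examples` is a list of (vector, sense_label) pairs."""
    sums, counts = {}, {}
    for vec, sense in examples:
        if sense not in sums:
            sums[sense], counts[sense] = [0.0] * len(vec), 0
        sums[sense] = [a + b for a, b in zip(sums[sense], vec)]
        counts[sense] += 1
    return {s: [x / counts[s] for x in v] for s, v in sums.items()}

def predict_sense(vec, centroids):
    # Assign the sense whose centroid is most cosine-similar to the
    # target word's contextual embedding (no fine-tuning required).
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb)
    return max(centroids, key=lambda s: cos(vec, centroids[s]))
```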
2 code implementations • ACL 2020 • Costanza Conforti, Jakob Berndt, Mohammad Taher Pilehvar, Chryssi Giannitsarou, Flavio Toxvaerd, Nigel Collier
We present a new challenging stance detection dataset, called Will-They-Won't-They (WT-WT), which contains 51,284 tweets in English, making it by far the largest available dataset of the type.
1 code implementation • EACL 2021 • Anna Breit, Artem Revenko, Kiamehr Rezaee, Mohammad Taher Pilehvar, Jose Camacho-Collados
More specifically, we introduce a framework for Target Sense Verification of Words in Context, whose uniqueness lies in its formulation as a binary classification task, making it independent of external sense inventories, and in its coverage of various domains.
Ranked #1 on Entity Linking on WiC-TSV (Task 3 Accuracy: all metric)
1 code implementation • WS 2019 • Victor Prokhorov, Ehsan Shareghi, Yingzhen Li, Mohammad Taher Pilehvar, Nigel Collier
While the explicit constraint naturally avoids posterior collapse, we use it to further understand the significance of the KL term in controlling the information transmitted through the VAE channel.
no code implementations • NAACL 2019 • Mohammad Taher Pilehvar
Meaning conflation deficiency is one of the main limiting factors of word representations which, given their widespread use at the core of many NLP systems, can lead to inaccurate semantic understanding of the input text and inevitably hamper performance.
1 code implementation • NAACL 2019 • Victor Prokhorov, Mohammad Taher Pilehvar, Nigel Collier
We present a novel method for mapping unrestricted text to knowledge graph entities by framing the task as a sequence-to-sequence problem.
no code implementations • 12 Nov 2018 • Victor Prokhorov, Mohammad Taher Pilehvar, Dimitri Kartsaklis, Pietro Lio, Nigel Collier
Word embedding techniques heavily rely on the abundance of training data for individual words.
no code implementations • WS 2018 • Costanza Conforti, Mohammad Taher Pilehvar, Nigel Collier
In this paper, we propose to adapt the four-staged pipeline proposed by Zubiaga et al. (2018) for the Rumor Verification task to the problem of Fake News Detection.
1 code implementation • 29 Oct 2018 • Milan Gritta, Mohammad Taher Pilehvar, Nigel Collier
Empirical methods in geoparsing have thus far lacked a standard evaluation framework describing the task, metrics and data used to compare state-of-the-art systems.
2 code implementations • EMNLP 2018 • Hoang-Quynh Le, Duy-Cat Can, Sinh T. Vu, Thanh Hai Dang, Mohammad Taher Pilehvar, Nigel Collier
Experimental performance on the task of relation classification has generally improved using deep neural network architectures.
no code implementations • EMNLP 2018 • Mohammad Taher Pilehvar, Dimitri Kartsaklis, Victor Prokhorov, Nigel Collier
Rare word representation has recently enjoyed a surge of interest, owing to the crucial role that effective handling of infrequent words can play in accurate semantic understanding.
no code implementations • NAACL 2019 • Mohammad Taher Pilehvar, Jose Camacho-Collados
By design, word embeddings are unable to model the dynamic nature of words' semantics, i.e., the property of words to correspond to potentially different meanings.
Ranked #14 on Word Sense Disambiguation on Words in Context
no code implementations • EMNLP 2018 • Dimitri Kartsaklis, Mohammad Taher Pilehvar, Nigel Collier
Further, the knowledge base space is prepared by collecting random walks from a graph enhanced with textual features, which act as a set of semantic bridges between text and knowledge base entities.
1 code implementation • NAACL 2018 • Jose Camacho-Collados, Luis Espinosa-Anke, Mohammad Taher Pilehvar
Incorporating linguistic, world and common sense knowledge into AI/NLP systems is currently an important research area, with several open problems and challenges.
no code implementations • ACL 2018 • Milan Gritta, Mohammad Taher Pilehvar, Nigel Collier
The purpose of text geolocation is to associate geographic information contained in a document with a set (or sets) of coordinates, either implicitly, using linguistic features, or explicitly, using geographic metadata combined with heuristics.
no code implementations • 10 May 2018 • Jose Camacho-Collados, Mohammad Taher Pilehvar
Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications.
1 code implementation • ACL 2017 • Mohammad Taher Pilehvar, Jose Camacho-Collados, Roberto Navigli, Nigel Collier
Lexical ambiguity can impede NLP systems from accurate understanding of semantics.
no code implementations • SEMEVAL 2017 • Jose Camacho-Collados, Mohammad Taher Pilehvar, Nigel Collier, Roberto Navigli
This paper introduces a new task on Multilingual and Cross-lingual Semantic Word Similarity which measures the semantic similarity of word pairs within and across five languages: English, Farsi, German, Italian and Spanish.
no code implementations • 24 Jul 2017 • Victor Prokhorov, Mohammad Taher Pilehvar, Dimitri Kartsaklis, Pietro Lió, Nigel Collier
We propose a methodology that adapts graph embedding techniques (DeepWalk (Perozzi et al., 2014) and node2vec (Grover and Leskovec, 2016)) as well as cross-lingual vector space mapping approaches (Least Squares and Canonical Correlation Analysis) in order to merge the corpus and ontological sources of lexical knowledge.
3 code implementations • WS 2018 • Jose Camacho-Collados, Mohammad Taher Pilehvar
In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier.
Ranked #10 on Text Classification on Ohsumed
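Two of the preprocessing decisions compared above, lowercasing and multiword grouping, can be sketched as follows (a simplified illustration; the paper's actual pipelines and multiword lexicon differ):

```python
def preprocess(text, lowercase=True, multiword=()):
    """Tokenize by whitespace, optionally lowercase, and optionally merge
    known multiword expressions (e.g. "New York") into single tokens.
    `multiword` is a collection of lowercased (w1, w2) pairs."""
    tokens = text.split()
    if lowercase:
        tokens = [t.lower() for t in tokens]
    mwes = set(multiword)
    out, i = [], 0
    while i < len(tokens):
        pair = (tokens[i].lower(), tokens[i + 1].lower()) if i + 1 < len(tokens) else None
        if pair in mwes:
            out.append(tokens[i] + "_" + tokens[i + 1])  # group the expression
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

Each combination of flags yields a different token stream, which is what makes the downstream classifier sensitive to these seemingly minor choices.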
1 code implementation • ACL 2017 • Milan Gritta, Mohammad Taher Pilehvar, Nut Limsopatham, Nigel Collier
Named entities are frequently used in a metonymic manner.
no code implementations • EACL 2017 • Mohammad Taher Pilehvar, Nigel Collier
We put forward an approach that exploits the knowledge encoded in lexical resources in order to induce representations for words that were not encountered frequently during training.
no code implementations • EACL 2017 • Ivan Vulić, Nikola Mrkšić, Mohammad Taher Pilehvar
Specialising vector spaces to maximise their content with respect to one key property of vector space models (e.g., semantic similarity vs. relatedness or lexical entailment) while mitigating others has become an active and attractive research topic in representation learning.
1 code implementation • EMNLP 2016 • Mohammad Taher Pilehvar, Nigel Collier
One major deficiency of most semantic representation techniques is that they usually model a word type as a single point in the semantic space, hence conflating all the meanings that the word can have.
no code implementations • 2 Aug 2016 • José Camacho-Collados, Ignacio Iacobacci, Roberto Navigli, Mohammad Taher Pilehvar
Representing the semantics of linguistic items in a machine-interpretable form has been a major goal of Natural Language Processing since its earliest days.