In this paper, we propose modality-specific distillation (MSD) to effectively transfer knowledge from a teacher on multimodal datasets.
Moreover, little is understood about how ER model performance is affected by the choice of ER criteria or by the number and choice of training instances with human rationales.
1 code implementation • 9 May 2022 • Shivam Sharma, Firoj Alam, Md. Shad Akhtar, Dimitar Dimitrov, Giovanni Da San Martino, Hamed Firooz, Alon Halevy, Fabrizio Silvestri, Preslav Nakov, Tanmoy Chakraborty
One interesting finding is that many types of harmful memes remain largely unstudied, e.g., those featuring self-harm and extremism, partly due to the lack of suitable datasets.
Without using a knowledge base or candidate sets, our model sets a new state of the art on two entity linking benchmark datasets: COMETA in the biomedical domain and AIDA-CoNLL in the news domain.
In this paper, we tackle these issues and study the representation space of self-supervised models by understanding the underlying reasons for misclassifications in a downstream task.
Theoretically, we provide generalization bounds for our approach in terms of the worst-group performance, which scale with respect to both the total number of training points and the number of training points with group labels.
In this paper, we focus on teasing out what parts of the language supervision are essential for training zero-shot image classification models.
An extractive rationale explains a language model's (LM's) prediction on a given task instance by highlighting the text inputs that most influenced the prediction.
We further create and release a new corpus of 950 memes, carefully annotated with 22 propaganda techniques, which can appear in the text, in the image, or in both.
We describe SemEval-2021 task 6 on Detection of Persuasion Techniques in Texts and Images: the data, the annotation guidelines, the evaluation setup, the results, and the participating systems.
Recent years have witnessed the proliferation of fake news, propaganda, misinformation, and disinformation online.
The idea is to have the student mimic the teacher's modality-specific predictions by introducing an auxiliary loss term for each modality.
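The per-modality auxiliary loss idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the paper's formulation: the KL divergence as the matching criterion, the view names `"full"`, `"image"`, and `"text"`, the per-view `weights`, and the function name `msd_style_loss`. The sketch assumes that teacher and student logits have already been computed for the full multimodal input and for each single-modality view.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def msd_style_loss(teacher_logits, student_logits, weights):
    """Weighted sum of teacher-to-student KL terms, one term per input view.

    teacher_logits, student_logits: dict mapping a view name to a list of logits.
    weights: dict mapping a view name to its loss weight (hypothetical
    hyperparameters; a missing view defaults to a weight of 1.0).
    """
    loss = 0.0
    for view, t_logits in teacher_logits.items():
        p = softmax(t_logits)               # teacher prediction on this view
        q = softmax(student_logits[view])   # student prediction on this view
        loss += weights.get(view, 1.0) * kl_divergence(p, q)
    return loss


# Illustrative values only: logits for the full input and two single-modality views.
teacher = {"full": [2.0, 0.5], "image": [1.0, 1.2], "text": [2.5, 0.1]}
student = {"full": [1.5, 0.7], "image": [0.2, 1.0], "text": [2.0, 0.3]}
weights = {"full": 1.0, "image": 0.5, "text": 0.5}

loss = msd_style_loss(teacher, student, weights)
```

The `"image"` and `"text"` terms play the role of the auxiliary losses: they penalize the student whenever its prediction on a single modality departs from the teacher's prediction on that same modality, in addition to the usual term on the full input.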
This work examines the vulnerability of multimodal (image + text) models to adversarial threats similar to those discussed in previous literature on unimodal (image- or text-only) models.
This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes.
Self-supervised bidirectional transformer models such as BERT have led to dramatic improvements in a wide variety of textual classification tasks.