IRFL: Image Recognition of Figurative Language

Introduced by Yosef et al. in IRFL: Image Recognition of Figurative Language

The IRFL dataset consists of idioms, similes, and metaphors paired with matching figurative and literal images, together with two novel multimodal tasks: figurative understanding and figurative preference.

We collected figurative and literal images for textual idioms, metaphors, and similes. Images for idioms were gathered with an automatic pipeline we created; images for metaphors and similes were collected manually. We annotated the relation between each image and the figurative phrase it originated from. Using these images, we created two novel tasks: figurative understanding and figurative preference.

The figurative understanding task evaluates the ability of Vision and Language Pre-Trained Models (VL-PTMs) to understand the relation between an image and a figurative phrase. The task is to choose, out of X candidates, the image that best visualizes the figurative phrase. The preference task examines VL-PTMs' preference for figurative images. In this task, the model must correctly classify a phrase's images of different categories based on how the model's matching score ranks them.
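The selection step of the understanding task can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the example format, and the toy word-overlap scorer are all assumptions made for this sketch. A real evaluation would replace the scorer with a VL-PTM's image-text matching score, and the candidates would be images rather than caption strings.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical example format: (phrase, candidate images, index of the correct image).
# Candidates are caption strings here only so the sketch runs without a real model.
Example = Tuple[str, Sequence[str], int]


def pick_best_image(phrase: str, candidates: Sequence[str],
                    score_fn: Callable[[str, str], float]) -> int:
    """Return the index of the candidate with the highest matching score."""
    scores = [score_fn(phrase, candidate) for candidate in candidates]
    # On ties, max() keeps the first index with the top score.
    return max(range(len(scores)), key=scores.__getitem__)


def understanding_accuracy(examples: Sequence[Example],
                           score_fn: Callable[[str, str], float]) -> float:
    """Fraction of examples where the top-scored candidate is the gold one."""
    correct = sum(
        pick_best_image(phrase, candidates, score_fn) == gold
        for phrase, candidates, gold in examples
    )
    return correct / len(examples)


def toy_overlap_score(phrase: str, caption: str) -> float:
    """Assumed stand-in scorer: number of words shared by phrase and caption."""
    return float(len(set(phrase.lower().split()) & set(caption.lower().split())))


examples: List[Example] = [
    ("cold as ice", ["a frozen lake cold as ice", "a sunny beach"], 0),
    ("spill the beans", ["a dog running", "a person whispering a secret"], 1),
]

accuracy = understanding_accuracy(examples, toy_overlap_score)
```

With this toy scorer the first example is matched by literal word overlap, while the second (an idiom whose visualization shares no words with the phrase) is missed, which mirrors the kind of gap the task is meant to expose.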

The best models achieve 22%, 30%, and 66% accuracy on our understanding task for idioms, metaphors, and similes, respectively, compared with human accuracy of 97%, 99.7%, and 100%. The best model achieves an F1 score of 61 on the preference task.

Researchers are welcome to evaluate models on this dataset.
