no code implementations • NAACL (ALVR) 2021 • Julia Suter, Letitia Parcalabescu, Anette Frank
Phrase grounding (PG) is a multimodal task that grounds language in images.
1 code implementation • 15 Dec 2022 • Letitia Parcalabescu, Anette Frank
But how to quantify the amount of unimodal collapse reliably, at dataset and instance-level, to diagnose and combat unimodal collapse in a targeted way?
1 code implementation • ACL 2022 • Letitia Parcalabescu, Michele Cafagna, Lilitta Muradjan, Anette Frank, Iacer Calixto, Albert Gatt
We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark designed for testing general-purpose pretrained vision and language (V&L) models for their visio-linguistic grounding capabilities on specific linguistic phenomena.
Ranked #1 on image-sentence alignment on VALSE
1 code implementation • 9 Dec 2021 • Constantin Eichenberg, Sidney Black, Samuel Weinbach, Letitia Parcalabescu, Anette Frank
Large-scale pretraining is fast becoming the norm in Vision-Language (VL) modeling.
no code implementations • ACL (mmsr, IWCS) 2021 • Letitia Parcalabescu, Nils Trost, Anette Frank
The last years have shown rapid developments in the field of multimodal machine learning, combining, e.g., vision, text or speech.
no code implementations • ACL (mmsr, IWCS) 2021 • Letitia Parcalabescu, Albert Gatt, Anette Frank, Iacer Calixto
We investigate the reasoning ability of pretrained vision and language (V&L) models in two tasks that require multimodal integration: (1) discriminating a correct image-sentence pair from an incorrect one, and (2) counting entities in an image.
3 code implementations • 29 Jan 2020 • Juri Opitz, Letitia Parcalabescu, Anette Frank
Different metrics have been proposed to compare Abstract Meaning Representation (AMR) graphs.