no code implementations • 14 Mar 2024 • Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens
Also, when fine-tuning a pre-trained multimodal model such as CLIP-BART, we observe smaller but consistent improvements across a range of VL PEFT tasks.
1 code implementation • 16 Aug 2023 • Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens
News Image Captioning aims to create captions from news articles and images, emphasizing the connection between textual context and visual elements.
1 code implementation • 24 May 2023 • Mingxiao Li, Tingyu Qu, Ruicong Yao, Wei Sun, Marie-Francine Moens
In this work, we conduct a systematic study of exposure bias in diffusion probabilistic models (DPMs) and, intriguingly, find that exposure bias can be alleviated, without retraining the model, by a novel sampling method that we propose.
1 code implementation • 17 Oct 2022 • Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens
We revisit the weakly supervised cross-modal face-name alignment task; that is, given an image and a caption, we label the faces in the image with the names occurring in the caption.