1 code implementation • 4 Jun 2024 • Wenyan Li, Jiaang Li, Rita Ramos, Raphael Tang, Desmond Elliott
Recent advances in retrieval-augmented models for image captioning highlight the benefit of retrieving related captions for efficient, lightweight models with strong domain-transfer capabilities.
1 code implementation • 31 May 2023 • Rita Ramos, Bruno Martins, Desmond Elliott
Multilingual image captioning has recently been tackled by training with large-scale machine-translated data, which is an expensive, noisy, and time-consuming process.
1 code implementation • 16 Feb 2023 • Rita Ramos, Desmond Elliott, Bruno Martins
The encoder in our model jointly processes the image and retrieved captions using a pretrained V&L BERT, while the decoder attends to the multimodal encoder representations, benefiting from the extra textual evidence from the retrieved captions.
1 code implementation • CVPR 2023 • Rita Ramos, Bruno Martins, Desmond Elliott, Yova Kementchedjhieva
Recent advances in image captioning have focused on scaling the data and model size, substantially increasing the cost of pre-training and fine-tuning.