Search Results for author: Sara Sarto

Found 6 papers, 2 papers with code

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs

no code implementations • 23 Apr 2024 • Davide Caffagni, Federico Cocchi, Nicholas Moratelli, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Multimodal LLMs are the natural evolution of LLMs, and enlarge their capabilities so as to work beyond the pure textual modality.

Question Answering, Retrieval, +1 more

The (R)Evolution of Multimodal Large Language Models: A Survey

no code implementations • 19 Feb 2024 • Davide Caffagni, Federico Cocchi, Luca Barsellotti, Nicholas Moratelli, Sara Sarto, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara

Connecting text and visual modalities plays an essential role in generative intelligence.

Image Generation, Instruction Following, +1 more

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning

1 code implementation • ICCV 2023 • Manuele Barraco, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Image captioning, like many tasks involving vision and language, currently relies on Transformer-based architectures for extracting the semantics in an image and translating it into linguistically coherent descriptions.

Image Captioning

Multi-Class Explainable Unlearning for Image Classification via Weight Filtering

no code implementations • 4 Apr 2023 • Samuele Poppi, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Machine Unlearning has recently been emerging as a paradigm for selectively removing the impact of training datapoints from a network.

Classification, Image Classification, +1 more

Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation

1 code implementation • CVPR 2023 • Sara Sarto, Manuele Barraco, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

The CLIP model has been recently proven to be very effective for a variety of cross-modal tasks, including the evaluation of captions generated from vision-and-language architectures.

Contrastive Learning, Image Captioning, +1 more

Retrieval-Augmented Transformer for Image Captioning

no code implementations • 26 Jul 2022 • Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

In this paper, we investigate the development of an image captioning approach with a kNN memory, with which knowledge can be retrieved from an external corpus to aid the generation process.

Image Captioning, Retrieval
