Search Results for author: Matteo Stefanini

Found 8 papers, 4 papers with code

ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval

1 code implementation • 29 Jul 2022 • Nicola Messina, Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Fabrizio Falchi, Giuseppe Amato, Rita Cucchiara

In the literature, this task is often used as a pre-training objective to forge architectures able to jointly deal with images and texts.

Ranked #22 on Cross-Modal Retrieval on COCO 2014

Image-text matching Retrieval +1

CaMEL: Mean Teacher Learning for Image Captioning

1 code implementation • 21 Feb 2022 • Manuele Barraco, Matteo Stefanini, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

Describing images in natural language is a fundamental step towards the automatic modeling of connections between the visual and textual modalities.

Image Captioning Knowledge Distillation

From Show to Tell: A Survey on Deep Learning-based Image Captioning

no code implementations • 14 Jul 2021 • Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Silvia Cascianelli, Giuseppe Fiameni, Rita Cucchiara

Starting from 2015, the task has generally been addressed with pipelines composed of a visual encoder and a language model for text generation.

Image Captioning Language Modelling +1

Learning to Select: A Fully Attentive Approach for Novel Object Captioning

no code implementations • 2 Jun 2021 • Marco Cagrandi, Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara

In this paper, we present a novel approach for NOC that learns to select the most relevant objects of an image, regardless of their adherence to the training set, and to constrain the generative process of a language model accordingly.

Image Captioning Language Modelling

A Novel Attention-based Aggregation Function to Combine Vision and Language

no code implementations • 27 Apr 2020 • Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

The joint understanding of vision and language has been recently gaining a lot of attention in both the Computer Vision and Natural Language Processing communities, with the emergence of tasks such as image captioning, image-text matching, and visual question answering.

General Classification Image Captioning +4

Meshed-Memory Transformer for Image Captioning

2 code implementations • CVPR 2020 • Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara

Transformer-based architectures represent the state of the art in sequence modeling tasks like machine translation and language understanding.

Ranked #2 on Image Captioning on MS COCO

Image Captioning Machine Translation +2

Artpedia

no code implementations • International Conference on Image Analysis and Processing 2019 • Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara

As vision and language techniques are widely applied to realistic images, there is a growing interest in designing visual-semantic models suitable for more complex and challenging scenarios.

Cross-Modal Retrieval Retrieval

A Deep Learning based approach to VM behavior identification in cloud systems

1 code implementation • 5 Mar 2019 • Matteo Stefanini, Riccardo Lancellotti, Lorenzo Baraldi, Simone Calderara

The experiments compare our proposal with state-of-the-art solutions available in the literature, demonstrating that our proposal achieves better performance.

Cloud Computing Clustering +1
