Search Results for author: Alberto Baldrati

Found 14 papers, 12 papers with code

Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion

1 code implementation · 6 Feb 2025 · Marco Mistretta, Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Andrew D. Bagdanov

In this paper, we show that the common practice of individually exploiting the text or image encoders of powerful multi-modal models such as CLIP is highly suboptimal for intra-modal tasks like image-to-image retrieval.

Image Classification · Image Retrieval · +2

Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation

1 code implementation · 3 Jul 2024 · Marco Mistretta, Alberto Baldrati, Marco Bertini, Andrew D. Bagdanov

Our approach, which we call Knowledge Distillation Prompt Learning (KDPL), can be integrated into existing prompt learning techniques and eliminates the need for labeled examples during adaptation.

Domain Generalization · Knowledge Distillation · +2
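
The KDPL snippet states that adaptation requires no labeled examples. The sketch below illustrates that idea with a generic unsupervised distillation objective. It assumes a frozen teacher and a prompt-tuned student, both producing class logits for the same unlabeled image. The loss compares only their temperature-softened distributions, so no ground-truth label ever enters it. The helper names, the temperature of 2.0, and the use of KL divergence are illustrative assumptions; this is not presented as the exact KDPL loss.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: Sequence[float],
                      student_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """Hypothetical helper: KL(teacher || student) on softened class distributions.

    Only the two models' outputs enter the loss; no ground-truth label is needed.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the frozen teacher
    q = softmax(student_logits, temperature)  # predictions of the prompt-tuned student
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Multiplying by T^2 is the usual distillation convention to keep the gradient scale stable.
    return temperature ** 2 * kl


# Average the loss over a batch of unlabeled images (toy logits, not real model outputs).
unlabeled_batch = [
    ([2.0, 0.5, -1.0], [1.5, 0.7, -0.8]),  # (teacher logits, student logits)
    ([0.1, 3.0, 0.2], [0.3, 2.0, 0.4]),
]
batch_loss = sum(distillation_loss(t, s) for t, s in unlabeled_batch) / len(unlabeled_batch)
```

KL divergence on softened outputs is the textbook knowledge-distillation objective. The snippet does not say which teacher KDPL uses or whether it adds further terms.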

iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval

2 code implementations · 5 May 2024 · Lorenzo Agnolucci, Alberto Baldrati, Marco Bertini, Alberto del Bimbo

Given a query consisting of a reference image and a relative caption, Composed Image Retrieval (CIR) aims to retrieve target images visually similar to the reference one while incorporating the changes specified in the relative caption.

Benchmarking · Retrieval · +1
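
A CIR query pairs a reference image with a relative caption. The sketch below shows the simplest late-fusion baseline for such a query. It assumes CLIP-style embeddings have already been computed for the reference image, the caption, and every gallery image. The two query embeddings are L2-normalized and summed, and the gallery is then ranked by cosine similarity to the fused query. This element-wise sum is a naive illustrative baseline, not the textual-inversion approach of iSEARLE. The helper names and toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def compose_query(ref_image_emb: Sequence[float], caption_emb: Sequence[float]) -> List[float]:
    """Hypothetical late fusion: normalized sum of the image and caption embeddings."""
    return normalize([i + t for i, t in zip(normalize(ref_image_emb), normalize(caption_emb))])


def retrieve(ref_image_emb: Sequence[float], caption_emb: Sequence[float],
             gallery: Dict[str, Sequence[float]], k: int = 3) -> List[str]:
    """Return the ids of the k gallery images most similar to the composed query."""
    query = compose_query(ref_image_emb, caption_emb)
    ranked = sorted(gallery, key=lambda img_id: cosine(query, gallery[img_id]), reverse=True)
    return ranked[:k]


# Toy axes: [dress, red, blue]. Reference = blue dress; caption ~ "make it red instead of blue".
gallery = {
    "red_dress": [1.0, 1.0, 0.0],
    "blue_dress": [1.0, 0.0, 1.0],
    "red_bag": [0.0, 1.0, 0.0],
}
print(retrieve([1.0, 0.0, 1.0], [0.0, 1.0, -1.0], gallery))  # → ['red_dress', 'red_bag', 'blue_dress']
```

Summing the embeddings keeps the query in the same space as the gallery, so no training is needed. Learned fusion or inversion methods exist precisely because this simple sum often fails to capture the requested modification.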

Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing

2 code implementations · 21 Mar 2024 · Alberto Baldrati, Davide Morelli, Marcella Cornia, Marco Bertini, Rita Cucchiara

Fashion illustration is a crucial medium for designers to convey their creative vision and transform design concepts into tangible representations that showcase the interplay between clothing and the human body.

Denoising · Virtual Try-on

Mapping Memes to Words for Multimodal Hateful Meme Classification

1 code implementation · 12 Oct 2023 · Giovanni Burbi, Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Alberto del Bimbo

Multimodal image-text memes are prevalent on the internet, serving as a unique form of communication that combines visual and textual elements to convey humor, ideas, or emotions.

Hateful Meme Classification · Language Modeling · +1

Exploiting CLIP-based Multi-modal Approach for Artwork Classification and Retrieval

no code implementations · 21 Sep 2023 · Alberto Baldrati, Marco Bertini, Tiberio Uricchio, Alberto del Bimbo

Given recent advances in multimodal image pretraining, where visual models trained with semantically dense textual supervision tend to generalize better than those trained on categorical attributes or with unsupervised techniques, in this work we investigate how the recent CLIP model can be applied to several tasks in the artwork domain.

Retrieval · Zero-Shot Learning

OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

3 code implementations · 11 Sep 2023 · Giuseppe Cartella, Alberto Baldrati, Davide Morelli, Marcella Cornia, Marco Bertini, Rita Cucchiara

The inexorable growth of online shopping and e-commerce demands scalable and robust machine learning-based solutions to accommodate customer requirements.

Contrastive Learning · Domain Generalization · +2

Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features

2 code implementations · 22 Aug 2023 · Alberto Baldrati, Marco Bertini, Tiberio Uricchio, Alberto del Bimbo

Given a query composed of a reference image and a relative caption, the goal of Composed Image Retrieval is to retrieve images that are visually similar to the reference one and integrate the modifications expressed by the caption.

Contrastive Learning · Image Retrieval · +1

ECO: Ensembling Context Optimization for Vision-Language Models

no code implementations · 26 Jul 2023 · Lorenzo Agnolucci, Alberto Baldrati, Francesco Todino, Federico Becattini, Marco Bertini, Alberto del Bimbo

Among vision-language models, CLIP has shown remarkable zero-shot transfer capabilities, matching an image and a custom textual prompt in its latent space.

Classification · Image Classification
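
The ECO snippet describes how CLIP performs zero-shot transfer: an image and textual prompts are embedded in a shared latent space and compared there. The sketch below shows only that matching step. It assumes the image embedding and one embedding per class prompt (for example, for "a photo of a {label}") have already been produced by CLIP's encoders. Cosine similarities are multiplied by a logit scale and turned into class probabilities with a softmax. The toy vectors and `logit_scale=100.0` are assumptions made for illustration, and the code does not implement ECO's prompt-ensembling method.

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_probs(image_emb: Sequence[float],
                    prompt_embs: Dict[str, Sequence[float]],
                    logit_scale: float = 100.0) -> Dict[str, float]:
    """Softmax over scaled cosine similarities between the image and each class prompt."""
    img = normalize(image_emb)
    logits = {label: logit_scale * sum(a * b for a, b in zip(img, normalize(emb)))
              for label, emb in prompt_embs.items()}
    m = max(logits.values())  # subtract the max logit for numerical stability
    exps = {label: math.exp(z - m) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def zero_shot_classify(image_emb: Sequence[float],
                       prompt_embs: Dict[str, Sequence[float]],
                       logit_scale: float = 100.0) -> str:
    """Return the label whose prompt embedding best matches the image."""
    probs = zero_shot_probs(image_emb, prompt_embs, logit_scale)
    return max(probs, key=probs.get)


# Toy 3-d "embeddings"; real CLIP embeddings have hundreds of dimensions.
prompt_embs = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "car": [0.0, 0.0, 1.0]}
print(zero_shot_classify([0.9, 0.1, 0.0], prompt_embs))  # → cat
```

Because the class set is defined only by the prompts, new classes can be added by embedding new prompts, with no retraining of the encoders.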

LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On

1 code implementation · 22 May 2023 · Davide Morelli, Alberto Baldrati, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara

Image-based virtual try-on, which consists of generating a novel image of a target model wearing a given in-shop garment, has yet to capitalize on the potential of powerful generative solutions such as latent diffusion models.

Virtual Try-on

Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

1 code implementation · ICCV 2023 · Alberto Baldrati, Davide Morelli, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara

Given the lack of existing datasets suitable for multimodal fashion image editing, we also extend two existing fashion datasets, Dress Code and VITON-HD, with multimodal annotations collected semi-automatically.

Multimodal fashion image editing

Zero-Shot Composed Image Retrieval with Textual Inversion

2 code implementations · ICCV 2023 · Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Alberto del Bimbo

Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that describes the difference between the two images.

Retrieval · Zero-Shot Composed Image Retrieval (ZS-CIR)
