1 code implementation • 27 Mar 2025 • Alessandro Conti, Massimiliano Mancini, Enrico Fini, Yiming Wang, Paolo Rota, Elisa Ricci
Despite this remarkable capability, most existing studies on LMM classification performance are surprisingly limited in scope, often assuming a closed-world setting with a predefined set of categories.
no code implementations • 24 Mar 2025 • Deepayan Das, Davide Talon, Yiming Wang, Massimiliano Mancini, Elisa Ricci
We depart from existing work, and for the first time explore the training-free setting in the context of personalization.
no code implementations • 24 Mar 2025 • Luca Zanella, Massimiliano Mancini, Willi Menapace, Sergey Tulyakov, Yiming Wang, Elisa Ricci
Recent video-language alignment models are trained on sets of videos, each with an associated positive caption and a negative caption generated by large language models.
1 code implementation • 21 Mar 2025 • Davide Berasi, Matteo Farina, Massimiliano Mancini, Elisa Ricci, Nicola Strisciuglio
We demonstrate that visual embeddings of pre-trained VLMs exhibit a compositional arrangement, and evaluate the effectiveness of this property in the tasks of compositional classification and group robustness.
1 code implementation • 14 Mar 2025 • Matteo Farina, Massimiliano Mancini, Giovanni Iacca, Elisa Ricci
An old-school recipe for training a classifier is to (i) learn a good feature extractor and (ii) optimize a linear layer atop.
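A minimal sketch of that two-step recipe, using a frozen, pre-trained feature extractor and a linear classifier trained on top of it. `backbone`, `loader`, and the dimensions are placeholders, not names from the paper.

```python
import torch
import torch.nn as nn

def train_linear_probe(backbone, loader, num_classes, feat_dim, epochs=10, lr=1e-3):
    backbone.eval()                                  # (i) keep the feature extractor frozen
    head = nn.Linear(feat_dim, num_classes)          # (ii) linear layer on top
    opt = torch.optim.AdamW(head.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            with torch.no_grad():
                feats = backbone(images)             # fixed features
            loss = ce(head(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```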
1 code implementation • 12 Mar 2025 • Thomas De Min, Subhankar Roy, Stéphane Lathuilière, Elisa Ricci, Massimiliano Mancini
MIU minimizes the mutual information between model features and group information, achieving unlearning while reducing performance degradation in the dominant group of the forget set.
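As a hedged illustration of the stated objective (reducing the dependence between model features and group information), the snippet below uses a batch cross-covariance penalty as a crude stand-in for a mutual-information term; it is not the authors' estimator, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def group_dependence_penalty(feats, group_labels, num_groups):
    """feats: (B, D) model features; group_labels: (B,) integer group ids."""
    g = F.one_hot(group_labels, num_groups).float()
    f = feats - feats.mean(dim=0, keepdim=True)
    g = g - g.mean(dim=0, keepdim=True)
    cov = f.t() @ g / feats.shape[0]          # (D, num_groups) cross-covariance
    return cov.pow(2).sum()                   # zero iff features are (linearly) uncorrelated with groups

# Illustrative use: loss = task_loss + lambda_mi * group_dependence_penalty(feats, groups, K)
```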
1 code implementation • 5 Dec 2024 • Marco Garosi, Riccardo Tedoldi, Davide Boscaini, Massimiliano Mancini, Nicu Sebe, Fabio Poiesi
Supervised 3D part segmentation models are tailored for a fixed set of objects and parts, limiting their transferability to open-set, real-world scenarios.
no code implementations • 4 Nov 2024 • Deepayan Das, Davide Talon, Massimiliano Mancini, Yiming Wang, Elisa Ricci
To mitigate this issue, we introduce a pseudo-rehearsal balancing module that aligns the generated data towards the ground-truth data distribution using either the question meta-statistics or an unsupervised clustering method.
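A hedged sketch of the balancing idea: resample the generated (pseudo-rehearsal) examples so that their cluster proportions match those of the ground-truth data, using an unsupervised clustering. The use of scikit-learn's KMeans and all variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

def balance_pseudo_rehearsal(gen_feats, real_feats, n_clusters=10, seed=0):
    """gen_feats, real_feats: (N, D) feature arrays for generated and ground-truth data."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(real_feats)
    target = np.bincount(km.labels_, minlength=n_clusters) / len(real_feats)  # ground-truth cluster proportions
    gen_clusters = km.predict(gen_feats)
    rng = np.random.default_rng(seed)
    keep = []
    for c in range(n_clusters):
        idx = np.where(gen_clusters == c)[0]
        n_keep = int(round(target[c] * len(gen_feats)))
        if len(idx) > 0 and n_keep > 0:
            keep.extend(rng.choice(idx, size=min(n_keep, len(idx)), replace=False))
    return np.array(keep)   # indices of generated samples to rehearse
```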
no code implementations • 4 Nov 2024 • Rémi Kazmierczak, Steve Azzolin, Eloïse Berthier, Anna Hedström, Patricia Delhomme, Nicolas Bousquet, Goran Frehse, Massimiliano Mancini, Baptiste Caramiaux, Andrea Passerini, Gianni Franchi
Our first key contribution is a human evaluation of XAI explanations on four diverse datasets (COCO, Pascal Parts, Cats Dogs Cars, and MonumAI) which constitutes the first large-scale benchmark dataset for XAI, with annotations at both the image and concept levels.
1 code implementation • 29 Aug 2024 • Moreno D'Incà, Elia Peruzzo, Massimiliano Mancini, Xingqian Xu, Humphrey Shi, Nicu Sebe
OpenBias detects and quantifies biases, while GradBias determines the contribution of individual prompt words to biases.
no code implementations • 2 Aug 2024 • Simone Caldarella, Massimiliano Mancini, Elisa Ricci, Rahaf Aljundi
Vision-Language Models (VLMs) combine visual and textual understanding, rendering them well-suited for diverse tasks like generating image captions and answering visual questions across various domains.
1 code implementation • 16 Jul 2024 • Thomas De Min, Subhankar Roy, Massimiliano Mancini, Stéphane Lathuilière, Elisa Ricci
To this end, existing MU approaches assume complete or partial access to the training data, which can be limited over time due to privacy regulations.
1 code implementation • 18 Jun 2024 • Alessandro Conti, Enrico Fini, Paolo Rota, Yiming Wang, Massimiliano Mancini, Elisa Ricci
Finally, the LLM refines the report, presenting the results to the user in natural language.
1 code implementation • 28 May 2024 • Matteo Farina, Gianni Franchi, Giovanni Iacca, Massimiliano Mancini, Elisa Ricci
Thanks to its simplicity and comparatively negligible computation, ZERO can serve as a strong baseline for future work in this field.
1 code implementation • 24 May 2024 • Thomas De Min, Massimiliano Mancini, Stéphane Lathuilière, Subhankar Roy, Elisa Ricci
In truly incremental scenarios, independent pathways would cause an explosion of computation due to the quadratically complex multi-head self-attention (MSA) operation in prompt tuning; we therefore propose to reduce the original patch token embeddings into summarized tokens.
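A hedged sketch of that token-reduction idea: compress N patch token embeddings into K "summarized" tokens so that later attention scales with K rather than N. The learned-query pooling below is illustrative, not necessarily the operator used in the paper.

```python
import torch
import torch.nn as nn

class TokenSummarizer(nn.Module):
    def __init__(self, dim, num_summary_tokens=8, num_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_summary_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, patch_tokens):              # (B, N, dim)
        B = patch_tokens.shape[0]
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        summary, _ = self.attn(q, patch_tokens, patch_tokens)
        return summary                            # (B, K, dim), with K << N
```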
1 code implementation • 16 Apr 2024 • Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci
To address VIC, we propose Category Search from External Databases (CaSED), a training-free method that leverages a pre-trained vision-language model and an external database.
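A simplified, training-free outline in the spirit of the description above: retrieve captions close to the image in a shared embedding space, harvest candidate category names from them, and score the candidates with the same vision-language model. `encode_text`, the caption database, and the naive candidate extraction are placeholders, not the actual CaSED pipeline.

```python
import numpy as np

def vocabulary_free_classify(image_emb, caption_embs, captions, encode_text, top_k=10):
    # 1) retrieve the captions most similar to the image (embeddings assumed L2-normalized)
    sims = caption_embs @ image_emb
    nearest = np.argsort(-sims)[:top_k]
    # 2) naively harvest candidate category names from the retrieved captions
    candidates = sorted({w.strip(".,").lower()
                         for i in nearest for w in captions[i].split() if len(w) > 3})
    # 3) score each candidate name against the image with the text encoder
    text_embs = encode_text([f"a photo of a {c}" for c in candidates])  # (C, D), L2-normalized
    scores = text_embs @ image_emb
    return candidates[int(np.argmax(scores))]
```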
1 code implementation • CVPR 2024 • Moreno D'Incà, Elia Peruzzo, Massimiliano Mancini, Dejia Xu, Vidit Goel, Xingqian Xu, Zhangyang Wang, Humphrey Shi, Nicu Sebe
In this paper, we tackle the challenge of open-set bias detection in text-to-image generative models, presenting OpenBias, a new pipeline that identifies and quantifies the severity of biases agnostically, without access to any precompiled set.
1 code implementation • CVPR 2024 • Matteo Farina, Massimiliano Mancini, Elia Cunegatti, Gaowen Liu, Giovanni Iacca, Elisa Ricci
In this challenging setting, the transferable representations already encoded in the pretrained model are a key aspect to preserve.
no code implementations • CVPR 2024 • Luca Zanella, Willi Menapace, Massimiliano Mancini, Yiming Wang, Elisa Ricci
Video anomaly detection (VAD) aims to temporally locate abnormal events in a video.
1 code implementation • 13 Oct 2023 • Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini, Zeynep Akata
Finally, we show that CIReVL makes CIR human-understandable by composing image and text in a modular fashion in the language domain, thereby making it intervenable and allowing failure cases to be re-aligned post-hoc.
Ranked #1 on Zero-Shot Composed Image Retrieval (ZS-CIR) on GeneCIS (A-R@1 metric)
1 code implementation • ICCV 2023 • Robert van der Klis, Stephan Alaniz, Massimiliano Mancini, Cassio F. Dantas, Dino Ienco, Zeynep Akata, Diego Marcos
Fine-grained classification often requires recognizing specific object parts, such as beak shape and wing patterns for birds.
1 code implementation • ICCV 2023 • Stephan Alaniz, Massimiliano Mancini, Zeynep Akata
We propose a framework, ISCO, to recompose an object using 3D superquadrics as semantic parts directly from 2D views without training a model that uses 3D supervision.
1 code implementation • ICCV 2023 • Anders Christensen, Massimiliano Mancini, A. Sophia Koepke, Ole Winther, Zeynep Akata
We achieve this with our proposed Image-free Classifier Injection with Semantics (ICIS) that injects classifiers for new, unseen classes into pre-trained classification models in a post-hoc fashion without relying on image data.
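A hedged sketch of post-hoc classifier injection as described above: learn a mapping from class semantic embeddings to classifier weights on the seen classes, then generate and append weights for unseen classes without any image data. Architecture and hyperparameters are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

def fit_injector(seen_class_embs, seen_weights, hidden=512, steps=1000, lr=1e-3):
    """seen_class_embs: (S, E) class semantics; seen_weights: (S, D) rows of the trained classifier."""
    injector = nn.Sequential(nn.Linear(seen_class_embs.shape[1], hidden),
                             nn.ReLU(),
                             nn.Linear(hidden, seen_weights.shape[1]))
    opt = torch.optim.Adam(injector.parameters(), lr=lr)
    for _ in range(steps):
        loss = nn.functional.mse_loss(injector(seen_class_embs), seen_weights)
        opt.zero_grad(); loss.backward(); opt.step()
    return injector

def inject_unseen_classes(classifier: nn.Linear, injector, unseen_class_embs):
    with torch.no_grad():
        new_w = injector(unseen_class_embs)               # predicted weights for unseen classes
        w = torch.cat([classifier.weight, new_w], dim=0)
    out = nn.Linear(w.shape[1], w.shape[0], bias=False)
    out.weight = nn.Parameter(w)
    return out                                            # now scores seen + unseen classes
```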
1 code implementation • 18 Aug 2023 • Thomas De Min, Massimiliano Mancini, Karteek Alahari, Xavier Alameda-Pineda, Elisa Ricci
State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts, drastically reducing catastrophic forgetting.
1 code implementation • ICCV 2023 • Uddeshya Upadhyay, Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata
We propose ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained VLMs via inter/intra-modal alignment in a post-hoc manner without needing large-scale datasets or computing.
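A minimal sketch of a post-hoc probabilistic adapter in the spirit of the description above: a small head that maps a frozen VLM embedding to the parameters of a distribution over embeddings. A diagonal Gaussian is used here for simplicity; the paper's actual parameterization may differ.

```python
import torch
import torch.nn as nn

class ProbabilisticAdapter(nn.Module):
    def __init__(self, dim, hidden=512):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, dim)
        self.log_var = nn.Linear(hidden, dim)

    def forward(self, frozen_emb):
        h = self.trunk(frozen_emb)
        return self.mu(h), self.log_var(h)   # per-dimension mean and log-variance

def gaussian_nll(mu, log_var, target_emb):
    # negative log-likelihood of the frozen embedding under the adapter's Gaussian
    return 0.5 * ((target_emb - mu) ** 2 / log_var.exp() + log_var).mean()
```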
1 code implementation • NeurIPS 2023 • Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci
We thus formalize a novel task, termed Vocabulary-free Image Classification (VIC), where we aim to assign to an input image a class that resides in an unconstrained language-induced semantic space, without the prerequisite of a known vocabulary.
1 code implementation • 22 May 2023 • Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini, Zeynep Akata
Despite their impressive capabilities, diffusion-based text-to-image (T2I) models can lack faithfulness to the text prompt, where generated images may not contain all the mentioned objects, attributes or relations.
1 code implementation • 19 Oct 2022 • Abhra Chaudhuri, Massimiliano Mancini, Yanbei Chen, Zeynep Akata, Anjan Dutta
Representation learning for sketch-based image retrieval has mostly been tackled by learning embeddings that discard modality-specific information.
1 code implementation • 5 Oct 2022 • Abhra Chaudhuri, Massimiliano Mancini, Zeynep Akata, Anjan Dutta
Fine-grained categories that largely share the same set of parts cannot be discriminated based on part information alone, as they mostly differ in the way the local parts relate to the overall global structure of the object.
no code implementations • 24 Aug 2022 • Yanbei Chen, Massimiliano Mancini, Xiatian Zhu, Zeynep Akata
Semi-supervised learning and unsupervised learning offer promising paradigms to learn from an abundance of unlabeled visual data.
1 code implementation • 27 Jul 2022 • Stephan Alaniz, Massimiliano Mancini, Anjan Dutta, Diego Marcos, Zeynep Akata
Toward equipping machines with such capabilities, we propose the Primitive-based Sketch Abstraction task where the goal is to represent sketches using a fixed set of drawing primitives under the influence of a budget.
1 code implementation • 14 Jul 2022 • Uddeshya Upadhyay, Shyamgopal Karthik, Yanbei Chen, Massimiliano Mancini, Zeynep Akata
Moreover, many of the high-performing deep learning models that are already trained and deployed are non-Bayesian in nature and do not provide uncertainty estimates.
1 code implementation • CVPR 2022 • Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata
The goal of open-world compositional zero-shot learning (OW-CZSL) is to recognize compositions of states and objects in images, given only a subset of them during training and no prior on the unseen compositions.
1 code implementation • 27 Apr 2022 • Ilke Cugu, Massimiliano Mancini, Yanbei Chen, Zeynep Akata
Generalizing visual recognition models trained on a single distribution to unseen input distributions (i.e., domains) requires making them robust to superfluous correlations in the training set.
1 code implementation • 31 Jan 2022 • Fabio Cermelli, Massimiliano Mancini, Samuel Rota Bulò, Elisa Ricci, Barbara Caputo
To tackle these issues, we introduce a novel incremental class learning approach for semantic segmentation taking into account a peculiar aspect of this task: since each training step provides annotation only for a subset of all possible classes, pixels of the background class exhibit a semantic shift.
no code implementations • 19 Aug 2021 • Anjan Dutta, Massimiliano Mancini, Zeynep Akata
Existing self-supervised learning methods learn representations by means of pretext tasks that are either (1) discriminative, explicitly specifying which features should be separated, or (2) aligning, precisely indicating which features should be brought close together; they ignore how to jointly define, in a principled way, which features should be repelled and which attracted.
1 code implementation • 9 Jul 2021 • Dario Fontanel, Fabio Cermelli, Massimiliano Mancini, Barbara Caputo
Robotic visual systems operating in the wild must act in unconstrained scenarios, under different environmental conditions while facing a variety of semantic concepts, including unknown ones.
1 code implementation • 1 Jun 2021 • Dario Fontanel, Fabio Cermelli, Massimiliano Mancini, Barbara Caputo
The current state of the art in anomaly segmentation uses generative models, exploiting their inability to reconstruct patterns unseen during training.
2 code implementations • 3 May 2021 • Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, Zeynep Akata
In this work, we overcome this assumption by operating in the open-world setting, where no limit is imposed on the compositional space at test time and the search space contains a large number of unseen compositions.
no code implementations • 29 Apr 2021 • Debora Caldarola, Massimiliano Mancini, Fabio Galasso, Marco Ciccone, Emanuele Rodolà, Barbara Caputo
Clustering may reduce heterogeneity by identifying the domains, but it deprives each cluster model of the data and supervision of others.
1 code implementation • 21 Apr 2021 • Giuseppe Pastore, Fabio Cermelli, Yongqin Xian, Massimiliano Mancini, Zeynep Akata, Barbara Caputo
Being able to segment unseen classes not observed during training is an important technical challenge in deep learning, because of its potential to reduce the expensive annotation required for semantic segmentation.
Ranked #8 on Zero-Shot Semantic Segmentation on PASCAL VOC
no code implementations • 25 Mar 2021 • Massimiliano Mancini, Lorenzo Porzi, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
Most deep UDA approaches operate in a single-source, single-target scenario, i.e., they assume that the source and the target samples arise from a single distribution.
no code implementations • 25 Mar 2021 • Massimiliano Mancini, Elisa Ricci, Barbara Caputo, Samuel Rota Bulò
In this work, we provide a general formulation of binary mask based models for multi-domain learning by affine transformations of the original network parameters.
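A hedged sketch of that general formulation: domain-specific weights obtained as an affine transformation of the shared base weights, gated by a learned binary mask. The exact parameterization and the straight-through training trick below are illustrative.

```python
import torch
import torch.nn as nn

class MaskedAffineConv(nn.Module):
    def __init__(self, base_conv: nn.Conv2d):
        super().__init__()
        self.base = base_conv                                  # shared weights, kept frozen
        self.base.weight.requires_grad_(False)
        self.mask_logits = nn.Parameter(torch.zeros_like(base_conv.weight))
        self.scale = nn.Parameter(torch.tensor(1.0))           # per-domain affine terms
        self.shift = nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        hard = (self.mask_logits > 0).float()
        soft = torch.sigmoid(self.mask_logits)
        mask = hard + soft - soft.detach()                     # straight-through binary mask
        w = self.base.weight * (self.scale * mask + 1.0) + self.shift * mask
        return nn.functional.conv2d(x, w, self.base.bias,
                                    stride=self.base.stride, padding=self.base.padding)
```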
2 code implementations • CVPR 2021 • Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, Zeynep Akata
After estimating the feasibility score of each composition, we use these scores to either directly mask the output space or as a margin for the cosine similarity between visual features and compositional embeddings during training.
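A minimal sketch of the two uses of feasibility scores described above: (a) mask the output space at inference, or (b) use the score as a margin on the cosine similarity between visual features and compositional embeddings during training. Shapes, thresholds, and the exact margin form are illustrative, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def masked_prediction(img_feats, comp_embs, feasibility, threshold=0.0):
    """img_feats: (B, D), comp_embs: (C, D), feasibility: (C,)."""
    sims = F.normalize(img_feats, dim=-1) @ F.normalize(comp_embs, dim=-1).t()
    sims = sims.masked_fill(feasibility < threshold, float("-inf"))   # (a) drop unfeasible compositions
    return sims.argmax(dim=-1)

def feasibility_margin_loss(img_feats, comp_embs, feasibility, targets, alpha=0.4, temp=0.05):
    sims = F.normalize(img_feats, dim=-1) @ F.normalize(comp_embs, dim=-1).t()
    margin = alpha * feasibility.clamp(min=0)                         # (b) per-composition margin
    logits = sims + margin.unsqueeze(0)                               # competitors boosted by their feasibility
    logits = logits.scatter(1, targets.unsqueeze(1),
                            sims.gather(1, targets.unsqueeze(1)))     # targets keep their raw similarity
    return F.cross_entropy(logits / temp, targets)
```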
1 code implementation • 16 Dec 2020 • Massimiliano Mancini
In the first part of the thesis, we describe different solutions to enable deep models to generalize to new visual domains, by transferring knowledge from one or more labeled source domains to a target domain where no labeled data are available.
1 code implementation • 30 Nov 2020 • Fabio Cermelli, Massimiliano Mancini, Yongqin Xian, Zeynep Akata, Barbara Caputo
Semantic segmentation models have two fundamental weaknesses: i) they require large training sets with costly pixel-level annotations, and ii) they have a static output space, constrained to the classes of the training set.
no code implementations • 4 Aug 2020 • Levi O. Vasconcelos, Massimiliano Mancini, Davide Boscaini, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
Recent unsupervised domain adaptation methods based on deep architectures have shown remarkable performance not only in traditional classification tasks but also in more complex problems involving structured predictions (e.g., semantic segmentation, depth estimation).
1 code implementation • ECCV 2020 • Massimiliano Mancini, Zeynep Akata, Elisa Ricci, Barbara Caputo
The key idea of CuMix is to simulate the test-time domain and semantic shift using images and features from unseen domains and categories generated by mixing up the multiple source domains and categories available during training.
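A hedged sketch of that mixing idea: simulate domain and semantic shift by mixing samples drawn from different source domains and categories. This is plain mixup applied across domains, illustrating the principle rather than the paper's full training recipe.

```python
import torch

def cross_domain_mixup(x_a, y_a, x_b, y_b, num_classes, alpha=2.0):
    """x_a/x_b: batches from two different source domains; y_a/y_b: integer labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x_mix = lam * x_a + (1.0 - lam) * x_b                      # mixed inputs across domains
    y_a1 = torch.nn.functional.one_hot(y_a, num_classes).float()
    y_b1 = torch.nn.functional.one_hot(y_b, num_classes).float()
    y_mix = lam * y_a1 + (1.0 - lam) * y_b1                    # soft labels mixed across categories
    return x_mix, y_mix
```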
no code implementations • 20 Apr 2020 • Dario Fontanel, Fabio Cermelli, Massimiliano Mancini, Samuel Rota Bulò, Elisa Ricci, Barbara Caputo
While convolutional neural networks have brought significant advances in robot vision, their ability is often limited to closed world scenarios, where the number of semantic concepts to be recognized is determined by the available training set.
1 code implementation • CVPR 2020 • Fabio Cermelli, Massimiliano Mancini, Samuel Rota Bulò, Elisa Ricci, Barbara Caputo
Current strategies fail on this task because they do not consider a peculiar aspect of semantic segmentation: since each training step provides annotation only for a subset of all possible classes, pixels of the background class (i.e., pixels that do not belong to any other class) exhibit a semantic distribution shift.
Ranked #3 on Domain 11-5 on Cityscapes
no code implementations • 4 Jun 2019 • Massimiliano Mancini, Hakan Karaoguz, Elisa Ricci, Patric Jensfelt, Barbara Caputo
While today's robots are able to perform sophisticated tasks, they can only act on objects they have been trained to recognize.
1 code implementation • 1 Apr 2019 • Fabio Cermelli, Massimiliano Mancini, Elisa Ricci, Barbara Caputo
Deep networks have brought significant advances in robot perception, improving the capabilities of robots in several visual tasks, ranging from object detection and recognition to pose estimation, semantic scene segmentation and many others.
1 code implementation • CVPR 2019 • Massimiliano Mancini, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
The ability to categorize is a cornerstone of visual intelligence, and a key functionality for artificial, autonomous visual machines.
no code implementations • 3 Jul 2018 • Massimiliano Mancini, Hakan Karaoguz, Elisa Ricci, Patric Jensfelt, Barbara Caputo
This novel dataset allows for testing the robustness of robot visual recognition algorithms to a series of different domain shifts both in isolation and unified.
no code implementations • 15 Jun 2018 • Massimiliano Mancini, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
A long-standing problem in visual object categorization is the ability of algorithms to generalize across different testing conditions.
Ranked #122 on Domain Generalization on PACS
1 code implementation • 30 May 2018 • Massimiliano Mancini, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
Our method develops from the intuition that, given a set of different classification models associated with known domains (e.g., corresponding to multiple environments or robots), the best model for a new sample in the novel domain can be computed directly at test time by optimally combining the known models.
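A hedged illustration of that intuition: predictions for a sample from a new domain are obtained by combining classifiers trained on the known domains, weighted here by a placeholder per-sample domain-assignment probability rather than the paper's actual combination rule.

```python
import torch

def combine_domain_models(x, domain_models, domain_weights):
    """domain_models: list of callables returning logits; domain_weights: (num_domains,), summing to 1."""
    probs = [w * torch.softmax(m(x), dim=-1) for m, w in zip(domain_models, domain_weights)]
    return torch.stack(probs, dim=0).sum(dim=0)   # (B, num_classes) mixture of the known models
```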
no code implementations • 28 May 2018 • Massimiliano Mancini, Elisa Ricci, Barbara Caputo, Samuel Rota Bulò
Visual recognition algorithms are required today to exhibit adaptive abilities.
2 code implementations • CVPR 2018 • Massimiliano Mancini, Lorenzo Porzi, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
Our approach is based on the introduction of two main components, which can be embedded into any existing CNN architecture: (i) a side branch that automatically computes the assignment of a source sample to a latent domain and (ii) novel layers that exploit domain membership information to appropriately align the distribution of the CNN internal feature representations to a reference distribution.
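A hedged sketch of the two components listed above: (i) a side branch that soft-assigns each sample to a latent domain and (ii) a normalization layer that mixes per-domain statistics according to that assignment when aligning features. The implementation details are illustrative, not the paper's exact layers.

```python
import torch
import torch.nn as nn

class LatentDomainBranch(nn.Module):
    def __init__(self, feat_dim, num_latent_domains):
        super().__init__()
        self.head = nn.Linear(feat_dim, num_latent_domains)

    def forward(self, feats):                            # (B, D) pooled features
        return torch.softmax(self.head(feats), dim=-1)   # (B, K) soft domain assignment

class DomainWeightedNorm(nn.Module):
    def __init__(self, channels, num_latent_domains):
        super().__init__()
        self.bns = nn.ModuleList(nn.BatchNorm2d(channels, affine=False)
                                 for _ in range(num_latent_domains))
        self.gamma = nn.Parameter(torch.ones(channels))
        self.beta = nn.Parameter(torch.zeros(channels))

    def forward(self, x, domain_probs):                  # x: (B, C, H, W), domain_probs: (B, K)
        normed = torch.stack([bn(x) for bn in self.bns], dim=1)           # (B, K, C, H, W)
        out = (domain_probs[:, :, None, None, None] * normed).sum(dim=1)  # mixture of per-domain statistics
        return out * self.gamma[None, :, None, None] + self.beta[None, :, None, None]
```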
no code implementations • 22 Jul 2017 • Julian Zilly, Amit Boyarski, Micael Carvalho, Amir Atapour Abarghouei, Konstantinos Amplianitis, Aleksandr Krasnov, Massimiliano Mancini, Hernán Gonzalez, Riccardo Spezialetti, Carlos Sampedro Pérez, Hao Li
Reviewing this project with modern eyes provides us with the opportunity to reflect on several issues, relevant now as then to the field of computer vision and research in general, that go beyond the technical aspects of the work.
no code implementations • 25 Feb 2017 • Massimiliano Mancini, Samuel Rota Bulò, Elisa Ricci, Barbara Caputo
This paper presents an approach for semantic place categorization using data obtained from RGB cameras.
no code implementations • CONLL 2017 • Massimiliano Mancini, Jose Camacho-Collados, Ignacio Iacobacci, Roberto Navigli
Word embeddings are widely used in Natural Language Processing, mainly due to their success in capturing semantic information from massive corpora.