no code implementations • CONLL 2017 • Massimiliano Mancini, Jose Camacho-Collados, Ignacio Iacobacci, Roberto Navigli
Word embeddings are widely used in Natural Language Processing, mainly due to their success in capturing semantic information from massive corpora.
no code implementations • 25 Feb 2017 • Massimiliano Mancini, Samuel Rota Bulò, Elisa Ricci, Barbara Caputo
This paper presents an approach for semantic place categorization using data obtained from RGB cameras.
no code implementations • 22 Jul 2017 • Julian Zilly, Amit Boyarski, Micael Carvalho, Amir Atapour Abarghouei, Konstantinos Amplianitis, Aleksandr Krasnov, Massimiliano Mancini, Hernán Gonzalez, Riccardo Spezialetti, Carlos Sampedro Pérez, Hao Li
Reviewing this project with modern eyes provides us with the opportunity to reflect on several issues, relevant now as then to the field of computer vision and research in general, that go beyond the technical aspects of the work.
2 code implementations • CVPR 2018 • Massimiliano Mancini, Lorenzo Porzi, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
Our approach is based on the introduction of two main components, which can be embedded into any existing CNN architecture: (i) a side branch that automatically computes the assignment of a source sample to a latent domain and (ii) novel layers that exploit domain membership information to appropriately align the distribution of the CNN internal feature representations to a reference distribution.
no code implementations • 28 May 2018 • Massimiliano Mancini, Elisa Ricci, Barbara Caputo, Samuel Rota Bulò
Visual recognition algorithms are required today to exhibit adaptive abilities.
1 code implementation • 30 May 2018 • Massimiliano Mancini, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
Our method develops from the intuition that, given a set of different classification models associated with known domains (e.g. corresponding to multiple environments or robots), the best model for a new sample in the novel domain can be computed directly at test time by optimally combining the known models.
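As a rough illustration of combining known models at test time, the sketch below blends the class probabilities of several per-domain classifiers using weights derived from how well the sample matches each domain. This is a simplification and not the paper's procedure: it mixes model outputs rather than the models themselves, and `combine_predictions`, `model_probs`, and `domain_scores` are hypothetical names introduced here.

```python
import math


def combine_predictions(model_probs, domain_scores):
    """Blend per-domain class probabilities with softmax weights.

    model_probs: one list of class probabilities per known-domain model.
    domain_scores: one (hypothetical) affinity score per known domain,
        higher meaning the sample looks more like that domain.
    """
    # Numerically stable softmax over the domain affinity scores.
    top = max(domain_scores)
    exps = [math.exp(s - top) for s in domain_scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Convex combination of the models' outputs, class by class.
    n_classes = len(model_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, model_probs))
        for c in range(n_classes)
    ]


# Two known-domain models that disagree; the sample matches both domains equally.
blended = combine_predictions([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With equal affinity scores the two models contribute equally; a larger score for one domain shifts the blend toward that domain's model.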
no code implementations • 15 Jun 2018 • Massimiliano Mancini, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
A long-standing problem in visual object categorization is the ability of algorithms to generalize across different testing conditions.
Ranked #111 on Domain Generalization on PACS
no code implementations • 3 Jul 2018 • Massimiliano Mancini, Hakan Karaoguz, Elisa Ricci, Patric Jensfelt, Barbara Caputo
This novel dataset allows for testing the robustness of robot visual recognition algorithms to a series of different domain shifts both in isolation and unified.
1 code implementation • CVPR 2019 • Massimiliano Mancini, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
The ability to categorize is a cornerstone of visual intelligence, and a key functionality for artificial, autonomous visual machines.
1 code implementation • 1 Apr 2019 • Fabio Cermelli, Massimiliano Mancini, Elisa Ricci, Barbara Caputo
Deep networks have brought significant advances in robot perception, improving the capabilities of robots in several visual tasks, ranging from object detection and recognition to pose estimation, semantic scene segmentation and many others.
no code implementations • 4 Jun 2019 • Massimiliano Mancini, Hakan Karaoguz, Elisa Ricci, Patric Jensfelt, Barbara Caputo
While today's robots are able to perform sophisticated tasks, they can only act on objects they have been trained to recognize.
1 code implementation • CVPR 2020 • Fabio Cermelli, Massimiliano Mancini, Samuel Rota Bulò, Elisa Ricci, Barbara Caputo
Current strategies fail on this task because they do not consider a peculiar aspect of semantic segmentation: since each training step provides annotation only for a subset of all possible classes, pixels of the background class (i.e. pixels that do not belong to any other classes) exhibit a semantic distribution shift.
Ranked #3 on Domain 11-5 on Cityscapes
no code implementations • 20 Apr 2020 • Dario Fontanel, Fabio Cermelli, Massimiliano Mancini, Samuel Rota Bulò, Elisa Ricci, Barbara Caputo
While convolutional neural networks have brought significant advances in robot vision, their ability is often limited to closed world scenarios, where the number of semantic concepts to be recognized is determined by the available training set.
1 code implementation • ECCV 2020 • Massimiliano Mancini, Zeynep Akata, Elisa Ricci, Barbara Caputo
The key idea of CuMix is to simulate the test-time domain and semantic shift using images and features from unseen domains and categories generated by mixing up the multiple source domains and categories available during training.
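The mixing operation itself can be sketched as a convex combination of two samples and their label vectors, in the style of standard mixup. This is only an illustration under assumptions: the function name `mixup` and the plain-list representation are hypothetical, and the way CuMix chooses which domains and categories to mix, and how it schedules the mixing, is not reproduced here.

```python
def mixup(x_a, x_b, y_a, y_b, lam):
    """Convexly combine two samples and their one-hot labels.

    x_a, x_b: feature (or flattened image) vectors, possibly from
        different source domains and categories.
    y_a, y_b: one-hot label vectors of the two samples.
    lam: mixing coefficient in [0, 1]; 1.0 returns the first sample.
    """
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return mixed_x, mixed_y


# Mix a sample of class 0 with a sample of class 2 in equal parts.
mixed_x, mixed_y = mixup([1.0, 0.0], [0.0, 1.0], [1, 0, 0], [0, 0, 1], lam=0.5)
```

The mixed label assigns partial mass to both original categories, so the mixed sample behaves like an example from a category, and potentially a domain, not present in the training set.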
no code implementations • 4 Aug 2020 • Levi O. Vasconcelos, Massimiliano Mancini, Davide Boscaini, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
Recent unsupervised domain adaptation methods based on deep architectures have shown remarkable performance not only in traditional classification tasks but also in more complex problems involving structured predictions (e.g. semantic segmentation, depth estimation).
1 code implementation • 30 Nov 2020 • Fabio Cermelli, Massimiliano Mancini, Yongqin Xian, Zeynep Akata, Barbara Caputo
Semantic segmentation models have two fundamental weaknesses: i) they require large training sets with costly pixel-level annotations, and ii) they have a static output space, constrained to the classes of the training set.
1 code implementation • 16 Dec 2020 • Massimiliano Mancini
In the first part of the thesis, we describe different solutions to enable deep models to generalize to new visual domains, by transferring knowledge from one or more labeled source domains to a target domain where no labeled data are available.
2 code implementations • CVPR 2021 • Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, Zeynep Akata
After estimating the feasibility score of each composition, we use these scores to either directly mask the output space or as a margin for the cosine similarity between visual features and compositional embeddings during training.
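The first option, masking the output space, can be sketched as follows. This is a minimal illustration under assumptions: the names `mask_scores` and `predict`, the hard threshold, and the example values are all hypothetical, and the margin-based variant used during training is not shown.

```python
def mask_scores(similarities, feasibility, threshold):
    """Exclude compositions whose feasibility score is below the threshold.

    similarities: cosine similarity between the image and each composition.
    feasibility: estimated feasibility score of each composition.
    """
    return [
        sim if feas >= threshold else float("-inf")
        for sim, feas in zip(similarities, feasibility)
    ]


def predict(similarities, feasibility, threshold):
    """Return the index of the best composition among the feasible ones."""
    masked = mask_scores(similarities, feasibility, threshold)
    return max(range(len(masked)), key=masked.__getitem__)


# Composition 0 has the highest similarity but is judged infeasible.
sims = [0.9, 0.7, 0.4]
feas = [-0.2, 0.5, 0.8]
best = predict(sims, feas, threshold=0.0)
```

Here the prediction falls to composition 1, the most similar composition among those that pass the feasibility threshold.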
no code implementations • 25 Mar 2021 • Massimiliano Mancini, Elisa Ricci, Barbara Caputo, Samuel Rota Bulò
In this work, we provide a general formulation of binary mask based models for multi-domain learning by affine transformations of the original network parameters.
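One way to picture an affine transformation of pretrained weights driven by a binary mask is the element-wise sketch below. The exact parameterization is an assumption made for illustration: the scalar coefficients `k0` to `k3` and the function name `transform_weights` are hypothetical and not taken from the listing above.

```python
def transform_weights(weights, mask, k0, k1, k2, k3):
    """Element-wise affine transformation of weights using a binary mask.

    Each output weight is k0*w + k1 + k2*m + k3*w*m, where w is the
    pretrained weight and m is the corresponding 0/1 mask entry.
    """
    return [
        k0 * w + k1 + k2 * m + k3 * w * m
        for w, m in zip(weights, mask)
    ]


pretrained = [0.5, -1.0]
binary_mask = [1, 0]

# With only k3 active, the transformation reduces to plain weight masking.
masked_only = transform_weights(pretrained, binary_mask, 0, 0, 0, 1)
```

Setting only `k3` recovers simple masking of the pretrained weights, while the other coefficients add a rescaling of the original weights, a constant offset, and a mask-dependent offset. The pretrained weights stay fixed, so each new domain only needs its own mask and a handful of scalars.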
no code implementations • 25 Mar 2021 • Massimiliano Mancini, Lorenzo Porzi, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
Most deep UDA approaches operate in a single-source, single-target scenario, i.e. they assume that the source and the target samples arise from a single distribution.
1 code implementation • 21 Apr 2021 • Giuseppe Pastore, Fabio Cermelli, Yongqin Xian, Massimiliano Mancini, Zeynep Akata, Barbara Caputo
Being able to segment unseen classes not observed during training is an important technical challenge in deep learning, because of its potential to reduce the expensive annotation required for semantic segmentation.
no code implementations • 29 Apr 2021 • Debora Caldarola, Massimiliano Mancini, Fabio Galasso, Marco Ciccone, Emanuele Rodolà, Barbara Caputo
Clustering may reduce heterogeneity by identifying the domains, but it deprives each cluster model of the data and supervision of others.
2 code implementations • 3 May 2021 • Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, Zeynep Akata
In this work, we overcome this assumption by operating in the open-world setting, where no limit is imposed on the compositional space at test time, and the search space contains a large number of unseen compositions.
1 code implementation • 1 Jun 2021 • Dario Fontanel, Fabio Cermelli, Massimiliano Mancini, Barbara Caputo
The current state of the art in anomaly segmentation uses generative models, exploiting their inability to reconstruct patterns unseen during training.
1 code implementation • 9 Jul 2021 • Dario Fontanel, Fabio Cermelli, Massimiliano Mancini, Barbara Caputo
Robotic visual systems operating in the wild must act in unconstrained scenarios, under different environmental conditions while facing a variety of semantic concepts, including unknown ones.
no code implementations • 19 Aug 2021 • Anjan Dutta, Massimiliano Mancini, Zeynep Akata
Existing self-supervised learning methods learn representations by means of pretext tasks that are either (1) discriminating, explicitly specifying which features should be separated, or (2) aligning, precisely indicating which features should be pulled together, but they do not address how to jointly and in a principled way define which features should be repelled and which should be attracted.
1 code implementation • 31 Jan 2022 • Fabio Cermelli, Massimiliano Mancini, Samuel Rota Bulò, Elisa Ricci, Barbara Caputo
To tackle these issues, we introduce a novel incremental class learning approach for semantic segmentation taking into account a peculiar aspect of this task: since each training step provides annotation only for a subset of all possible classes, pixels of the background class exhibit a semantic shift.
1 code implementation • 27 Apr 2022 • Ilke Cugu, Massimiliano Mancini, Yanbei Chen, Zeynep Akata
Generalizing visual recognition models trained on a single distribution to unseen input distributions (i.e. domains) requires making them robust to superfluous correlations in the training set.
1 code implementation • CVPR 2022 • Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata
The goal of open-world compositional zero-shot learning (OW-CZSL) is to recognize compositions of state and objects in images, given only a subset of them during training and no prior on the unseen compositions.
1 code implementation • 14 Jul 2022 • Uddeshya Upadhyay, Shyamgopal Karthik, Yanbei Chen, Massimiliano Mancini, Zeynep Akata
Moreover, many of the high-performing deep learning models that are already trained and deployed are non-Bayesian in nature and do not provide uncertainty estimates.
1 code implementation • 27 Jul 2022 • Stephan Alaniz, Massimiliano Mancini, Anjan Dutta, Diego Marcos, Zeynep Akata
Toward equipping machines with such capabilities, we propose the Primitive-based Sketch Abstraction task where the goal is to represent sketches using a fixed set of drawing primitives under the influence of a budget.
no code implementations • 24 Aug 2022 • Yanbei Chen, Massimiliano Mancini, Xiatian Zhu, Zeynep Akata
Semi-supervised learning and unsupervised learning offer promising paradigms to learn from an abundance of unlabeled visual data.
1 code implementation • 5 Oct 2022 • Abhra Chaudhuri, Massimiliano Mancini, Zeynep Akata, Anjan Dutta
Fine-grained categories that largely share the same set of parts cannot be discriminated based on part information alone, as they mostly differ in the way the local parts relate to the overall global structure of the object.
1 code implementation • 19 Oct 2022 • Abhra Chaudhuri, Massimiliano Mancini, Yanbei Chen, Zeynep Akata, Anjan Dutta
Representation learning for sketch-based image retrieval has mostly been tackled by learning embeddings that discard modality-specific information.
1 code implementation • 22 May 2023 • Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini, Zeynep Akata
Despite their impressive capabilities, diffusion-based text-to-image (T2I) models can lack faithfulness to the text prompt, where generated images may not contain all the mentioned objects, attributes or relations.
1 code implementation • NeurIPS 2023 • Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci
We thus formalize a novel task, termed Vocabulary-free Image Classification (VIC), where we aim to assign to an input image a class that resides in an unconstrained language-induced semantic space, without the prerequisite of a known vocabulary.
1 code implementation • ICCV 2023 • Uddeshya Upadhyay, Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata
We propose ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained VLMs via inter/intra-modal alignment in a post-hoc manner without needing large-scale datasets or computing.
1 code implementation • 18 Aug 2023 • Thomas De Min, Massimiliano Mancini, Karteek Alahari, Xavier Alameda-Pineda, Elisa Ricci
State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts, drastically reducing catastrophic forgetting.
1 code implementation • ICCV 2023 • Anders Christensen, Massimiliano Mancini, A. Sophia Koepke, Ole Winther, Zeynep Akata
We achieve this with our proposed Image-free Classifier Injection with Semantics (ICIS) that injects classifiers for new, unseen classes into pre-trained classification models in a post-hoc fashion without relying on image data.
1 code implementation • ICCV 2023 • Stephan Alaniz, Massimiliano Mancini, Zeynep Akata
We propose a framework, ISCO, to recompose an object using 3D superquadrics as semantic parts directly from 2D views without training a model that uses 3D supervision.
1 code implementation • ICCV 2023 • Robert van der Klis, Stephan Alaniz, Massimiliano Mancini, Cassio F. Dantas, Dino Ienco, Zeynep Akata, Diego Marcos
Fine-grained classification often requires recognizing specific object parts, such as beak shape and wing patterns for birds.
1 code implementation • 13 Oct 2023 • Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini, Zeynep Akata
Finally, we show that CIReVL makes CIR human-understandable by composing image and text in a modular fashion in the language domain, which makes it intervenable and allows failure cases to be re-aligned post hoc.
Ranked #1 on Zero-Shot Composed Image Retrieval (ZS-CIR) on CIRCO