no code implementations • 10 Apr 2025 • Mustafa Shukor, Enrico Fini, Victor Guilherme Turrisi da Costa, Matthieu Cord, Joshua Susskind, Alaaeldin El-Nouby
Building general-purpose models that can effectively perceive the world through multimodal signals has been a long-standing goal.
1 code implementation • 27 Mar 2025 • Alessandro Conti, Massimiliano Mancini, Enrico Fini, Yiming Wang, Paolo Rota, Elisa Ricci
Despite this remarkable capability, most existing studies on LMM classification performance are surprisingly limited in scope, often assuming a closed-world setting with a predefined set of categories.
no code implementations • 19 Feb 2025 • Roman Bachmann, Jesse Allardice, David Mizrahi, Enrico Fini, Oğuzhan Fatih Kar, Elmira Amirloo, Alaaeldin El-Nouby, Amir Zamir, Afshin Dehghan
A key finding is that FlexTok enables next-token prediction to describe images in a coarse-to-fine "visual vocabulary", and that the number of tokens to generate depends on the complexity of the generation task.
1 code implementation • CVPR 2025 • Enrico Fini, Mustafa Shukor, Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju, Victor Guilherme Turrisi da Costa, Louis Béthune, Zhe Gan, Alexander T Toshev, Marcin Eichner, Moin Nabi, Yinfei Yang, Joshua M. Susskind, Alaaeldin El-Nouby
We introduce a novel method for pre-training of large-scale vision encoders.
Ranked #1 on
Image Classification
on iNaturalist
no code implementations • 1 Nov 2024 • Nicola Dall'Asen, Yiming Wang, Enrico Fini, Elisa Ricci
Low-resource domains, characterized by scarce data and annotations, present significant challenges for language and visual understanding tasks, with the latter much under-explored in the literature.
no code implementations • 10 Oct 2024 • Aryo Lotfi, Enrico Fini, Samy Bengio, Moin Nabi, Emmanuel Abbe
Modern vision models have achieved remarkable success in benchmarks where local features provide critical information about the target.
1 code implementation • 18 Jun 2024 • Alessandro Conti, Enrico Fini, Paolo Rota, Yiming Wang, Massimiliano Mancini, Elisa Ricci
Finally, the LLM refines the report, presenting the results to the user in natural language.
1 code implementation • 16 Apr 2024 • Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci
To address VIC, we propose Category Search from External Databases (CaSED), a training-free method that leverages a pre-trained vision-language model and an external database.
no code implementations • 4 Oct 2023 • Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj
In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting and we propose COCONUT, a CIL method that relies on the combination of experience replay and contrastive learning.
1 code implementation • CVPR 2023 • Enrico Fini, Pietro Astolfi, Karteek Alahari, Xavier Alameda-Pineda, Julien Mairal, Moin Nabi, Elisa Ricci
Self-supervised learning models have been shown to learn rich visual representations without requiring human annotations.
1 code implementation • NeurIPS 2023 • Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci
We thus formalize a novel task, termed as Vocabulary-free Image Classification (VIC), where we aim to assign to an input image a class that resides in an unconstrained language-induced semantic space, without the prerequisite of a known vocabulary.
1 code implementation • 15 May 2023 • Enrico Fini, Pietro Astolfi, Adriana Romero-Soriano, Jakob Verbeek, Michal Drozdzal
Indeed, we find that a simple CLIP baseline can also be improved substantially, up to a 25% relative improvement on downstream zero-shot tasks, by using well-known training techniques that are popular in other subfields.
no code implementations • 18 Feb 2023 • Shirsha Bose, Ankit Jha, Enrico Fini, Mainak Singha, Elisa Ricci, Biplab Banerjee
Our method focuses on a domain-agnostic prompt learning strategy, aiming to disentangle the visual style and content information embedded in CLIP's pre-trained vision encoder, enabling effortless adaptation to novel domains during inference.
2 code implementations • ICCV 2023 • Zhiqi Kang, Enrico Fini, Moin Nabi, Elisa Ricci, Karteek Alahari
Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data.
1 code implementation • 23 Jul 2022 • Riccardo Franceschini, Enrico Fini, Cigdem Beyan, Alessandro Conti, Federica Arrigoni, Elisa Ricci
Our method, as being based on contrastive loss between pairwise modalities, is the first attempt in MER literature.
Cultural Vocal Bursts Intensity Prediction
Multimodal Emotion Recognition
1 code implementation • 26 Mar 2022 • Guanglei Yang, Enrico Fini, Dan Xu, Paolo Rota, Mingli Ding, Moin Nabi, Xavier Alameda-Pineda, Elisa Ricci
This problem has been widely investigated in the research community and several Incremental Learning (IL) approaches have been proposed in the past years.
1 code implementation • 1 Feb 2022 • Guanglei Yang, Enrico Fini, Dan Xu, Paolo Rota, Mingli Ding, Hao Tang, Xavier Alameda-Pineda, Elisa Ricci
To fill this gap, in this paper we introduce a novel attentive feature distillation approach to mitigate catastrophic forgetting while accounting for semantic spatial- and channel-level dependencies.
1 code implementation • CVPR 2022 • Enrico Fini, Victor G. Turrisi da Costa, Xavier Alameda-Pineda, Elisa Ricci, Karteek Alahari, Julien Mairal
Self-supervised models have been shown to produce comparable or better visual representations than their supervised counterparts when trained offline on unlabeled data at scale.
1 code implementation • ICCV 2021 • Enrico Fini, Enver Sangineto, Stéphane Lathuilière, Zhun Zhong, Moin Nabi, Elisa Ricci
In this paper, we study the problem of Novel Class Discovery (NCD).
Ranked #3 on
Novel Object Detection
on LVIS v1.0 val
4 code implementations • 3 Aug 2021 • Victor G. Turrisi da Costa, Enrico Fini, Moin Nabi, Nicu Sebe, Elisa Ricci
This paper presents solo-learn, a library of self-supervised methods for visual representation learning.
1 code implementation • CVPR 2021 • Zhun Zhong, Enrico Fini, Subhankar Roy, Zhiming Luo, Elisa Ricci, Nicu Sebe
In this paper, we address Novel Class Discovery (NCD), the task of unveiling new classes in a set of unlabeled samples given a labeled dataset with known classes.
1 code implementation • ECCV 2020 • Enrico Fini, Stéphane Lathuilière, Enver Sangineto, Moin Nabi, Elisa Ricci
Continual Learning (CL) aims to develop agents emulating the human ability to sequentially learn new tasks while being able to retain knowledge obtained from past experiences.
1 code implementation • 4 Nov 2019 • Enrico Fini, Alessio Brutti
Recently, a fully supervised speaker diarization approach was proposed (UIS-RNN) which models speakers using multiple instances of a parameter-sharing recurrent neural network.
Ranked #1 on
Speaker Diarization
on DIHARD II