no code implementations • 27 Mar 2025 • Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos
In this work, we aim to compress the vision tokens of a Large Vision Language Model (LVLM) into a representation that is simultaneously suitable for (a) generative and (b) discriminative tasks, (c) nearly lossless, and (d) storage-efficient.
no code implementations • CVPR 2025 • Yassine Ouali, Adrian Bulat, Alexandros Xenos, Anestis Zaganidis, Ioannis Maniadis Metaxas, Brais Martinez, Georgios Tzimiropoulos
Contrastively-trained Vision-Language Models (VLMs) like CLIP have become the de facto approach for discriminative vision-language representation learning.
no code implementations • 19 Aug 2024 • Yassine Ouali, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
Despite recent successes, Large Vision Language Models (LVLMs) are prone to hallucinating details like objects and their properties or relations, limiting their real-world deployment.
no code implementations • CVPR 2024 • Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos
Despite noise and caption quality having been acknowledged as important factors impacting vision-language contrastive pre-training, in this paper, we show that the full potential of improving the training process by addressing such issues is yet to be realized.
1 code implementation • ICCV 2023 • Yassine Ouali, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
Vision-Language (V-L) models trained with contrastive learning to align the visual and language modalities have been shown to be strong few-shot learners.
1 code implementation • 26 Dec 2020 • Yassine Ouali, Céline Hudelot, Myriam Tami
In this paper, we explore contrastive learning for few-shot classification, in which we propose to use it as an additional auxiliary training objective acting as a data-dependent regularizer to promote more general and transferable features.
1 code implementation • ECCV 2020 • Yassine Ouali, Céline Hudelot, Myriam Tami
In this work, we propose a new unsupervised image segmentation approach based on mutual information maximization between different constructed views of the inputs.
Ranked #5 on Unsupervised Semantic Segmentation on COCO-Stuff-3
no code implementations • 25 Jun 2020 • Yassine Ouali, Victor Bouvier, Myriam Tami, Céline Hudelot
Learning Invariant Representations has been successfully applied for reconciling a source and a target domain for Unsupervised Domain Adaptation.
1 code implementation • 9 Jun 2020 • Yassine Ouali, Céline Hudelot, Myriam Tami
Deep neural networks have demonstrated their ability to provide remarkable performance on a wide range of supervised learning tasks (e.g., image classification) when trained on extensive collections of labeled data (e.g., ImageNet).
5 code implementations • CVPR 2020 • Yassine Ouali, Céline Hudelot, Myriam Tami
To leverage the unlabeled examples, we enforce a consistency between the main decoder predictions and those of the auxiliary decoders, taking as inputs different perturbed versions of the encoder's output, and consequently, improving the encoder's representations.