no code implementations • 28 Mar 2024 • Bo Wan, Michael Tschannen, Yongqin Xian, Filip Pavetic, Ibrahim Alabdulmohsin, Xiao Wang, André Susano Pinto, Andreas Steiner, Lucas Beyer, Xiaohua Zhai
In this paper, we propose a simple visual pretraining method with location-aware captioners (LocCa).
no code implementations • 19 Dec 2023 • Bruno Korbar, Yongqin Xian, Alessio Tonioni, Andrew Zisserman, Federico Tombari
In this paper we present a text-conditioned video resampler (TCR) module that uses a pre-trained and frozen visual encoder and large language model (LLM) to process long video sequences for a task.
Ranked #5 on Video Question Answering on NExT-QA
no code implementations • 14 Dec 2023 • Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari
Diffusion models (DMs) have gained prominence due to their ability to generate high-quality, varied images, with recent advancements in text-to-image generation.
no code implementations • 29 Nov 2023 • Sanghwan Kim, Daoji Huang, Yongqin Xian, Otmar Hilliges, Luc van Gool, Xi Wang
Understanding human activity is a crucial yet intricate task in egocentric vision, a field that focuses on capturing visual perspectives from the camera wearer's viewpoint.
no code implementations • 20 Oct 2023 • Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai, Lukas Hoyer, Luc van Gool, Federico Tombari
However, the contrastive objective used by these models only focuses on image-text alignment and does not incentivise image feature learning for dense prediction tasks.
1 code implementation • 22 Apr 2023 • Qian Wang, Yongqin Xian, Hefei Ling, Jinyuan Zhang, Xiaorui Lin, Ping Li, Jiazhong Chen, Ning Yu
Adversarial attacks aim to disrupt a target system by adding crafted noise to its input samples, posing threats to the security and robustness of facial recognition systems.
1 code implementation • 1 Feb 2023 • Saurabh Sharma, Yongqin Xian, Ning Yu, Ambuj Singh
In this work, we show that learning prototype classifiers addresses the biased softmax problem in LTR.
Ranked #8 on Long-tail Learning on CIFAR-100-LT (ρ=10)
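The prototype idea behind this entry can be sketched in a few lines: compute one mean-feature prototype per class and classify by nearest prototype, so head and tail classes each contribute exactly one decision vector regardless of sample count. This is a minimal, hypothetical illustration (the function names `class_prototypes` and `predict` are ours, not the paper's code), assuming Euclidean-distance classification:

```python
import math

def class_prototypes(features, labels):
    """Compute one prototype per class as the mean of its feature vectors."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(x, prototypes):
    """Assign x to the class of the nearest prototype (Euclidean distance)."""
    def dist(p):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, p)))
    return min(prototypes, key=lambda y: dist(prototypes[y]))
```

Even if one class has many more training samples than another, each still yields a single prototype, which is why this rule sidesteps the frequency bias a softmax classifier picks up in long-tailed recognition.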
1 code implementation • CVPR 2023 • Anurag Das, Yongqin Xian, Dengxin Dai, Bernt Schiele
In this work, we propose a common framework that uses different weak labels, e.g., image-level, point, and coarse labels from the target domain, to reduce this performance gap.
no code implementations • 15 Dec 2022 • Anurag Das, Yongqin Xian, Yang He, Zeynep Akata, Bernt Schiele
For best performance, today's semantic segmentation methods use large and carefully labeled datasets, requiring expensive annotation budgets.
1 code implementation • CVPR 2023 • Jiezhang Cao, Qin Wang, Yongqin Xian, Yawei Li, Bingbing Ni, Zhiming Pi, Kai Zhang, Yulun Zhang, Radu Timofte, Luc van Gool
We explicitly design an implicit attention network to learn the ensemble weights for the nearby local features.
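The weighting step described above can be sketched without the learned network: score each nearby local feature against the query coordinate, softmax the scores into ensemble weights, and blend the features. This is a hedged stand-in, where negative squared distance replaces the paper's learned implicit attention scores, and all names are illustrative:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_ensemble(query_coord, local_coords, local_feats, temperature=1.0):
    """Blend nearby local features with softmax attention weights.
    Here the score is negative squared distance to the query coordinate,
    a stand-in for a learned implicit attention network."""
    scores = [-sum((q - c) ** 2 for q, c in zip(query_coord, coord)) / temperature
              for coord in local_coords]
    weights = softmax(scores)
    dim = len(local_feats[0])
    out = [0.0] * dim
    for w, f in zip(weights, local_feats):
        for i in range(dim):
            out[i] += w * f[i]
    return out
```

In the paper the scores come from a trained network rather than raw distance, but the ensemble mechanics (normalized weights over nearby features) are the same shape.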
no code implementations • CVPR 2023 • Muhammad Ferjad Naeem, Muhammad Gul Zain Ali Khan, Yongqin Xian, Muhammad Zeshan Afzal, Didier Stricker, Luc van Gool, Federico Tombari
Our proposed model, I2MVFormer, learns multi-view semantic embeddings for zero-shot image classification with these class views.
no code implementations • 21 Sep 2022 • Muhammad Ferjad Naeem, Yongqin Xian, Luc van Gool, Federico Tombari
In order to distill discriminative visual words from noisy documents, we introduce a new cross-modal attention module that learns fine-grained interactions between image patches and document words.
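The fine-grained interaction described here has the shape of standard scaled dot-product cross-attention: image patches act as queries, document words as keys and values, and the attention weights say how strongly each word explains each patch. A minimal sketch under that assumption (not the paper's actual module):

```python
import math

def cross_attention(patch_queries, word_keys, word_values):
    """For each image patch, attend over document words:
    softmax(q . k / sqrt(d)) scores how strongly each word matches the patch,
    then the word values are averaged with those weights."""
    d = len(word_keys[0])
    attended = []
    for q in patch_queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in word_keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        attended.append([sum(w * v[i] for w, v in zip(weights, word_values))
                         for i in range(len(word_values[0]))])
    return attended
```

Words whose key vectors align with a patch's query dominate its attended output, which is the mechanism for distilling discriminative words from an otherwise noisy document.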
no code implementations • 4 Apr 2022 • Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata
While a visual-semantic embedding layer learns global features, local features are learned through an attribute prototype network that simultaneously regresses and decorrelates attributes from intermediate features.
Ranked #5 on GZSL Video Classification on ActivityNet-GZSL(main)
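The "decorrelates attributes" part of the entry above can be made concrete with a toy regularizer: penalize off-diagonal covariance between predicted attributes over a batch. This is only an illustrative stand-in (the paper decorrelates within its attribute prototype network; the function below is our simplification):

```python
def decorrelation_penalty(attr_preds):
    """Sum of squared off-diagonal entries of the attribute covariance
    matrix over a batch: pushes attribute predictions to be mutually
    decorrelated, a toy version of the decorrelation idea."""
    n = len(attr_preds)           # batch size
    k = len(attr_preds[0])        # number of attributes
    means = [sum(row[j] for row in attr_preds) / n for j in range(k)]
    penalty = 0.0
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            cov = sum((row[i] - means[i]) * (row[j] - means[j])
                      for row in attr_preds) / n
            penalty += cov * cov
    return penalty
```

A batch where two attributes always move together is penalized; a batch where one attribute is constant contributes zero covariance and no penalty.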
1 code implementation • CVPR 2022 • Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata
Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity, and further imposes their class discrimination and semantic relatedness.
no code implementations • 29 Nov 2021 • Muhammad Ferjad Naeem, Evin Pınar Örnek, Yongqin Xian, Luc van Gool, Federico Tombari
Parts represent a basic unit of geometric and semantic similarity across different objects.
2 code implementations • 3 May 2021 • Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, Zeynep Akata
In this work, we overcome this assumption by operating in the open-world setting, where no limit is imposed on the compositional space at test time and the search space contains a large number of unseen compositions.
1 code implementation • CVPR 2021 • Yanbei Chen, Yongqin Xian, A. Sophia Koepke, Ying Shan, Zeynep Akata
Having access to multi-modal cues (e.g., vision and audio) enables some cognitive tasks to be performed faster than learning from a single modality.
1 code implementation • 21 Apr 2021 • Giuseppe Pastore, Fabio Cermelli, Yongqin Xian, Massimiliano Mancini, Zeynep Akata, Barbara Caputo
Being able to segment unseen classes not observed during training is an important technical challenge in deep learning, because of its potential to reduce the expensive annotation required for semantic segmentation.
1 code implementation • CVPR 2021 • Muhammad Ferjad Naeem, Yongqin Xian, Federico Tombari, Zeynep Akata
In compositional zero-shot learning, the goal is to recognize unseen compositions (e.g., old dog) of visual primitives observed in the training set: states (e.g., old, cute) and objects (e.g., car, dog).
2 code implementations • CVPR 2021 • Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, Zeynep Akata
After estimating the feasibility score of each composition, we use these scores either to directly mask the output space or as a margin for the cosine similarity between visual features and compositional embeddings during training.
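Both uses of the feasibility scores fit in a few lines. The sketch below is a hedged simplification (names and the exact margin form `1 - feasibility` are our assumptions, not the paper's definitions): hard-mask compositions below a feasibility threshold, or subtract a feasibility-derived margin from the cosine similarity.

```python
import math

def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def score_compositions(visual_feat, comp_embeds, feasibility,
                       threshold=0.5, use_mask=True):
    """Score each composition against a visual feature, using feasibility
    either (a) as a hard mask on the output space, or (b) as a margin
    subtracted from the cosine similarity (illustrative margin form)."""
    out = {}
    for comp, emb in comp_embeds.items():
        sim = cosine(visual_feat, emb)
        f = feasibility[comp]
        if use_mask:
            out[comp] = sim if f >= threshold else float('-inf')
        else:
            out[comp] = sim - (1.0 - f)
    return out
```

Implausible compositions (low feasibility) are either removed from the search space outright or pushed down relative to plausible ones, which is what makes the open-world search space tractable.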
1 code implementation • 30 Nov 2020 • Fabio Cermelli, Massimiliano Mancini, Yongqin Xian, Zeynep Akata, Barbara Caputo
Semantic segmentation models have two fundamental weaknesses: i) they require large training sets with costly pixel-level annotations, and ii) they have a static output space, constrained to the classes of the training set.
no code implementations • NeurIPS 2020 • Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata
As an additional benefit, our model points to the visual evidence of the attributes in an image, e.g., for the CUB dataset, confirming the improved attribute localization ability of our image representation.
1 code implementation • 9 Jul 2020 • Yongqin Xian, Bruno Korbar, Matthijs Douze, Lorenzo Torresani, Bernt Schiele, Zeynep Akata
Few-shot learning aims to recognize novel classes from a few examples.
no code implementations • 5 Feb 2020 • Yue Fan, Yongqin Xian, Max Maria Losch, Bernt Schiele
In this paper, we are pushing the envelope and aim to further investigate the reliance on spatial information.
1 code implementation • CVPR 2019 • Yongqin Xian, Subhabrata Choudhury, Yang He, Bernt Schiele, Zeynep Akata
In this paper, we take this one step further and focus on the challenging task of zero- and few-shot learning of semantic segmentation.
no code implementations • CVPR 2019 • Yongqin Xian, Saurabh Sharma, Bernt Schiele, Zeynep Akata
When labeled training data is scarce, a promising data augmentation approach is to generate visual features of unknown classes using their attributes.
Ranked #3 on Generalized Zero-Shot Learning on SUN Attribute
4 code implementations • CVPR 2018 • Yongqin Xian, Tobias Lorenz, Bernt Schiele, Zeynep Akata
Suffering from the extreme training-data imbalance between seen and unseen classes, most existing state-of-the-art approaches fail to achieve satisfactory results on the challenging generalized zero-shot learning task.
Ranked #5 on Generalized Zero-Shot Learning on SUN Attribute
Generalized Zero-Shot Learning • Generative Adversarial Network
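The feature-generating idea behind entries like the one above reduces to a conditional generator: map [class attributes ; noise] to a synthetic visual feature, then train an ordinary classifier on the synthesized unseen-class features. The sketch below shows only that data flow with a random, untrained linear map; the real model trains the map adversarially, and all names here are illustrative:

```python
import random

def make_generator(attr_dim, noise_dim, feat_dim, seed=0):
    """Toy conditional generator: a fixed random linear map from
    [attributes ; noise] to a synthetic visual feature. A trained
    generator would learn this map; here we only show the interface
    used to augment unseen classes with fake features."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0, 0.1) for _ in range(attr_dim + noise_dim)]
               for _ in range(feat_dim)]

    def generate(attributes):
        z = [rng.gauss(0, 1) for _ in range(noise_dim)]   # fresh noise per call
        inp = list(attributes) + z
        return [sum(w * x for w, x in zip(row, inp)) for row in weights]

    return generate

# Synthesize several features for an unseen class from its attribute vector;
# a standard classifier can then be trained on real + synthetic features.
gen = make_generator(attr_dim=3, noise_dim=2, feat_dim=4)
fake_feats = [gen([1.0, 0.0, 1.0]) for _ in range(5)]
```

Because the noise is resampled per call, one attribute vector yields a diverse cloud of features for its class, which is what lets a downstream classifier see "training data" for classes it has no real images of.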
9 code implementations • 3 Jul 2017 • Yongqin Xian, Christoph H. Lampert, Bernt Schiele, Zeynep Akata
Due to the importance of zero-shot learning, i.e., classifying images for which labeled training data is lacking, the number of proposed approaches has recently increased steadily.
1 code implementation • CVPR 2017 • Yongqin Xian, Bernt Schiele, Zeynep Akata
Due to the importance of zero-shot learning, the number of proposed approaches has increased steadily in recent years.
no code implementations • CVPR 2016 • Yongqin Xian, Zeynep Akata, Gaurav Sharma, Quynh Nguyen, Matthias Hein, Bernt Schiele
We train the model with a ranking based objective function which penalizes incorrect rankings of the true class for a given image.
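A ranking objective of the kind described here is typically a max-margin hinge over classes: every wrong class whose compatibility score comes within a margin of the true class's score adds to the loss. A minimal sketch under that assumption (illustrative names, not the paper's exact formulation):

```python
def ranking_loss(scores, true_class, margin=1.0):
    """Max-margin ranking objective: penalize every wrong class whose
    compatibility score comes within `margin` of the true class's score.
    `scores` maps class name -> compatibility score for one image."""
    true_score = scores[true_class]
    return sum(max(0.0, margin + s - true_score)
               for c, s in scores.items() if c != true_class)
```

The loss is zero only when the true class outranks every other class by at least the margin, which is exactly the "penalize incorrect rankings" behaviour the objective is meant to enforce.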