Search Results for author: Muhammad Ferjad Naeem

Found 19 papers, 8 papers with code

Toward a Diffusion-Based Generalist for Dense Vision Tasks

no code implementations • 29 Jun 2024 • Yue Fan, Yongqin Xian, Xiaohua Zhai, Alexander Kolesnikov, Muhammad Ferjad Naeem, Bernt Schiele, Federico Tombari

In this paper, we explore diffusion-based vision generalists, where we unify different types of dense prediction tasks as conditional image generation and re-purpose pre-trained diffusion models for it.

Conditional Image Generation, Quantization

How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs

no code implementations • 6 May 2024 • Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Jameel Hassan, Muzammal Naseer, Federico Tombari, Fahad Shahbaz Khan, Salman Khan

Recent advancements in Large Language Models (LLMs) have led to the development of Video Large Multi-modal Models (Video-LMMs) that can handle a wide range of video understanding tasks.

Autonomous Vehicles, Video Understanding

GiT: Towards Generalist Vision Transformer through Universal Language Interface

1 code implementation • 14 Mar 2024 • Haiyang Wang, Hao Tang, Li Jiang, Shaoshuai Shi, Muhammad Ferjad Naeem, Hongsheng Li, Bernt Schiele, LiWei Wang

Due to its simple design, this paradigm holds promise for narrowing the architectural gap between vision and language.

Ranked #2 on Video Captioning on MSVD-CTN (using extra training data)

Language Modelling, Video Captioning

FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks

no code implementations • 11 Mar 2024 • Muhammad Saif Ullah Khan, Muhammad Ferjad Naeem, Federico Tombari, Luc van Gool, Didier Stricker, Muhammad Zeshan Afzal

We propose FocusCLIP, integrating subject-level guidance--a specialized mechanism for target-specific supervision--into the CLIP framework for improved zero-shot transfer on human-centric tasks.

Activity Recognition, Age Classification, +1

Learning to Prompt with Text Only Supervision for Vision-Language Models

1 code implementation • 4 Jan 2024 • Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer, Luc van Gool, Federico Tombari

While effective, most of these works require labeled data, which is not practical, and often struggle to generalize to new datasets because they overfit to the source data.

Prompt Engineering

SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance

1 code implementation • 27 Nov 2023 • Lukas Hoyer, David Joseph Tan, Muhammad Ferjad Naeem, Luc van Gool, Federico Tombari

In SemiVL, we propose to integrate rich priors from VLM pre-training into semi-supervised semantic segmentation to learn better semantic decision boundaries.

Decoder, Segmentation, +1

SILC: Improving Vision Language Pretraining with Self-Distillation

no code implementations • 20 Oct 2023 • Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai, Lukas Hoyer, Luc van Gool, Federico Tombari

However, the contrastive objective used by these models only focuses on image-text alignment and does not incentivise image feature learning for dense prediction tasks.

Classification, Contrastive Learning, +7

Introducing Language Guidance in Prompt-based Continual Learning

1 code implementation • ICCV 2023 • Muhammad Gul Zain Ali Khan, Muhammad Ferjad Naeem, Luc van Gool, Didier Stricker, Federico Tombari, Muhammad Zeshan Afzal

While the model faces a disjoint set of classes in each task in this setting, we argue that these classes can be encoded to the same embedding space of a pre-trained language encoder.

Continual Learning

Learning Attention Propagation for Compositional Zero-Shot Learning

no code implementations • 20 Oct 2022 • Muhammad Gul Zain Ali Khan, Muhammad Ferjad Naeem, Luc van Gool, Alain Pagani, Didier Stricker, Muhammad Zeshan Afzal

CAPE learns to identify this structure and propagates knowledge between them to learn class embeddings for all seen and unseen compositions.

Compositional Zero-Shot Learning

I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification

no code implementations • 21 Sep 2022 • Muhammad Ferjad Naeem, Yongqin Xian, Luc van Gool, Federico Tombari

In order to distill discriminative visual words from noisy documents, we introduce a new cross-modal attention module that learns fine-grained interactions between image patches and document words.

Generalized Zero-Shot Learning, Image Classification, +2
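
The cross-modal attention idea in the I2DFormer summary above can be illustrated with a minimal sketch. This is not the paper's module: it is plain scaled dot-product attention from image patches to document words, and the function names (`softmax`, `cross_attention`) and the toy 2-dimensional embeddings are assumptions made purely for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(patches, words):
    """For each image patch, attend over all document words.

    patches, words: lists of equal-length float vectors (hypothetical embeddings).
    Returns (attention weights per patch, attended word summary per patch).
    """
    d = len(words[0])
    all_weights, attended = [], []
    for p in patches:
        # Scaled dot-product score between this patch and every word.
        scores = [sum(a * b for a, b in zip(p, w)) / math.sqrt(d) for w in words]
        weights = softmax(scores)
        all_weights.append(weights)
        # Weighted sum of word embeddings: the patch's view of the document.
        attended.append(
            [sum(wt * w[i] for wt, w in zip(weights, words)) for i in range(d)]
        )
    return all_weights, attended


# Toy example: two patches, three "document words".
patches = [[1.0, 0.0], [0.0, 1.0]]
words = [[2.0, 0.0], [0.0, 2.0], [0.1, 0.1]]
weights, attended = cross_attention(patches, words)
```

In this toy setup each patch puts most of its attention on the word whose embedding points in the same direction, which is the kind of fine-grained patch-to-word interaction the summary describes.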

Learning Graph Embeddings for Open World Compositional Zero-Shot Learning

2 code implementations • 3 May 2021 • Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, Zeynep Akata

In this work, we overcome this assumption by operating in the open world setting, where no limit is imposed on the compositional space at test time, and the search space contains a large number of unseen compositions.

Compositional Zero-Shot Learning

Learning Graph Embeddings for Compositional Zero-shot Learning

1 code implementation • CVPR 2021 • Muhammad Ferjad Naeem, Yongqin Xian, Federico Tombari, Zeynep Akata

In compositional zero-shot learning, the goal is to recognize unseen compositions (e.g., old dog) of the visual primitives observed in the training set: states (e.g., old, cute) and objects (e.g., car, dog).

Compositional Zero-Shot Learning, Graph Embedding, +1

Open World Compositional Zero-Shot Learning

2 code implementations • CVPR 2021 • Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, Zeynep Akata

After estimating the feasibility score of each composition, we use these scores to either directly mask the output space or as a margin for the cosine similarity between visual features and compositional embeddings during training.

Compositional Zero-Shot Learning
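
The output-space masking mentioned in the summary above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the feasibility scores, the `threshold` parameter, the function names, and the toy 2-dimensional embeddings are all hypothetical, and only the hard-masking variant is shown (not the margin variant used during training).

```python
import math


def cosine(u, v):
    """Cosine similarity between two float vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def predict(visual, comp_embeddings, feasibility, threshold):
    """Pick the most similar composition among those deemed feasible.

    Compositions whose feasibility score falls below `threshold`
    (a hypothetical cutoff) are masked out of the output space.
    """
    best, best_score = None, -math.inf
    for name, emb in comp_embeddings.items():
        if feasibility[name] < threshold:
            continue  # hard mask: skip infeasible compositions
        score = cosine(visual, emb)
        if score > best_score:
            best, best_score = name, score
    return best


# Toy compositional embeddings and made-up feasibility scores.
comps = {"old dog": [1.0, 0.0], "old car": [0.0, 1.0], "cute car": [0.9, 0.1]}
feas = {"old dog": 0.8, "old car": 0.7, "cute car": 0.1}
visual = [0.9, 0.1]

unmasked = predict(visual, comps, feas, threshold=0.0)
masked = predict(visual, comps, feas, threshold=0.5)
```

With no masking the nearest embedding is "cute car"; once low-feasibility compositions are masked, the prediction falls back to "old dog", showing how the scores shrink the search space at test time.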

Data Augmentation with Manifold Exploring Geometric Transformations for Increased Performance and Robustness

no code implementations • 14 Jan 2019 • Magdalini Paschali, Walter Simson, Abhijit Guha Roy, Muhammad Ferjad Naeem, Rüdiger Göbl, Christian Wachinger, Nassir Navab

Compared with traditional augmentation methods and with images synthesized by Generative Adversarial Networks, our method not only achieves state-of-the-art performance but also significantly improves the network's robustness.

Data Augmentation, General Classification, +2
