1 code implementation • 3 Feb 2025 • Hashmat Shadab Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Khan, Salman Khan
Multi-modal Large Language Models (MLLMs) excel in vision-language tasks but remain vulnerable to visual adversarial perturbations that can induce hallucinations, manipulate responses, or bypass safety mechanisms.
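To make the threat model concrete, below is a minimal PGD-style sketch of a visual adversarial perturbation against a generic image encoder. The encoder (a torchvision ResNet stand-in), the cosine-distance objective, and the budget are illustrative assumptions, not the attack studied in this paper.

```python
# Minimal PGD sketch: craft an image perturbation that pushes a generic
# vision encoder's features away from the clean features. Placeholder
# encoder, loss, and budget -- illustrative only, not the paper's attack.
import torch
import torch.nn.functional as F
import torchvision.models as models

encoder = models.resnet18(weights=None).eval()  # stand-in for an MLLM's vision encoder

def pgd_attack(x, eps=8/255, alpha=2/255, steps=10):
    """Maximize the distance between clean and perturbed image features."""
    with torch.no_grad():
        clean_feat = encoder(x)
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Ascend on (negative) similarity: push features away from the clean ones.
        loss = -F.cosine_similarity(encoder(x_adv), clean_feat, dim=1).mean()
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)  # project into eps-ball
    return x_adv.detach()

x = torch.rand(1, 3, 224, 224)  # dummy image in [0, 1]
x_adv = pgd_attack(x)
```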
1 code implementation • 8 Jan 2025 • Ulindu De Silva, Didula Samaraweera, Sasini Wanigathunga, Kavindu Kariyawasam, Kanchana Ranasinghe, Muzammal Naseer, Ranga Rodrigo
We present Seg-TTO, a novel framework for zero-shot, open-vocabulary semantic segmentation (OVSS), designed to excel in specialized domain tasks.
Open-Vocabulary Semantic Segmentation
1 code implementation • 24 Dec 2024 • Jinhui Yi, Syed Talal Wasim, Yanan Luo, Muzammal Naseer, Juergen Gall
The fine-grained video question-answering evaluation demonstrates our model's effectiveness, outperforming the encoder-based approaches Video-ChatGPT and Video-LLaVA in key aspects like correctness and temporal understanding.
1 code implementation • 13 Dec 2024 • Muhammad Uzair Khattak, Shahina Kunhimon, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
UniMed is developed using a data-collection framework that leverages Large Language Models (LLMs) to transform modality-specific classification datasets into image-text formats while incorporating existing image-text data from the medical domain, facilitating scalable VLM pretraining.
1 code implementation • 2 Oct 2024 • Umair Nawaz, Muhammad Awais, Hanan Gani, Muzammal Naseer, Fahad Khan, Salman Khan, Rao Muhammad Anwer
Further, this domain requires fine-grained feature learning due to the subtle nature of the downstream tasks (e.g., nutrient deficiency detection, livestock breed classification).
no code implementations • 1 Oct 2024 • Muhammad Hamza Sharif, Muzammal Naseer, Mohammad Yaqub, Min Xu, Mohsen Guizani
However, for voxel-wise prediction tasks, discriminative local features are key to the performance of volumetric segmentation (VS) models, yet such features are missing in attention-based VS methods.
1 code implementation • 24 Sep 2024 • Mubashir Noman, Noor Ahsan, Muzammal Naseer, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan
To achieve this, we introduce a change description instruction dataset that can be utilized to fine-tune an LMM and provide better change descriptions for RS images.
no code implementations • 18 Sep 2024 • Hamid Suleman, Syed Talal Wasim, Muzammal Naseer, Juergen Gall
We demonstrate that the stable and efficient interleaved architecture resolves the scalability issue of Mamba-based architectures for images and videos and increases robustness to common artifacts like JPEG compression.
1 code implementation • 29 Aug 2024 • Noor Hussein, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar
Moreover, PromptSmooth requires only a single model to handle multiple noise levels, which substantially reduces the computational cost compared to traditional methods that rely on training a separate model for each noise level.
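For context, PromptSmooth targets the randomized-smoothing setting, where a classifier's prediction is taken as a majority vote over many Gaussian-noised copies of the input. The sketch below shows that generic voting procedure only; `model`, `sigma`, and `n` are placeholder assumptions, and this is not PromptSmooth's prompt-learning component.

```python
# Generic randomized-smoothing prediction: classify Gaussian-noised copies of
# the input and take a majority vote. Illustrative sketch of the certified-
# defense setting, not PromptSmooth itself.
import torch

def smoothed_predict(model, x, sigma=0.25, n=100, num_classes=1000):
    """Majority-vote class of `model` on x (1, C, H, W) under N(0, sigma^2) noise."""
    counts = torch.zeros(num_classes, dtype=torch.long)
    with torch.no_grad():
        for _ in range(n):
            noisy = x + sigma * torch.randn_like(x)
            pred = model(noisy).argmax(dim=1)  # (1,) predicted class
            counts[pred.item()] += 1
    return counts.argmax().item()
```

Traditional certified defenses train one model per noise level sigma; the single-model property claimed above removes that multiplicative training cost.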
1 code implementation • 29 Aug 2024 • Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar
Though many methods have been proposed for erasing undesired concepts from T2IG models, they only provide a false sense of security, as recent works demonstrate that concept-erased models (CEMs) can be easily deceived to generate the erased concept through adversarial attacks.
1 code implementation • 20 Aug 2024 • Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar
To handle these issues, we propose a test-time optimization approach that solely optimizes an untrained neural network to transfer makeup style from a reference to a source image in an adversarial manner.
1 code implementation • 14 Aug 2024 • Asif Hanif, Fahad Shamshad, Muhammad Awais, Muzammal Naseer, Fahad Shahbaz Khan, Karthik Nandakumar, Salman Khan, Rao Muhammad Anwer
Inspired by the latest developments in learnable prompts, this work introduces a method to embed a backdoor into the medical foundation model during the prompt learning phase.
no code implementations • 16 Jul 2024 • Naif Alkhunaizi, Faris Almalik, Rouqaiah Al-Refai, Muzammal Naseer, Karthik Nandakumar
Moreover, the large size of these models necessitates the use of parameter-efficient fine-tuning (PEFT) to reduce the communication burden in federated learning.
2 code implementations • 14 Jun 2024 • Rohit Bharadwaj, Hanan Gani, Muzammal Naseer, Fahad Shahbaz Khan, Salman Khan
Despite their impressive capabilities, current Video-LMMs have not been evaluated for anomaly detection tasks, which is critical to their deployment in practical scenarios, e.g., identifying deepfakes, manipulated video content, traffic accidents, and crimes.
1 code implementation • 13 Jun 2024 • Hashmat Shadab Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Shahbaz Khan, Salman Khan
To gain a deeper understanding of VSSMs' adversarial robustness, we conduct a frequency-based analysis of adversarial attacks, evaluating their performance against low-frequency and high-frequency perturbations.
1 code implementation • 12 Jun 2024 • Hashmat Shadab Malik, Numan Saeed, Asif Hanif, Muzammal Naseer, Mohammad Yaqub, Salman Khan, Fahad Shahbaz Khan
We extend this investigation across four volumetric segmentation datasets, evaluating robustness under both white-box and black-box adversarial attacks.
1 code implementation • 7 Jun 2024 • Yuhao Li, Muzammal Naseer, Jiale Cao, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan
To this end, we propose a new multi-object tracking framework, named LG-MOT, that explicitly leverages language information at different levels of granularity (scene- and instance-level) and combines it with standard visual features to obtain discriminative representations.
1 code implementation • 28 May 2024 • Amandeep Kumar, Muzammal Naseer, Sanath Narayan, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal
Moreover, they often result in misaligned image generation for prompt sequences featuring multiple objects.
no code implementations • 6 May 2024 • Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Jameel Hassan, Muzammal Naseer, Federico Tombari, Fahad Shahbaz Khan, Salman Khan
Recent advancements in Large Language Models (LLMs) have led to the development of Video Large Multi-modal Models (Video-LMMs) that can handle a wide range of video understanding tasks.
1 code implementation • 15 Apr 2024 • Amaya Dharmasiri, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
We thereby demonstrate that 2D vision-language models such as CLIP can complement 3D representation learning to improve classification performance without the need for expensive class annotations.
1 code implementation • 1 Apr 2024 • Shahina Kunhimon, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
Incorporating text features alongside visual features is a potential solution to enhance the model's understanding of the data, as it goes beyond pixel-level information to provide valuable context.
1 code implementation • CVPR 2024 • Omkar Thawakar, Muzammal Naseer, Rao Muhammad Anwer, Salman Khan, Michael Felsberg, Mubarak Shah, Fahad Shahbaz Khan
Composed video retrieval (CoVR) is a challenging problem in computer vision that has recently drawn attention for integrating modification text with visual queries to enable more sophisticated video search in large databases.
1 code implementation • 21 Mar 2024 • Ahmad Mahmood, Ashmal Vayani, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
In contrast, this paper introduces a Video Understanding and Reasoning Framework (VURF) based on the reasoning power of LLMs.
1 code implementation • 21 Mar 2024 • Hasindri Watawana, Kanchana Ranasinghe, Tariq Mahmood, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
Self-supervised representation learning has been highly promising for histopathology image analysis with numerous approaches leveraging their patient-slide-patch hierarchy to learn better representations.
1 code implementation • CVPR 2024 • Mubashir Noman, Muzammal Naseer, Hisham Cholakkal, Rao Muhammad Anwar, Salman Khan, Fahad Shahbaz Khan
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks by pre-training on large amounts of unlabelled data.
1 code implementation • 7 Mar 2024 • Hashmat Shadab Malik, Muhammad Huzaifa, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
We produce various versions of standard vision datasets (ImageNet, COCO), incorporating either diverse and realistic backgrounds into the images or introducing color, texture, and adversarial changes in the background.
1 code implementation • 27 Feb 2024 • Hanan Gani, Muzammal Naseer, Fahad Khan, Salman Khan
The proposed approach induces contextual knowledge in the network by learning to reconstruct the missing organ or parts of an organ in the output segmentation space.
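A rough sketch of this reconstruction pretext task, under assumed tensor shapes and an integer organ-id label convention (neither taken from the paper's code): hide one organ in the input and still supervise with the full segmentation map, so the network must recover the missing organ from surrounding context.

```python
# Toy organ-dropout step: erase one organ's pixels from the input, then
# compute the loss over the FULL segmentation map, forcing the model to
# reconstruct the hidden organ from context. Shapes are assumptions.
import torch
import torch.nn.functional as F

def organ_dropout_step(model, image, seg_labels, num_classes):
    """image: (B, 1, H, W); seg_labels: (B, H, W) with integer organ ids, 0 = background."""
    organ = torch.randint(1, num_classes, (1,)).item()   # pick a non-background organ
    masked = image.clone()
    masked[(seg_labels == organ).unsqueeze(1)] = 0.0     # erase that organ's pixels
    logits = model(masked)                               # (B, C, H, W)
    # Supervising the full map induces contextual knowledge about organ layout.
    return F.cross_entropy(logits, seg_labels)
```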
1 code implementation • 4 Jan 2024 • Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer, Luc van Gool, Federico Tombari
While effective, most of these works require labeled data, which is not practical, and they often struggle to generalize to new datasets due to over-fitting on the source data.
no code implementations • CVPR 2024 • Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
Our contributions include a novel spatio-temporal video grounding model surpassing state-of-the-art results in closed-set evaluations on multiple datasets and demonstrating superior performance in open-vocabulary scenarios.
1 code implementation • CVPR 2024 • Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer, Abhijit Das, Salman Khan, Fahad Shahbaz Khan
Furthermore, the lack of domain-specific multimodal instruction-following data, as well as of strong backbone models for RS, makes it hard for the models to align their behavior with user queries.
1 code implementation • 19 Nov 2023 • Rohit Bharadwaj, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
We present a novel approach to transform existing closed-set detectors into open-set detectors.
Ranked #1 on Novel Object Detection on LVIS v1.0 val
no code implementations • NeurIPS 2023 • Jameel Hassan, Hanan Gani, Noor Hussein, Muhammad Uzair Khattak, Muzammal Naseer, Fahad Shahbaz Khan, Salman Khan
The promising zero-shot generalization of vision-language models such as CLIP has led to their adoption using prompt learning for numerous downstream tasks.
no code implementations • 23 Oct 2023 • Adeel Yousaf, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
Consistent improvements across multiple benchmarks and with various VLMs demonstrate the effectiveness of our proposed framework.
Ranked #4 on Video-Text Retrieval on Test-of-Time
1 code implementation • 16 Oct 2023 • Hanan Gani, Shariq Farooq Bhat, Muzammal Naseer, Salman Khan, Peter Wonka
Diffusion-based generative models have significantly advanced text-to-image generation but encounter challenges when processing lengthy and intricate text prompts describing complex scenes with multiple objects.
3 code implementations • ICCV 2023 • Koushik Srivatsan, Muzammal Naseer, Karthik Nandakumar
Specifically, we show that aligning the image representation with an ensemble of class descriptions (based on natural language semantics) improves FAS generalizability in low-data regimes.
1 code implementation • 24 Aug 2023 • Sheng Zhang, Muzammal Naseer, Guangyi Chen, Zhiqiang Shen, Salman Khan, Kun Zhang, Fahad Khan
To address this challenge, we propose the Self Structural Semantic Alignment (S^3A) framework, which extracts the structural semantic information from unlabeled data while simultaneously self-learning.
1 code implementation • 25 Jul 2023 • Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan
Vision systems that can see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
2 code implementations • 14 Jul 2023 • Asif Hanif, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan
While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks.
2 code implementations • ICCV 2023 • Muhammad Uzair Khattak, Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
To the best of our knowledge, this is the first regularization framework for prompt learning that avoids overfitting by jointly attending to pre-trained model features, the training trajectory during prompting, and the textual diversity.
Ranked #2 on Prompt Engineering on ImageNet-S
2 code implementations • ICCV 2023 • Syed Talal Wasim, Muhammad Uzair Khattak, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan
Video transformer designs are based on self-attention, which can model global context but at a high computational cost (illustrated in the sketch below).
Ranked #2 on Action Recognition on Diving-48
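The quadratic footprint of global self-attention is easy to see numerically: the attention score matrix grows with the square of the token count, which is what makes dense space-time attention expensive for video. A small illustrative snippet (token counts are arbitrary examples):

```python
# Doubling the number of space-time tokens roughly quadruples the size of
# the attention score matrix -- the core cost of global self-attention.
import torch

def attention_matrix_size(num_tokens, dim=64):
    q = torch.randn(num_tokens, dim)
    k = torch.randn(num_tokens, dim)
    attn = q @ k.T / dim ** 0.5          # (num_tokens, num_tokens) scores
    return attn.numel()

for tokens in (196, 392, 784):           # e.g., more frames -> more tokens
    print(tokens, attention_matrix_size(tokens))  # 38416, 153664, 614656
```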
1 code implementation • CVPR 2023 • Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar
We propose a novel two-step approach for facial privacy protection that relies on finding adversarial latent codes in the low-dimensional manifold of a pretrained generative model.
1 code implementation • 15 Jun 2023 • Shahina Kunhimon, Abdelrahman Shaker, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
Hybrid volumetric medical image segmentation models, combining the advantages of local convolution and global attention, have recently received considerable attention.
1 code implementation • CVPR 2023 • Syed Talal Wasim, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
Through this prompting scheme, we can achieve state-of-the-art zero-shot performance on Kinetics-600, HMDB51 and UCF101 while remaining competitive in the supervised setting.
no code implementations • 23 Feb 2023 • Muzammal Naseer, Ahmad Mahmood, Salman Khan, Fahad Khan
Our temporal prompts are the result of a learnable transformation that allows optimizing for temporal gradients during an adversarial attack to fool the motion dynamics.
no code implementations • 30 Dec 2022 • Muzammal Naseer, Salman Khan, Fatih Porikli, Fahad Shahbaz Khan
Recently, various adversarial training defenses have been proposed that not only maintain high clean accuracy but also show significant robustness against popular and well-studied adversarial attacks such as PGD.
1 code implementation • CVPR 2023 • Sheng Zhang, Salman Khan, Zhiqiang Shen, Muzammal Naseer, Guangyi Chen, Fahad Khan
The GNCD setting aims to categorize unlabeled training data coming from known and novel classes by leveraging the information of partially labeled known classes.
2 code implementations • 13 Oct 2022 • Hanan Gani, Muzammal Naseer, Mohammad Yaqub
However, in contrast to convolutional neural networks, Vision Transformers lack inherent inductive biases.
2 code implementations • 25 Jul 2022 • Maryam Sultana, Muzammal Naseer, Muhammad Haris Khan, Salman Khan, Fahad Shahbaz Khan
Similar to CNNs, ViTs also struggle in out-of-distribution scenarios and the main culprit is overfitting to source domains.
1 code implementation • 18 Jul 2022 • Hashmat Shadab Malik, Shahina K Kunhimon, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
Our training approach is based on a min-max scheme which reduces overfitting via an adversarial objective and thus optimizes for a more generalizable surrogate model.
1 code implementation • CVPR 2022 • Kanchana Ranasinghe, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Michael Ryoo
To the best of our knowledge, the proposed approach is the first to alleviate the dependency on negative samples or dedicated memory banks in Self-supervised Video Transformer (SVT).
3 code implementations • ICLR 2022 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Fahad Shahbaz Khan, Fatih Porikli
(ii) Token Refinement: We then propose to refine the tokens to further enhance the discriminative capacity at each block of ViT.
1 code implementation • NeurIPS 2021 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang
We show and analyze the following intriguing properties of ViTs: (a) Transformers are highly robust to severe occlusions, perturbations, and domain shifts, e.g., they retain up to 60% top-1 accuracy on ImageNet even after 80% of the image content is randomly occluded.
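A minimal sketch of such an occlusion test, with an assumed 16-pixel patch grid and drop ratio (the paper's exact protocol may differ): zero out a random subset of patches and check whether top-1 predictions survive.

```python
# Randomly drop ~drop_ratio of the non-overlapping 16x16 patches of an image
# batch, then feed the result to a ViT/CNN and compare top-1 accuracy against
# the unoccluded input. Patch size and ratio are placeholder assumptions.
import torch

def occlude_patches(x, drop_ratio=0.8, patch=16):
    """x: (B, C, H, W) with H, W divisible by `patch`."""
    b, c, h, w = x.shape
    gh, gw = h // patch, w // patch
    keep = torch.rand(b, gh * gw) >= drop_ratio          # True = patch survives
    mask = keep.float().view(b, 1, gh, gw)
    mask = mask.repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    return x * mask                                       # occluded patches set to 0

x = torch.rand(4, 3, 224, 224)
x_occluded = occlude_patches(x)  # classify both x and x_occluded, compare top-1
```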
no code implementations • 26 Apr 2021 • Mohamed Afham, Salman Khan, Muhammad Haris Khan, Muzammal Naseer, Fahad Shahbaz Khan
Human learning benefits from multi-modal inputs that often appear as rich semantics (e.g., a description of an object's attributes while learning about it).
Ranked #1 on Few-Shot Image Classification on Oxford 102 Flower (using extra training data)
3 code implementations • ICCV 2021 • Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli
To this end, we propose a new objective function that not only aligns the global distributions of source and target images, but also matches the local neighbourhood structure between the two domains.
1 code implementation • ICCV 2021 • Kanchana Ranasinghe, Muzammal Naseer, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan
The CE loss encourages features of a class to have a higher projection score on the true class-vector compared to the negative classes.
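This projection view of cross-entropy is easy to verify numerically: with logits given by the projections W·f, one gradient step on the CE loss raises the true class's projection score relative to the negatives. All shapes and values below are illustrative.

```python
# Tiny numeric check: cross-entropy on logits W @ f pushes the feature f so
# that its projection on the true class-vector rises relative to the others.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
f = torch.randn(8, requires_grad=True)    # feature vector
W = torch.randn(3, 8)                     # one class-vector per row
y = torch.tensor(0)                       # true class

logits = W @ f                            # projection scores onto class-vectors
loss = F.cross_entropy(logits.unsqueeze(0), y.unsqueeze(0))
loss.backward()

with torch.no_grad():
    f_new = f - 0.1 * f.grad              # one SGD step on the feature
    print(W @ f)                          # projections before the step
    print(W @ f_new)                      # true-class projection rises relative to negatives
```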
no code implementations • 4 Jan 2021 • Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah
Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems.
1 code implementation • 29 Jul 2020 • Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli
In contrast to existing adversarial training methods that only use class-boundary information (e.g., using a cross-entropy loss), we propose to exploit additional information from the feature space to craft stronger adversaries that are in turn used to learn a robust model.
2 code implementations • CVPR 2020 • Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN)-based vision systems, e.g., in classification, segmentation, and object detection.
2 code implementations • NeurIPS 2019 • Muzammal Naseer, Salman H. Khan, Harris Khan, Fahad Shahbaz Khan, Fatih Porikli
To this end, we propose a framework capable of launching highly transferable attacks that crafts adversarial patterns to mislead networks trained on wholly different domains.
1 code implementation • 22 Nov 2018 • Muzammal Naseer, Salman H. Khan, Shafin Rahman, Fatih Porikli
Deep neural networks (DNNs) can be easily fooled by adding human-imperceptible perturbations to the images.
5 code implementations • 3 Jul 2018 • Muzammal Naseer, Salman H. Khan, Fatih Porikli
Deep neural networks (DNNs) have shown vulnerability to adversarial attacks, i.e., carefully perturbed inputs designed to mislead the network at inference time.
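As a reference point for the attacks such defenses must handle, the classic one-step FGSM perturbation is sketched below; the model and epsilon are placeholders, and this is the textbook attack rather than anything specific to this paper.

```python
# One-step FGSM: nudge each pixel by eps in the sign of the loss gradient,
# the simplest "carefully perturbed input". Model and eps are placeholders.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights=None).eval()  # stand-in classifier

def fgsm(x, y, eps=4/255):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

x = torch.rand(1, 3, 224, 224)          # dummy image in [0, 1]
y = torch.tensor([7])                   # dummy label
x_adv = fgsm(x, y)
```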
no code implementations • 9 Mar 2018 • Muzammal Naseer, Salman H. Khan, Fatih Porikli
With the availability of low-cost and compact 2.5/3D visual sensing devices, the computer vision community is experiencing a growing interest in visual scene understanding of indoor environments.