Search Results for author: Muzammal Naseer

Found 63 papers, 50 papers with code

Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models

1 code implementation 3 Feb 2025 Hashmat Shadab Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Khan, Salman Khan

Multi-modal Large Language Models (MLLMs) excel in vision-language tasks but remain vulnerable to visual adversarial perturbations that can induce hallucinations, manipulate responses, or bypass safety mechanisms.

Adversarial Robustness Image Captioning +2

Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models

1 code implementation 24 Dec 2024 Jinhui Yi, Syed Talal Wasim, Yanan Luo, Muzammal Naseer, Juergen Gall

The fine-grained video question-answering evaluation demonstrates our model's effectiveness, outperforming the encoder-based approaches Video-ChatGPT and Video-LLaVA in key aspects like correctness and temporal understanding.

Question Answering Video Question Answering

UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities

1 code implementation 13 Dec 2024 Muhammad Uzair Khattak, Shahina Kunhimon, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

UniMed is developed using a data-collection framework that leverages Large Language Models (LLMs) to transform modality-specific classification datasets into image-text formats while incorporating existing image-text data from the medical domain, facilitating scalable VLM pretraining.

Contrastive Learning

AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment

1 code implementation 2 Oct 2024 Umair Nawaz, Muhammad Awais, Hanan Gani, Muzammal Naseer, Fahad Khan, Salman Khan, Rao Muhammad Anwer

Further, this domain desires fine-grained feature learning due to the subtle nature of the downstream tasks (e.g., nutrient deficiency detection, livestock breed classification).

Self-Supervised Learning Zero-Shot Learning

Y-CA-Net: A Convolutional Attention Based Network for Volumetric Medical Image Segmentation

no code implementations 1 Oct 2024 Muhammad Hamza Sharif, Muzammal Naseer, Mohammad Yaqub, Min Xu, Mohsen Guizani

However, for voxel-wise prediction tasks, discriminative local features are key to the performance of VS models, yet they are missing in attention-based VS methods.

Image Segmentation Organ Segmentation +3

CDChat: A Large Multimodal Model for Remote Sensing Change Description

1 code implementation 24 Sep 2024 Mubashir Noman, Noor Ahsan, Muzammal Naseer, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan

In order to achieve this, we introduce a change description instruction dataset that can be utilized to finetune an LMM and provide better change descriptions for RS images.

Distillation-free Scaling of Large SSMs for Images and Videos

no code implementations 18 Sep 2024 Hamid Suleman, Syed Talal Wasim, Muzammal Naseer, Juergen Gall

We demonstrate that the stable and efficient interleaved architecture resolves the scalability issue of Mamba-based architectures for images and videos and increases robustness to common artifacts like JPEG compression.

Action Recognition Image Classification +3

PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning

1 code implementation 29 Aug 2024 Noor Hussein, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar

Moreover, PromptSmooth requires only a single model to handle multiple noise levels, which substantially reduces the computational cost compared to traditional methods that rely on training a separate model for each noise level.

Medical Image Analysis

STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models

1 code implementation 29 Aug 2024 Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar

Though many methods have been proposed for erasing undesired concepts from T2IG models, they only provide a false sense of security, as recent works demonstrate that concept-erased models (CEMs) can be easily deceived to generate the erased concept through adversarial attacks.

Benchmarking Text-to-Image Generation

Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors

1 code implementation 20 Aug 2024 Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar

To handle these issues, we propose a test-time optimization approach that solely optimizes an untrained neural network to transfer makeup style from a reference to a source image in an adversarial manner.

Decoder Face Recognition +1

BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning

1 code implementation 14 Aug 2024 Asif Hanif, Fahad Shamshad, Muhammad Awais, Muzammal Naseer, Fahad Shahbaz Khan, Karthik Nandakumar, Salman Khan, Rao Muhammad Anwer

Inspired by the latest developments in learnable prompts, this work introduces a method to embed a backdoor into the medical foundation model during the prompt learning phase.

Backdoor Attack

Probing the Efficacy of Federated Parameter-Efficient Fine-Tuning of Vision Transformers for Medical Image Classification

no code implementations 16 Jul 2024 Naif Alkhunaizi, Faris Almalik, Rouqaiah Al-Refai, Muzammal Naseer, Karthik Nandakumar

Moreover, the large size of these models necessitates the use of parameter-efficient fine-tuning (PEFT) to reduce the communication burden in federated learning.

Federated Learning Image Classification +3

VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs

2 code implementations 14 Jun 2024 Rohit Bharadwaj, Hanan Gani, Muzammal Naseer, Fahad Shahbaz Khan, Salman Khan

Despite their impressive capabilities, current Video-LMMs have not been evaluated for anomaly detection tasks, which is critical to their deployment in practical scenarios, e.g., towards identifying deepfakes, manipulated video content, traffic accidents and crimes.

Anomaly Detection Benchmarking +4

Towards Evaluating the Robustness of Visual State Space Models

1 code implementation 13 Jun 2024 Hashmat Shadab Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Shahbaz Khan, Salman Khan

To gain a deeper understanding of VSSMs' adversarial robustness, we conduct a frequency-based analysis of adversarial attacks, evaluating their performance against low-frequency and high-frequency perturbations.

Adversarial Robustness object-detection +2

On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models

1 code implementation 12 Jun 2024 Hashmat Shadab Malik, Numan Saeed, Asif Hanif, Muzammal Naseer, Mohammad Yaqub, Salman Khan, Fahad Shahbaz Khan

We extend this investigation across four volumetric segmentation datasets, evaluating robustness under both white box and black box adversarial attacks.

Adversarial Robustness Mamba +1

Multi-Granularity Language-Guided Multi-Object Tracking

1 code implementation 7 Jun 2024 Yuhao Li, Muzammal Naseer, Jiale Cao, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan

To this end, we propose a new multi-object tracking framework, named LG-MOT, that explicitly leverages language information at different levels of granularity (scene- and instance-level) and combines it with standard visual features to obtain discriminative representations.

Multi-Object Tracking Object +1

How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs

no code implementations 6 May 2024 Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Jameel Hassan, Muzammal Naseer, Federico Tombari, Fahad Shahbaz Khan, Salman Khan

Recent advancements in Large Language Models (LLMs) have led to the development of Video Large Multi-modal Models (Video-LMMs) that can handle a wide range of video understanding tasks.

Autonomous Vehicles Video Understanding

Cross-Modal Self-Training: Aligning Images and Pointclouds to Learn Classification without Labels

1 code implementation 15 Apr 2024 Amaya Dharmasiri, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Thereby we demonstrate that 2D vision language models such as CLIP can be used to complement 3D representation learning to improve classification performance without the need for expensive class annotations.

Representation Learning

Language Guided Domain Generalized Medical Image Segmentation

1 code implementation 1 Apr 2024 Shahina Kunhimon, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Incorporating text features alongside visual features is a potential solution to enhance the model's understanding of the data, as it goes beyond pixel-level information to provide valuable context.

Contrastive Learning Image Segmentation +4

Composed Video Retrieval via Enriched Context and Discriminative Embeddings

1 code implementation CVPR 2024 Omkar Thawakar, Muzammal Naseer, Rao Muhammad Anwer, Salman Khan, Michael Felsberg, Mubarak Shah, Fahad Shahbaz Khan

Composed video retrieval (CoVR) is a challenging problem in computer vision which has recently highlighted the integration of modification text with visual queries for more sophisticated video search in large databases.

Composed Video Retrieval (CoVR) Retrieval

Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning

1 code implementation 21 Mar 2024 Hasindri Watawana, Kanchana Ranasinghe, Tariq Mahmood, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Self-supervised representation learning has been highly promising for histopathology image analysis with numerous approaches leveraging their patient-slide-patch hierarchy to learn better representations.

Representation Learning Self-Supervised Learning

Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

1 code implementation CVPR 2024 Mubashir Noman, Muzammal Naseer, Hisham Cholakkal, Rao Muhammad Anwar, Salman Khan, Fahad Shahbaz Khan

Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks by pre-training on large amounts of unlabelled data.

Multi-Label Classification

ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes

1 code implementation 7 Mar 2024 Hashmat Shadab Malik, Muhammad Huzaifa, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

We produce various versions of standard vision datasets (ImageNet, COCO), incorporating either diverse and realistic backgrounds into the images or introducing color, texture, and adversarial changes in the background.

Image to text Object

MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation

1 code implementation 27 Feb 2024 Hanan Gani, Muzammal Naseer, Fahad Khan, Salman Khan

The proposed approach induces contextual knowledge in the network by learning to reconstruct the missing organ or parts of an organ in the output segmentation space.

Medical Image Analysis Segmentation +1

Learning to Prompt with Text Only Supervision for Vision-Language Models

1 code implementation 4 Jan 2024 Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer, Luc van Gool, Federico Tombari

While effective, most of these works require labeled data, which is not practical, and often struggle to generalize towards new datasets due to over-fitting on the source data.

Prompt Engineering

VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

no code implementations CVPR 2024 Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Our contributions include a novel spatio-temporal video grounding model surpassing state-of-the-art results in closed-set evaluations on multiple datasets and demonstrating superior performance in open-vocabulary scenarios.

Spatio-Temporal Video Grounding Video Grounding +1

Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

no code implementations 31 Dec 2023 Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Our contributions include a novel spatio-temporal video grounding model, surpassing state-of-the-art results in closed-set evaluations on multiple datasets and demonstrating superior performance in open-vocabulary scenarios.

Spatio-Temporal Video Grounding Video Grounding +1

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

1 code implementation CVPR 2024 Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer, Abhijit Das, Salman Khan, Fahad Shahbaz Khan

Furthermore, the lack of domain-specific multimodal instruction following data as well as strong backbone models for RS make it hard for the models to align their behavior with user queries.

Instruction Following Language Modeling +4

LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts

1 code implementation 16 Oct 2023 Hanan Gani, Shariq Farooq Bhat, Muzammal Naseer, Salman Khan, Peter Wonka

Diffusion-based generative models have significantly advanced text-to-image generation but encounter challenges when processing lengthy and intricate text prompts describing complex scenes with multiple objects.

Layout-to-Image Generation Object +2

FLIP: Cross-domain Face Anti-spoofing with Language Guidance

3 code implementations ICCV 2023 Koushik Srivatsan, Muzammal Naseer, Karthik Nandakumar

Specifically, we show that aligning the image representation with an ensemble of class descriptions (based on natural language semantics) improves FAS generalizability in low-data regimes.

Contrastive Learning Face Anti-Spoofing +1

Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment

1 code implementation 24 Aug 2023 Sheng Zhang, Muzammal Naseer, Guangyi Chen, Zhiqiang Shen, Salman Khan, Kun Zhang, Fahad Khan

To address this challenge, we propose the Self Structural Semantic Alignment (S^3A) framework, which extracts the structural semantic information from unlabeled data while simultaneously self-learning.

Self-Learning Zero-Shot Learning

Foundational Models Defining a New Era in Vision: A Survey and Outlook

1 code implementation 25 Jul 2023 Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan

Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.

Benchmarking

Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation

2 code implementations 14 Jul 2023 Asif Hanif, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan

While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks.

Adversarial Attack Deep Learning +4

Self-regulating Prompts: Foundational Model Adaptation without Forgetting

2 code implementations ICCV 2023 Muhammad Uzair Khattak, Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

To the best of our knowledge, this is the first regularization framework for prompt learning that avoids overfitting by jointly attending to pre-trained model features, the training trajectory during prompting, and the textual diversity.

Diversity model +1

CLIP2Protect: Protecting Facial Privacy using Text-Guided Makeup via Adversarial Latent Search

1 code implementation CVPR 2023 Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar

We propose a novel two-step approach for facial privacy protection that relies on finding adversarial latent codes in the low-dimensional manifold of a pretrained generative model.

Face Recognition Face Verification

Learnable Weight Initialization for Volumetric Medical Image Segmentation

1 code implementation 15 Jun 2023 Shahina Kunhimon, Abdelrahman Shaker, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Hybrid volumetric medical image segmentation models, combining the advantages of local convolution and global attention, have recently received considerable attention.

Image Segmentation Organ Segmentation +3

Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting

1 code implementation CVPR 2023 Syed Talal Wasim, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

Through this prompting scheme, we can achieve state-of-the-art zero-shot performance on Kinetics-600, HMDB51 and UCF101 while remaining competitive in the supervised setting.

Action Recognition Video Classification +2

Boosting Adversarial Transferability using Dynamic Cues

no code implementations 23 Feb 2023 Muzammal Naseer, Ahmad Mahmood, Salman Khan, Fahad Khan

Our temporal prompts are the result of a learnable transformation that allows optimizing for temporal gradients during an adversarial attack to fool the motion dynamics.

Adversarial Attack

Guidance Through Surrogate: Towards a Generic Diagnostic Attack

no code implementations 30 Dec 2022 Muzammal Naseer, Salman Khan, Fatih Porikli, Fahad Shahbaz Khan

Recently, different adversarial training defenses have been proposed that not only maintain high clean accuracy but also show significant robustness against popular and well-studied adversarial attacks such as PGD.

Adversarial Robustness

PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery

1 code implementation CVPR 2023 Sheng Zhang, Salman Khan, Zhiqiang Shen, Muzammal Naseer, Guangyi Chen, Fahad Khan

The GNCD setting aims to categorize unlabeled training data coming from known and novel classes by leveraging the information of partially labeled known classes.

Graph Generation

How to Train Vision Transformer on Small-scale Datasets?

2 code implementations 13 Oct 2022 Hanan Gani, Muzammal Naseer, Mohammad Yaqub

However, in contrast to convolutional neural networks, Vision Transformer lacks inherent inductive biases.

Self-Distilled Vision Transformer for Domain Generalization

2 code implementations 25 Jul 2022 Maryam Sultana, Muzammal Naseer, Muhammad Haris Khan, Salman Khan, Fahad Shahbaz Khan

Similar to CNNs, ViTs also struggle in out-of-distribution scenarios and the main culprit is overfitting to source domains.

Domain Generalization

Adversarial Pixel Restoration as a Pretext Task for Transferable Perturbations

1 code implementation 18 Jul 2022 Hashmat Shadab Malik, Shahina K Kunhimon, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Our training approach is based on a min-max scheme which reduces overfitting via an adversarial objective and thus optimizes for a more generalizable surrogate model.

object-detection Object Detection +2

Self-supervised Video Transformer

1 code implementation CVPR 2022 Kanchana Ranasinghe, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Michael Ryoo

To the best of our knowledge, the proposed approach is the first to alleviate the dependency on negative samples or dedicated memory banks in Self-supervised Video Transformer (SVT).

Action Classification Action Recognition In Videos +1

On Improving Adversarial Transferability of Vision Transformers

3 code implementations ICLR 2022 Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Fahad Shahbaz Khan, Fatih Porikli

(ii) Token Refinement: We then propose to refine the tokens to further enhance the discriminative capacity at each block of ViT.

Adversarial Attack

Intriguing Properties of Vision Transformers

1 code implementation NeurIPS 2021 Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

We show and analyze the following intriguing properties of ViT: (a) Transformers are highly robust to severe occlusions, perturbations and domain shifts, e.g., they retain as high as 60% top-1 accuracy on ImageNet even after randomly occluding 80% of the image content.

Few-Shot Learning Semantic Segmentation

Rich Semantics Improve Few-shot Learning

no code implementations 26 Apr 2021 Mohamed Afham, Salman Khan, Muhammad Haris Khan, Muzammal Naseer, Fahad Shahbaz Khan

Human learning benefits from multi-modal inputs that often appear as rich semantics (e.g., description of an object's attributes while learning about it).

 Ranked #1 on Few-Shot Image Classification on Oxford 102 Flower (using extra training data)

Few-Shot Image Classification Few-Shot Learning

On Generating Transferable Targeted Perturbations

3 code implementations ICCV 2021 Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli

To this end, we propose a new objective function that not only aligns the global distributions of source and target images, but also matches the local neighbourhood structure between the two domains.

Orthogonal Projection Loss

1 code implementation ICCV 2021 Kanchana Ranasinghe, Muzammal Naseer, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan

The CE loss encourages features of a class to have a higher projection score on the true class-vector compared to the negative classes.

Domain Generalization Few-Shot Learning

Transformers in Vision: A Survey

no code implementations 4 Jan 2021 Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah

Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems.

Action Recognition Colorization +11

Stylized Adversarial Defense

1 code implementation 29 Jul 2020 Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli

In contrast to existing adversarial training methods that only use class-boundary information (e.g., using a cross-entropy loss), we propose to exploit additional information from the feature space to craft stronger adversaries that are in turn used to learn a robust model.

Adversarial Defense

A Self-supervised Approach for Adversarial Robustness

2 code implementations CVPR 2020 Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli

Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN)-based vision systems, e.g., for classification, segmentation and object detection.

Adversarial Robustness General Classification +3

Cross-Domain Transferability of Adversarial Perturbations

2 code implementations NeurIPS 2019 Muzammal Naseer, Salman H. Khan, Harris Khan, Fahad Shahbaz Khan, Fatih Porikli

To this end, we propose a framework capable of launching highly transferable attacks that crafts adversarial patterns to mislead networks trained on wholly different domains.

Task-generalizable Adversarial Attack based on Perceptual Metric

1 code implementation 22 Nov 2018 Muzammal Naseer, Salman H. Khan, Shafin Rahman, Fatih Porikli

Deep neural networks (DNNs) can be easily fooled by adding human-imperceptible perturbations to the images.

Adversarial Attack object-detection +2

Local Gradients Smoothing: Defense against localized adversarial attacks

5 code implementations 3 Jul 2018 Muzammal Naseer, Salman H. Khan, Fatih Porikli

Deep neural networks (DNNs) have shown vulnerability to adversarial attacks, i.e., carefully perturbed inputs designed to mislead the network at inference time.

Adversarial Attack

Indoor Scene Understanding in 2.5/3D for Autonomous Agents: A Survey

no code implementations 9 Mar 2018 Muzammal Naseer, Salman H. Khan, Fatih Porikli

With the availability of low-cost and compact 2.5/3D visual sensing devices, the computer vision community is experiencing a growing interest in visual scene understanding of indoor environments.

3D Reconstruction object-detection +7
