Search Results for author: Muzammal Naseer

Found 43 papers, 34 papers with code

Cross-Modal Self-Training: Aligning Images and Pointclouds to Learn Classification without Labels

1 code implementation 15 Apr 2024 Amaya Dharmasiri, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Thereby we demonstrate that 2D vision language models such as CLIP can be used to complement 3D representation learning to improve classification performance without the need for expensive class annotations.

Representation Learning

Language Guided Domain Generalized Medical Image Segmentation

1 code implementation 1 Apr 2024 Shahina Kunhimon, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Incorporating text features alongside visual features is a potential solution to enhance the model's understanding of the data, as it goes beyond pixel-level information to provide valuable context.

Contrastive Learning Domain Generalization +4

Composed Video Retrieval via Enriched Context and Discriminative Embeddings

1 code implementation 25 Mar 2024 Omkar Thawakar, Muzammal Naseer, Rao Muhammad Anwer, Salman Khan, Michael Felsberg, Mubarak Shah, Fahad Shahbaz Khan

Composed video retrieval (CoVR) is a challenging problem in computer vision which has recently highlighted the integration of modification text with visual queries for more sophisticated video search in large databases.

Composed Video Retrieval (CoVR) Retrieval

Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning

1 code implementation 21 Mar 2024 Hasindri Watawana, Kanchana Ranasinghe, Tariq Mahmood, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Self-supervised representation learning has been highly promising for histopathology image analysis with numerous approaches leveraging their patient-slide-patch hierarchy to learn better representations.

Representation Learning Self-Supervised Learning

Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

1 code implementation 8 Mar 2024 Mubashir Noman, Muzammal Naseer, Hisham Cholakkal, Rao Muhammad Anwar, Salman Khan, Fahad Shahbaz Khan

Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks by pre-training on large amounts of unlabelled data.

Multi-Label Classification

ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes

1 code implementation 7 Mar 2024 Hashmat Shadab Malik, Muhammad Huzaifa, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

We produce various versions of standard vision datasets (ImageNet, COCO), incorporating either diverse and realistic backgrounds into the images or introducing color, texture, and adversarial changes in the background.

Object

MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation

1 code implementation 27 Feb 2024 Hanan Gani, Muzammal Naseer, Fahad Khan, Salman Khan

The proposed approach induces contextual knowledge in the network by learning to reconstruct the missing organ or parts of an organ in the output segmentation space.

Segmentation Transfer Learning

Learning to Prompt with Text Only Supervision for Vision-Language Models

1 code implementation 4 Jan 2024 Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer, Luc van Gool, Federico Tombari

While effective, most of these works require labeled data, which is not practical, and often struggle to generalize to new datasets due to over-fitting on the source data.

Prompt Engineering

Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

no code implementations 31 Dec 2023 Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Our contributions include a novel spatio-temporal video grounding model, surpassing state-of-the-art results in closed-set evaluations on multiple datasets and demonstrating superior performance in open-vocabulary scenarios.

Spatio-Temporal Video Grounding Video Grounding +1

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

1 code implementation 24 Nov 2023 Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer, Abhijit Das, Salman Khan, Fahad Shahbaz Khan

Furthermore, the lack of domain-specific multimodal instruction following data as well as strong backbone models for RS make it hard for the models to align their behavior with user queries.

Instruction Following Language Modelling +3

LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts

1 code implementation 16 Oct 2023 Hanan Gani, Shariq Farooq Bhat, Muzammal Naseer, Salman Khan, Peter Wonka

Diffusion-based generative models have significantly advanced text-to-image generation but encounter challenges when processing lengthy and intricate text prompts describing complex scenes with multiple objects.

Layout-to-Image Generation Object +2

FLIP: Cross-domain Face Anti-spoofing with Language Guidance

2 code implementations ICCV 2023 Koushik Srivatsan, Muzammal Naseer, Karthik Nandakumar

Specifically, we show that aligning the image representation with an ensemble of class descriptions (based on natural language semantics) improves FAS generalizability in low-data regimes.

Contrastive Learning Face Anti-Spoofing +1

Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment

1 code implementation 24 Aug 2023 Sheng Zhang, Muzammal Naseer, Guangyi Chen, Zhiqiang Shen, Salman Khan, Kun Zhang, Fahad Khan

To address this challenge, we propose the Self Structural Semantic Alignment (S^3A) framework, which extracts the structural semantic information from unlabeled data while simultaneously self-learning.

Self-Learning Zero-Shot Learning

Foundational Models Defining a New Era in Vision: A Survey and Outlook

1 code implementation 25 Jul 2023 Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan

Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.

Benchmarking

Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation

1 code implementation 14 Jul 2023 Asif Hanif, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan

While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks.

Adversarial Attack Image Segmentation +3

Self-regulating Prompts: Foundational Model Adaptation without Forgetting

2 code implementations ICCV 2023 Muhammad Uzair Khattak, Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

To the best of our knowledge, this is the first regularization framework for prompt learning that avoids overfitting by jointly attending to pre-trained model features, the training trajectory during prompting, and the textual diversity.

Prompt Engineering

CLIP2Protect: Protecting Facial Privacy using Text-Guided Makeup via Adversarial Latent Search

1 code implementation CVPR 2023 Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar

We propose a novel two-step approach for facial privacy protection that relies on finding adversarial latent codes in the low-dimensional manifold of a pretrained generative model.

Face Recognition Face Verification

Learnable Weight Initialization for Volumetric Medical Image Segmentation

1 code implementation 15 Jun 2023 Shahina Kunhimon, Abdelrahman Shaker, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Hybrid volumetric medical image segmentation models, combining the advantages of local convolution and global attention, have recently received considerable attention.

Image Segmentation Organ Segmentation +3

Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting

1 code implementation CVPR 2023 Syed Talal Wasim, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

Through this prompting scheme, we can achieve state-of-the-art zero-shot performance on Kinetics-600, HMDB51 and UCF101 while remaining competitive in the supervised setting.

Action Recognition Video Classification +2

Boosting Adversarial Transferability using Dynamic Cues

no code implementations 23 Feb 2023 Muzammal Naseer, Ahmad Mahmood, Salman Khan, Fahad Khan

Our temporal prompts are the result of a learnable transformation that allows optimizing for temporal gradients during an adversarial attack to fool the motion dynamics.

Adversarial Attack

Guidance Through Surrogate: Towards a Generic Diagnostic Attack

no code implementations 30 Dec 2022 Muzammal Naseer, Salman Khan, Fatih Porikli, Fahad Shahbaz Khan

Recently, different adversarial training defenses have been proposed that not only maintain a high clean accuracy but also show significant robustness against popular and well-studied adversarial attacks such as PGD.

Adversarial Robustness

PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery

1 code implementation CVPR 2023 Sheng Zhang, Salman Khan, Zhiqiang Shen, Muzammal Naseer, Guangyi Chen, Fahad Khan

The GNCD setting aims to categorize unlabeled training data coming from known and novel classes by leveraging the information of partially labeled known classes.

Graph Generation

How to Train Vision Transformer on Small-scale Datasets?

2 code implementations 13 Oct 2022 Hanan Gani, Muzammal Naseer, Mohammad Yaqub

However, in contrast to convolutional neural networks, Vision Transformer lacks inherent inductive biases.

Self-Distilled Vision Transformer for Domain Generalization

2 code implementations 25 Jul 2022 Maryam Sultana, Muzammal Naseer, Muhammad Haris Khan, Salman Khan, Fahad Shahbaz Khan

Similar to CNNs, ViTs also struggle in out-of-distribution scenarios and the main culprit is overfitting to source domains.

Domain Generalization

Adversarial Pixel Restoration as a Pretext Task for Transferable Perturbations

1 code implementation 18 Jul 2022 Hashmat Shadab Malik, Shahina K Kunhimon, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Our training approach is based on a min-max scheme which reduces overfitting via an adversarial objective and thus optimizes for a more generalizable surrogate model.

Object Detection +2

Self-supervised Video Transformer

1 code implementation CVPR 2022 Kanchana Ranasinghe, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Michael Ryoo

To the best of our knowledge, the proposed approach is the first to alleviate the dependency on negative samples or dedicated memory banks in Self-supervised Video Transformer (SVT).

Action Classification Action Recognition In Videos

On Improving Adversarial Transferability of Vision Transformers

3 code implementations ICLR 2022 Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Fahad Shahbaz Khan, Fatih Porikli

(ii) Token Refinement: We then propose to refine the tokens to further enhance the discriminative capacity at each block of ViT.

Adversarial Attack

Intriguing Properties of Vision Transformers

1 code implementation NeurIPS 2021 Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

We show and analyze the following intriguing properties of ViT: (a) Transformers are highly robust to severe occlusions, perturbations and domain shifts, e.g., retain as high as 60% top-1 accuracy on ImageNet even after randomly occluding 80% of the image content.

Few-Shot Learning Semantic Segmentation

Rich Semantics Improve Few-shot Learning

no code implementations 26 Apr 2021 Mohamed Afham, Salman Khan, Muhammad Haris Khan, Muzammal Naseer, Fahad Shahbaz Khan

Human learning benefits from multi-modal inputs that often appear as rich semantics (e.g., description of an object's attributes while learning about it).

Ranked #1 on Few-Shot Image Classification on Oxford 102 Flower (using extra training data)

Few-Shot Image Classification Few-Shot Learning

On Generating Transferable Targeted Perturbations

3 code implementations ICCV 2021 Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli

To this end, we propose a new objective function that not only aligns the global distributions of source and target images, but also matches the local neighbourhood structure between the two domains.

Orthogonal Projection Loss

1 code implementation ICCV 2021 Kanchana Ranasinghe, Muzammal Naseer, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan

The CE loss encourages features of a class to have a higher projection score on the true class-vector compared to the negative classes.

Domain Generalization Few-Shot Learning

Transformers in Vision: A Survey

no code implementations 4 Jan 2021 Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah

Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems.

Action Recognition Colorization +10

Stylized Adversarial Defense

1 code implementation 29 Jul 2020 Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli

In contrast to existing adversarial training methods that only use class-boundary information (e.g., using a cross-entropy loss), we propose to exploit additional information from the feature space to craft stronger adversaries that are in turn used to learn a robust model.

Adversarial Defense

A Self-supervised Approach for Adversarial Robustness

2 code implementations CVPR 2020 Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli

Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems, e.g., for classification, segmentation and object detection.

Adversarial Robustness General Classification +3

Cross-Domain Transferability of Adversarial Perturbations

2 code implementations NeurIPS 2019 Muzammal Naseer, Salman H. Khan, Harris Khan, Fahad Shahbaz Khan, Fatih Porikli

To this end, we propose a framework capable of launching highly transferable attacks that crafts adversarial patterns to mislead networks trained on wholly different domains.

Task-generalizable Adversarial Attack based on Perceptual Metric

1 code implementation 22 Nov 2018 Muzammal Naseer, Salman H. Khan, Shafin Rahman, Fatih Porikli

Deep neural networks (DNNs) can be easily fooled by adding human-imperceptible perturbations to the images.

Adversarial Attack Object Detection +2

Local Gradients Smoothing: Defense against localized adversarial attacks

5 code implementations 3 Jul 2018 Muzammal Naseer, Salman H. Khan, Fatih Porikli

Deep neural networks (DNNs) have shown vulnerability to adversarial attacks, i.e., carefully perturbed inputs designed to mislead the network at inference time.

Adversarial Attack

Indoor Scene Understanding in 2.5/3D for Autonomous Agents: A Survey

no code implementations 9 Mar 2018 Muzammal Naseer, Salman H. Khan, Fatih Porikli

With the availability of low-cost and compact 2.5/3D visual sensing devices, the computer vision community is experiencing a growing interest in visual scene understanding of indoor environments.

3D Reconstruction Object Detection +6
