Search Results for author: Muzammal Naseer

Found 43 papers, 34 papers with code

Cross-Modal Self-Training: Aligning Images and Pointclouds to Learn Classification without Labels

1 code implementation • 15 Apr 2024 • Amaya Dharmasiri, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Thereby we demonstrate that 2D vision language models such as CLIP can be used to complement 3D representation learning to improve classification performance without the need for expensive class annotations.

Representation Learning

Language Guided Domain Generalized Medical Image Segmentation

1 code implementation • 1 Apr 2024 • Shahina Kunhimon, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Incorporating text features alongside visual features is a potential solution to enhance the model's understanding of the data, as it goes beyond pixel-level information to provide valuable context.

Contrastive Learning, Domain Generalization (+4 more)

Composed Video Retrieval via Enriched Context and Discriminative Embeddings

1 code implementation • 25 Mar 2024 • Omkar Thawakar, Muzammal Naseer, Rao Muhammad Anwer, Salman Khan, Michael Felsberg, Mubarak Shah, Fahad Shahbaz Khan

Composed video retrieval (CoVR) is a challenging problem in computer vision which has recently highlighted the integration of modification text with visual queries for more sophisticated video search in large databases.

Composed Video Retrieval (CoVR), Retrieval

VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding

no code implementations • 21 Mar 2024 • Ahmad Mahmood, Ashmal Vayani, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

In contrast, this paper introduces a Video Understanding and Reasoning Framework (VURF) based on the reasoning power of LLMs.

Pose Estimation, Video Understanding (+1 more)

Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning

1 code implementation • 21 Mar 2024 • Hasindri Watawana, Kanchana Ranasinghe, Tariq Mahmood, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Self-supervised representation learning has been highly promising for histopathology image analysis with numerous approaches leveraging their patient-slide-patch hierarchy to learn better representations.

Representation Learning, Self-Supervised Learning

Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

1 code implementation • 8 Mar 2024 • Mubashir Noman, Muzammal Naseer, Hisham Cholakkal, Rao Muhammad Anwar, Salman Khan, Fahad Shahbaz Khan

Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks by pre-training on large amounts of unlabelled data.

Multi-Label Classification

ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes

1 code implementation • 7 Mar 2024 • Hashmat Shadab Malik, Muhammad Huzaifa, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

We produce various versions of standard vision datasets (ImageNet, COCO), incorporating either diverse and realistic backgrounds into the images or introducing color, texture, and adversarial changes in the background.

Object

MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation

1 code implementation • 27 Feb 2024 • Hanan Gani, Muzammal Naseer, Fahad Khan, Salman Khan

The proposed approach induces contextual knowledge in the network by learning to reconstruct the missing organ or parts of an organ in the output segmentation space.

Segmentation, Transfer Learning

Learning to Prompt with Text Only Supervision for Vision-Language Models

1 code implementation • 4 Jan 2024 • Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer, Luc van Gool, Federico Tombari

While effective, most of these works require labeled data, which is not practical, and often struggle to generalize towards new datasets due to over-fitting on the source data.

Prompt Engineering

Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

no code implementations • 31 Dec 2023 • Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Our contributions include a novel spatio-temporal video grounding model, surpassing state-of-the-art results in closed-set evaluations on multiple datasets and demonstrating superior performance in open-vocabulary scenarios.

Spatio-Temporal Video Grounding, Video Grounding (+1 more)

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

1 code implementation • 24 Nov 2023 • Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer, Abhijit Das, Salman Khan, Fahad Shahbaz Khan

Furthermore, the lack of domain-specific multimodal instruction-following data, as well as of strong backbone models for RS, makes it hard for the models to align their behavior with user queries.

Instruction Following, Language Modelling (+3 more)

Enhancing Novel Object Detection via Cooperative Foundational Models

1 code implementation • 19 Nov 2023 • Rohit Bharadwaj, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

We present a novel approach to transform existing closed-set detectors into open-set detectors.

Ranked #1 on Novel Object Detection on LVIS v1.0 val

Novel Class Discovery, Novel Object Detection (+3 more)

Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization

no code implementations • NeurIPS 2023 • Jameel Hassan, Hanan Gani, Noor Hussein, Muhammad Uzair Khattak, Muzammal Naseer, Fahad Shahbaz Khan, Salman Khan

The promising zero-shot generalization of vision-language models such as CLIP has led to their adoption using prompt learning for numerous downstream tasks.

Domain Generalization, Zero-shot Generalization

Videoprompter: an ensemble of foundational models for zero-shot video understanding

no code implementations • 23 Oct 2023 • Adeel Yousaf, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

Consistent improvements across multiple benchmarks and with various VLMs demonstrate the effectiveness of our proposed framework.

Ranked #2 on Video-Text Retrieval on Test-of-Time

Action Recognition, Descriptive (+3 more)

LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts

1 code implementation • 16 Oct 2023 • Hanan Gani, Shariq Farooq Bhat, Muzammal Naseer, Salman Khan, Peter Wonka

Diffusion-based generative models have significantly advanced text-to-image generation but encounter challenges when processing lengthy and intricate text prompts describing complex scenes with multiple objects.

Layout-to-Image Generation, Object (+2 more)

FLIP: Cross-domain Face Anti-spoofing with Language Guidance

2 code implementations • ICCV 2023 • Koushik Srivatsan, Muzammal Naseer, Karthik Nandakumar

Specifically, we show that aligning the image representation with an ensemble of class descriptions (based on natural language semantics) improves FAS generalizability in low-data regimes.

Contrastive Learning, Face Anti-Spoofing (+1 more)

Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment

1 code implementation • 24 Aug 2023 • Sheng Zhang, Muzammal Naseer, Guangyi Chen, Zhiqiang Shen, Salman Khan, Kun Zhang, Fahad Khan

To address this challenge, we propose the Self Structural Semantic Alignment (S^3A) framework, which extracts the structural semantic information from unlabeled data while simultaneously self-learning.

Self-Learning, Zero-Shot Learning

Foundational Models Defining a New Era in Vision: A Survey and Outlook

1 code implementation • 25 Jul 2023 • Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan

Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.

Benchmarking

Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation

1 code implementation • 14 Jul 2023 • Asif Hanif, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan

While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks.

Adversarial Attack, Image Segmentation (+3 more)

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition

2 code implementations • ICCV 2023 • Syed Talal Wasim, Muhammad Uzair Khattak, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan

Video transformer designs are based on self-attention that can model global context at a high computational cost.

Ranked #1 on Action Recognition on Diving-48

Action Recognition, Temporal Action Localization (+1 more)

Self-regulating Prompts: Foundational Model Adaptation without Forgetting

2 code implementations • ICCV 2023 • Muhammad Uzair Khattak, Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

To the best of our knowledge, this is the first regularization framework for prompt learning that avoids overfitting by jointly attending to pre-trained model features, the training trajectory during prompting, and the textual diversity.

Ranked #2 on Prompt Engineering on ImageNet V2

Prompt Engineering

CLIP2Protect: Protecting Facial Privacy using Text-Guided Makeup via Adversarial Latent Search

1 code implementation • CVPR 2023 • Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar

We propose a novel two-step approach for facial privacy protection that relies on finding adversarial latent codes in the low-dimensional manifold of a pretrained generative model.

Face Recognition, Face Verification

Learnable Weight Initialization for Volumetric Medical Image Segmentation

1 code implementation • 15 Jun 2023 • Shahina Kunhimon, Abdelrahman Shaker, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Hybrid volumetric medical image segmentation models, combining the advantages of local convolution and global attention, have recently received considerable attention.

Image Segmentation, Organ Segmentation (+3 more)

Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting

1 code implementation • CVPR 2023 • Syed Talal Wasim, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

Through this prompting scheme, we can achieve state-of-the-art zero-shot performance on Kinetics-600, HMDB51 and UCF101 while remaining competitive in the supervised setting.

Action Recognition, Video Classification (+2 more)

Boosting Adversarial Transferability using Dynamic Cues

no code implementations • 23 Feb 2023 • Muzammal Naseer, Ahmad Mahmood, Salman Khan, Fahad Khan

Our temporal prompts are the result of a learnable transformation that allows optimizing for temporal gradients during an adversarial attack to fool the motion dynamics.

Adversarial Attack

Guidance Through Surrogate: Towards a Generic Diagnostic Attack

no code implementations • 30 Dec 2022 • Muzammal Naseer, Salman Khan, Fatih Porikli, Fahad Shahbaz Khan

Recently, several adversarial training defenses have been proposed that not only maintain a high clean accuracy but also show significant robustness against popular and well-studied adversarial attacks such as PGD.

Adversarial Robustness

PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery

1 code implementation • CVPR 2023 • Sheng Zhang, Salman Khan, Zhiqiang Shen, Muzammal Naseer, Guangyi Chen, Fahad Khan

The GNCD setting aims to categorize unlabeled training data coming from known and novel classes by leveraging the information of partially labeled known classes.

Graph Generation

How to Train Vision Transformer on Small-scale Datasets?

2 code implementations • 13 Oct 2022 • Hanan Gani, Muzammal Naseer, Mohammad Yaqub

However, in contrast to convolutional neural networks, Vision Transformer lacks inherent inductive biases.

Self-Distilled Vision Transformer for Domain Generalization

2 code implementations • 25 Jul 2022 • Maryam Sultana, Muzammal Naseer, Muhammad Haris Khan, Salman Khan, Fahad Shahbaz Khan

Like CNNs, ViTs struggle in out-of-distribution scenarios, and the main culprit is overfitting to source domains.

Domain Generalization

Adversarial Pixel Restoration as a Pretext Task for Transferable Perturbations

1 code implementation • 18 Jul 2022 • Hashmat Shadab Malik, Shahina K Kunhimon, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Our training approach is based on a min-max scheme which reduces overfitting via an adversarial objective and thus optimizes for a more generalizable surrogate model.

object-detection, Object Detection (+2 more)

Self-supervised Video Transformer

1 code implementation • CVPR 2022 • Kanchana Ranasinghe, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Michael Ryoo

To the best of our knowledge, the proposed approach is the first to alleviate the dependency on negative samples or dedicated memory banks in Self-supervised Video Transformer (SVT).

Ranked #55 on Action Recognition on UCF101

Action Classification, Action Recognition In Videos

On Improving Adversarial Transferability of Vision Transformers

3 code implementations • ICLR 2022 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Fahad Shahbaz Khan, Fatih Porikli

(ii) Token Refinement: We then propose to refine the tokens to further enhance the discriminative capacity at each block of ViT.

Adversarial Attack

Intriguing Properties of Vision Transformers

1 code implementation • NeurIPS 2021 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

We show and analyze the following intriguing properties of ViT: (a) Transformers are highly robust to severe occlusions, perturbations and domain shifts, e.g., they retain as high as 60% top-1 accuracy on ImageNet even after randomly occluding 80% of the image content.

Few-Shot Learning, Semantic Segmentation

Rich Semantics Improve Few-shot Learning

no code implementations • 26 Apr 2021 • Mohamed Afham, Salman Khan, Muhammad Haris Khan, Muzammal Naseer, Fahad Shahbaz Khan

Human learning benefits from multi-modal inputs that often appear as rich semantics (e.g., a description of an object's attributes while learning about it).

Ranked #1 on Few-Shot Image Classification on Oxford 102 Flower (using extra training data)

Few-Shot Image Classification, Few-Shot Learning

On Generating Transferable Targeted Perturbations

3 code implementations • ICCV 2021 • Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli

To this end, we propose a new objective function that not only aligns the global distributions of source and target images, but also matches the local neighbourhood structure between the two domains.

Orthogonal Projection Loss

1 code implementation • ICCV 2021 • Kanchana Ranasinghe, Muzammal Naseer, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan

The CE loss encourages features of a class to have a higher projection score on the true class-vector compared to the negative classes.

Domain Generalization, Few-Shot Learning

Transformers in Vision: A Survey

no code implementations • 4 Jan 2021 • Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah

Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems.

Action Recognition, Colorization (+10 more)

Stylized Adversarial Defense

1 code implementation • 29 Jul 2020 • Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli

In contrast to existing adversarial training methods that only use class-boundary information (e.g., using a cross-entropy loss), we propose to exploit additional information from the feature space to craft stronger adversaries that are in turn used to learn a robust model.

Adversarial Defense

A Self-supervised Approach for Adversarial Robustness

2 code implementations • CVPR 2020 • Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli

Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN)-based vision systems, e.g., for classification, segmentation and object detection.

Adversarial Robustness, General Classification (+3 more)

Cross-Domain Transferability of Adversarial Perturbations

2 code implementations • NeurIPS 2019 • Muzammal Naseer, Salman H. Khan, Harris Khan, Fahad Shahbaz Khan, Fatih Porikli

To this end, we propose a framework, capable of launching highly transferable attacks, that crafts adversarial patterns to mislead networks trained on wholly different domains.

Task-generalizable Adversarial Attack based on Perceptual Metric

1 code implementation • 22 Nov 2018 • Muzammal Naseer, Salman H. Khan, Shafin Rahman, Fatih Porikli

Deep neural networks (DNNs) can be easily fooled by adding human-imperceptible perturbations to the images.

Adversarial Attack, object-detection (+2 more)

Local Gradients Smoothing: Defense against localized adversarial attacks

5 code implementations • 3 Jul 2018 • Muzammal Naseer, Salman H. Khan, Fatih Porikli

Deep neural networks (DNNs) have shown vulnerability to adversarial attacks, i.e., carefully perturbed inputs designed to mislead the network at inference time.

Adversarial Attack

Indoor Scene Understanding in 2.5/3D for Autonomous Agents: A Survey

no code implementations • 9 Mar 2018 • Muzammal Naseer, Salman H. Khan, Fatih Porikli

With the availability of low-cost and compact 2.5/3D visual sensing devices, the computer vision community is experiencing a growing interest in visual scene understanding of indoor environments.

3D Reconstruction, object-detection (+6 more)
