1 code implementation • ECCV 2020 • Guolei Sun, Salman Khan, Wen Li, Hisham Cholakkal, Fahad Shahbaz Khan, Luc van Gool
This way, in an effort to fix localization errors, our loss provides an extra supervisory signal that helps the model to better discriminate between similar classes.
1 code implementation • 20 Sep 2023 • Nian Liu, Kepan Nan, Wangbo Zhao, Yuanwei Liu, Xiwen Yao, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Junwei Han, Fahad Shahbaz Khan
We decompose the query video information into a clip prototype and a memory prototype for capturing local and long-term internal temporal guidance, respectively.
1 code implementation • 19 Sep 2023 • Mamona Awan, Muhammad Haris Khan, Sanoojan Baliah, Muhammad Ahmad Waseem, Salman Khan, Fahad Shahbaz Khan, Arif Mahmood
In the current work, we introduce a consistency-guided bottleneck in an image reconstruction-based pipeline that leverages landmark consistency, a measure of compatibility with the pseudo-ground truth, to generate adaptive heatmaps.
1 code implementation • 24 Aug 2023 • Sheng Zhang, Muzammal Naseer, Guangyi Chen, Zhiqiang Shen, Salman Khan, Kun Zhang, Fahad Khan
To address this challenge, we propose the Self Structural Semantic Alignment (S^3A) framework, which extracts the structural semantic information from unlabeled data while simultaneously self-learning.
1 code implementation • 11 Aug 2023 • Chun-Mei Feng, Kai Yu, Yong Liu, Salman Khan, WangMeng Zuo
In this paper, we focus on a particular setting of learning adaptive prompts on the fly for each test sample from an unseen new domain, which is known as test-time prompt tuning (TPT).
1 code implementation • 11 Aug 2023 • Chun-Mei Feng, Kai Yu, Nian Liu, Xinxing Xu, Salman Khan, WangMeng Zuo
However, the performance of the global model is often hampered by non-i.i.d.
1 code implementation • 27 Jul 2023 • Haotong Qin, Ge-Peng Ji, Salman Khan, Deng-Ping Fan, Fahad Shahbaz Khan, Luc van Gool
Google's Bard has emerged as a formidable competitor to OpenAI's ChatGPT in the field of conversational AI.
1 code implementation • 25 Jul 2023 • Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
1 code implementation • 14 Jul 2023 • Asif Hanif, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan
While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks.
2 code implementations • 13 Jul 2023 • Muhammad Uzair Khattak, Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
To the best of our knowledge, this is the first regularization framework for prompt learning that avoids overfitting by jointly attending to pre-trained model features, the training trajectory during prompting, and the textual diversity.
Ranked #1 on Prompt Engineering on ImageNet V2
1 code implementation • 13 Jul 2023 • Syed Talal Wasim, Muhammad Uzair Khattak, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan
Video transformer designs are based on self-attention that can model global context at a high computational cost.
1 code implementation • 22 Jun 2023 • Vaishnav Potlapalli, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan
We present a prompt-based learning approach, PromptIR, for All-In-One image restoration that can effectively restore images from various types and levels of degradation.
1 code implementation • 15 Jun 2023 • Shahina Kunhimon, Abdelrahman Shaker, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
Hybrid volumetric medical image segmentation models, combining the advantages of local convolution and global attention, have recently received considerable attention.
1 code implementation • 13 Jun 2023 • Omkar Thawkar, Abdelrahman Shaker, Sahal Shaji Mullappilly, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Fahad Shahbaz Khan
The latest breakthroughs in large vision-language models, such as Bard and GPT-4, have showcased extraordinary abilities in performing a wide range of tasks.
1 code implementation • 8 Jun 2023 • Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan
Conversation agents fueled by Large Language Models (LLMs) are providing a new way to interact with visual data.
Ranked #1 on Video-based Generative Performance Benchmarking (Temporal Understanding) on VideoInstruct
Video-based Generative Performance Benchmarking (Consistency), Video-based Generative Performance Benchmarking (Contextual Understanding)
1 code implementation • 26 May 2023 • Xi Weng, Yunhao Ni, Tengwei Song, Jie Luo, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan, Lei Huang
We show that whitening transformation is a special instance of ST by definition, and there exist other instances that can avoid collapse by our empirical investigation.
1 code implementation • CVPR 2023 • Long Li, Junwei Han, Ni Zhang, Nian Liu, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan
Then, we use two types of pre-defined tokens to mine co-saliency and background information via our proposed contrast-induced pixel-to-token correlation and co-saliency token-to-token correlation modules.
1 code implementation • CVPR 2023 • Nancy Mehta, Akshay Dudhane, Subrahmanyam Murala, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan
Burst image processing has become increasingly popular in recent years.
1 code implementation • 13 Apr 2023 • Mubashir Noman, Mustansar Fiaz, Hisham Cholakkal, Sanath Narayan, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan
Current transformer-based change detection (CD) approaches either employ a model pre-trained on the large-scale ImageNet image classification dataset or rely on first pre-training on another CD dataset and then fine-tuning on the target benchmark.
1 code implementation • CVPR 2023 • Syed Talal Wasim, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
Through this prompting scheme, we can achieve state-of-the-art zero-shot performance on Kinetics-600, HMDB51 and UCF101 while remaining competitive in the supervised setting.
1 code implementation • CVPR 2023 • Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang
Unlike existing methods, the proposed alignment module not only aligns burst features but also exchanges feature information and maintains focused communication with the reference frame through the proposed reference-based feature enrichment mechanism, which facilitates handling complex motions.
1 code implementation • 3 Apr 2023 • Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
We present a method to efficiently generate 3D-aware high-resolution images that are view-consistent across multiple target views.
1 code implementation • 3 Apr 2023 • Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan
The open-world formulation relaxes the closed-world static-learning assumption as follows: (a) first, it distinguishes a set of known categories and labels an unknown object as 'unknown', and then (b) it incrementally learns the class of an unknown as and when the corresponding semantic labels become available.
2 code implementations • 27 Mar 2023 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
Using our proposed efficient additive attention, we build a series of models called "SwiftFormer" which achieve state-of-the-art performance in terms of both accuracy and mobile inference speed.
1 code implementation • CVPR 2023 • Muhammad Akhtar Munir, Muhammad Haris Khan, Salman Khan, Fahad Shahbaz Khan
Since the original formulation of our loss depends on the counts of true positives and false positives in a minibatch, we develop a differentiable proxy of our loss that can be used during training with other application-specific loss functions.
no code implementations • 23 Feb 2023 • Muzammal Naseer, Ahmad Mahmood, Salman Khan, Fahad Khan
Our temporal prompts are the result of a learnable transformation that allows optimizing for temporal gradients during an adversarial attack to fool the motion dynamics.
no code implementations • 30 Dec 2022 • Muzammal Naseer, Salman Khan, Fatih Porikli, Fahad Shahbaz Khan
Recently, different adversarial training defenses have been proposed that not only maintain a high clean accuracy but also show significant robustness against popular and well-studied adversarial attacks such as PGD.
1 code implementation • CVPR 2023 • Sheng Zhang, Salman Khan, Zhiqiang Shen, Muzammal Naseer, Guangyi Chen, Fahad Khan
The GNCD setting aims to categorize unlabeled training data coming from known and novel classes by leveraging the information of partially labeled known classes.
2 code implementations • 8 Dec 2022 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks.
1 code implementation • CVPR 2023 • Hanoona Rasheed, Muhammad Uzair Khattak, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan
Since training on a similar scale for videos is infeasible, recent approaches focus on the effective transfer of image-based CLIP to the video domain.
1 code implementation • CVPR 2023 • Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan
In this work, we show how denoising diffusion models can be applied for high-fidelity person image synthesis with strong sample diversity and enhanced mode coverage of the learnt data distribution.
1 code implementation • 7 Oct 2022 • Xi Weng, Lei Huang, Lei Zhao, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan
A desirable objective in self-supervised learning (SSL) is to avoid feature collapse.
1 code implementation • 6 Oct 2022 • Vishal Thengane, Salman Khan, Munawar Hayat, Fahad Khan
In this work, we show that a frozen CLIP (Contrastive Language-Image Pretraining) model offers astounding continual learning performance without any fine-tuning (zero-shot evaluation).
2 code implementations • CVPR 2023 • Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan
Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks.
Ranked #1 on Prompt Engineering on FGVC-Aircraft
1 code implementation • 4 Oct 2022 • Eleonora Giunchiglia, Mihaela Cătălina Stoian, Salman Khan, Fabio Cuzzolin, Thomas Lukasiewicz
Neural networks have proven to be very powerful at computer vision tasks.
no code implementations • 12 Sep 2022 • Kaizhe Jin, Adrian Rubio-Solis, Ravi Naik, Tochukwu Onyeogulu, Amirul Islam, Salman Khan, Izzeddin Teeti, James Kinross, Daniel R Leff, Fabio Cuzzolin, George Mylonas
In this paper, a cascade of two machine learning approaches is suggested for the multimodal recognition of CWL in four different surgical task conditions.
no code implementations • 12 Sep 2022 • Tochukwu Onyeogulu, Salman Khan, Izzeddin Teeti, Amirul Islam, Kaizhe Jin, Adrian Rubio-Solis, Ravi Naik, George Mylonas, Fabio Cuzzolin
Nowadays, more surgical procedures are being performed using minimally invasive surgery (MIS).
no code implementations • 2 Sep 2022 • Abdulaziz Amer Aleissaee, Amandeep Kumar, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal, Gui-Song Xia, Fahad Shahbaz Khan
Deep learning-based algorithms have seen massive popularity in different areas of remote sensing image analysis over the past decade.
2 code implementations • 14 Aug 2022 • Mubashir Noman, Wafa Al Ghallabi, Daniya Najiha, Christoph Mayer, Akshay Dudhane, Martin Danelljan, Hisham Cholakkal, Salman Khan, Luc van Gool, Fahad Shahbaz Khan
While greatly benefiting tracking research, existing benchmarks do not pose the same difficulty as before, with recent trackers achieving higher performance mainly due to (i) the introduction of more sophisticated transformer-based methods and (ii) the lack of diverse scenarios with adverse visibility, such as severe weather conditions, camouflage and imaging effects.
1 code implementation • 8 Aug 2022 • Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang
The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field.
1 code implementation • 25 Jul 2022 • Abdelrahman Mohamed, Rushali Grandhe, K J Joseph, Salman Khan, Fahad Khan
In contrast to a recent ViT-based CIL approach, our D^3Former does not dynamically expand its architecture when new tasks are learned and remains suitable for a large number of incremental tasks.
2 code implementations • 25 Jul 2022 • Maryam Sultana, Muzammal Naseer, Muhammad Haris Khan, Salman Khan, Fahad Shahbaz Khan
Similar to CNNs, ViTs also struggle in out-of-distribution scenarios and the main culprit is overfitting to source domains.
1 code implementation • 18 Jul 2022 • Hashmat Shadab Malik, Shahina K Kunhimon, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
Our training approach is based on a min-max scheme which reduces overfitting via an adversarial objective and thus optimizes for a more generalizable surrogate model.
1 code implementation • 7 Jul 2022 • Hanoona Rasheed, Muhammad Maaz, Muhammad Uzair Khattak, Salman Khan, Fahad Shahbaz Khan
Two popular forms of weak supervision used in open-vocabulary detection (OVD) are a pretrained CLIP model and image-level supervision.
Ranked #1 on Open Vocabulary Object Detection on OpenImages-v4
Open Vocabulary Attribute Detection, Zero-Shot Object Detection
1 code implementation • 5 Jul 2022 • Mamshad Nayeem Rizve, Navid Kardan, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
In the open-world SSL problem, the objective is to recognize samples of known classes, and simultaneously detect and cluster samples belonging to novel classes present in unlabeled data.
Ranked #1 on Open-World Semi-Supervised Learning on CIFAR-10
7 code implementations • 21 Jun 2022 • Muhammad Maaz, Abdelrahman Shaker, Hisham Cholakkal, Salman Khan, Syed Waqas Zamir, Rao Muhammad Anwer, Fahad Shahbaz Khan
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K, outperforming MobileViT with an absolute gain of 2.2% and a 28% reduction in FLOPs.
Ranked #29 on Semantic Segmentation on PASCAL VOC 2012 test
2 code implementations • 11 May 2022 • Yawei Li, Kai Zhang, Radu Timofte, Luc van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, Jie Tang, Gangshan Wu, Weiran Wang, Shirui Huang, Honglei Lu, Huan Liu, Keyan Wang, Jun Chen, Shi Chen, Yuchun Miao, Zimo Huang, Lefei Zhang, Mustafa Ayazoğlu, Wei Xiong, Chengyi Xiong, Fei Wang, Hao Li, Ruimian Wen, Zhijing Yang, Wenbin Zou, Weixin Zheng, Tian Ye, Yuncheng Zhang, Xiangzhen Kong, Aditya Arora, Syed Waqas Zamir, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Dandan Gao, Dengwen Zhou, Qian Ning, Jingzhu Tang, Han Huang, YuFei Wang, Zhangheng Peng, Haobo Li, Wenxue Guan, Shenghua Gong, Xin Li, Jun Liu, Wanjun Wang, Dengwen Zhou, Kun Zeng, Hanjiang Lin, Xinyu Chen, Jinsheng Fang
The aim was to design a network for single image super-resolution that improved efficiency, measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption, while at least maintaining a PSNR of 29.00 dB on the DIV2K validation set.
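The PSNR floor mentioned above can be made concrete with a small sketch. It uses the standard PSNR definition, 10 * log10(MAX^2 / MSE); the function names, the flat list-of-pixels input, and the 8-bit peak value of 255 are illustrative assumptions, not the challenge's official evaluation code.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    max_value is the peak pixel value (assumed 255 for 8-bit images).
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of the same length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def meets_psnr_floor(value_db, floor_db=29.00):
    """Hypothetical helper: does a result reach the 29.00 dB floor quoted above?"""
    return value_db >= floor_db
```

For example, `meets_psnr_floor(psnr(hr_pixels, sr_pixels))` would check a single image pair, where `hr_pixels` and `sr_pixels` are placeholder names for flattened ground-truth and super-resolved images.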
no code implementations • 22 Apr 2022 • Jyoti Kini, Fahad Shahbaz Khan, Salman Khan, Mubarak Shah
We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability for accurate object segmentation.
1 code implementation • 19 Apr 2022 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
In the former case, spatial details are preserved but the contextual information cannot be precisely encoded.
2 code implementations • CVPR 2022 • K J Joseph, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Vineeth N Balasubramanian
Deep learning models tend to forget their earlier knowledge while incrementally learning new tasks.
1 code implementation • 24 Mar 2022 • Omkar Thawakar, Sanath Narayan, Jiale Cao, Hisham Cholakkal, Rao Muhammad Anwer, Muhammad Haris Khan, Salman Khan, Michael Felsberg, Fahad Shahbaz Khan
When using the ResNet50 backbone, our MS-STS achieves a mask AP of 50.1%, outperforming the best reported results in the literature by 2.7%, and by 4.8% at the higher overlap threshold of AP_75, while being comparable in model size and speed on YouTube-VIS 2019 val.
1 code implementation • 24 Jan 2022 • Fahad Shamshad, Salman Khan, Syed Waqas Zamir, Muhammad Haris Khan, Munawar Hayat, Fahad Shahbaz Khan, Huazhu Fu
Following unprecedented success on natural language tasks, Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results and prompting researchers to reconsider the supremacy of convolutional neural networks (CNNs) as de facto operators.
no code implementations • 10 Jan 2022 • Izzeddin Teeti, Valentina Musat, Salman Khan, Alexander Rast, Fabio Cuzzolin, Andrew Bradley
In an autonomous driving system, perception - identification of features and objects from the environment - is crucial.
1 code implementation • 7 Jan 2022 • Taimur Hassan, Samet Akcay, Mohammed Bennamoun, Salman Khan, Naoufel Werghi
Screening cluttered and occluded contraband items from baggage X-ray scans is a cumbersome task even for the expert security staff.
1 code implementation • CVPR 2022 • Anirudh Thatipelli, Sanath Narayan, Salman Khan, Rao Muhammad Anwer, Fahad Shahbaz Khan, Bernard Ghanem
Experiments are performed on four few-shot action recognition benchmarks: Kinetics, SSv2, HMDB51 and UCF101.
Ranked #1 on Few Shot Action Recognition on HMDB51
1 code implementation • 6 Dec 2021 • Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Jorma Laaksonen, Michael Felsberg
Creative sketch image generation is a challenging vision problem, where the task is to generate diverse, yet realistic creative sketches possessing the unseen composition of the visual-world objects.
2 code implementations • CVPR 2022 • Akshita Gupta, Sanath Narayan, K J Joseph, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
In the case of incremental object detection, OW-DETR outperforms the state-of-the-art for all settings on PASCAL VOC.
1 code implementation • CVPR 2022 • Kanchana Ranasinghe, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Michael Ryoo
To the best of our knowledge, the proposed approach is the first to alleviate the dependency on negative samples or dedicated memory banks in Self-supervised Video Transformer (SVT).
Ranked #52 on Action Recognition on UCF101
1 code implementation • 22 Nov 2021 • Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang
This has been a long-standing question in computer vision.
Ranked #1 on Class-agnostic Object Detection on COCO
11 code implementations • CVPR 2022 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang
Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks.
Ranked #1 on Grayscale Image Denoising on Urban100 sigma15
no code implementations • 27 Oct 2021 • Ajmal Shahbaz, Salman Khan, Mohammad Asiful Hossain, Vincenzo Lomonaco, Kevin Cannons, Zhan Xu, Fabio Cuzzolin
This paper formalizes a new continual semi-supervised learning (CSSL) paradigm, brought to the attention of the machine learning community via the IJCAI 2021 International Workshop on Continual Semi-Supervised Learning (CSSL-IJCAI), with the aim of raising the field's awareness of this problem and mobilizing its efforts in this direction.
1 code implementation • CVPR 2022 • Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang
Our central idea is to create a set of pseudo-burst features that combine complementary information from all the input burst frames to seamlessly exchange information.
Ranked #2 on Burst Image Super-Resolution on BurstSR
1 code implementation • 22 Aug 2021 • Taimur Hassan, Samet Akcay, Mohammed Bennamoun, Salman Khan, Naoufel Werghi
Furthermore, to the best of our knowledge, this is the first contour instance segmentation framework that leverages multi-scale information to recognize cluttered and concealed contraband data from the colored and grayscale security X-ray imagery.
1 code implementation • ICCV 2021 • Sanath Narayan, Akshita Gupta, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Mubarak Shah
We note that the best existing multi-label ZSL method takes a shared approach towards attending to region features with a common set of attention maps for all the classes.
Ranked #2 on Multi-label zero-shot learning on Open Images V4
no code implementations • 23 Jul 2021 • Mohammed Hassanin, Ibrahim Radwan, Salman Khan, Murat Tahtali
Multi-label recognition is a fundamental yet challenging task in computer vision.
1 code implementation • 15 Jul 2021 • Taimur Hassan, Samet Akcay, Mohammed Bennamoun, Salman Khan, Naoufel Werghi
Identifying potential threats concealed within the baggage is of prime concern for the security staff.
2 code implementations • ICLR 2022 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Fahad Shahbaz Khan, Fatih Porikli
(ii) Token Refinement: We then propose to refine the tokens to further enhance the discriminative capacity at each block of ViT.
1 code implementation • NeurIPS 2021 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang
We show and analyze the following intriguing properties of ViT: (a) Transformers are highly robust to severe occlusions, perturbations and domain shifts, e.g., retain as high as 60% top-1 accuracy on ImageNet even after randomly occluding 80% of the image content.
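To illustrate what "randomly occluding 80% of the image content" could look like at the patch level, here is a minimal sketch. The 14 x 14 patch grid (a 224-pixel image cut into 16-pixel patches), the function name, and the fixed seed are assumptions made for illustration; the paper's actual occlusion protocol may differ.

```python
import random


def random_patch_mask(grid_h=14, grid_w=14, drop_ratio=0.8, seed=0):
    """Return a grid of booleans: True keeps a patch, False occludes it.

    The 14 x 14 grid is an assumed ViT-style layout, not taken from the source.
    """
    total = grid_h * grid_w
    n_drop = int(round(drop_ratio * total))
    rng = random.Random(seed)
    # Choose which flat patch indices to occlude, without replacement.
    dropped = set(rng.sample(range(total), n_drop))
    return [
        [(row * grid_w + col) not in dropped for col in range(grid_w)]
        for row in range(grid_h)
    ]


mask = random_patch_mask()
kept = sum(cell for row in mask for cell in row)
print(f"kept {kept} of {14 * 14} patches")
```

Applying such a mask means zeroing out (or dropping the tokens of) every patch marked False before the image is passed to the classifier.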
1 code implementation • 26 Apr 2021 • Mohamed Afham, Salman Khan, Muhammad Haris Khan, Muzammal Naseer, Fahad Shahbaz Khan
Human learning benefits from multi-modal inputs that often appear as rich semantics (e.g., description of an object's attributes while learning about it).
Ranked #1 on Few-Shot Image Classification on Oxford 102 Flower (using extra training data)
no code implementations • 16 Apr 2021 • Salman Khan, Fabio Cuzzolin
We also contribute fresh temporal complex activity annotation for the recently released ROAD autonomous driving and SARAS-ESAD surgical action datasets and show the adaptability of our framework to different domains.
1 code implementation • ICCV 2021 • Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Mubarak Shah
We propose a novel transformer-based styled handwritten text image generation approach, HWT, that strives to learn both style-content entanglement as well as global and local writing style patterns.
3 code implementations • ICCV 2021 • Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli
To this end, we propose a new objective function that not only aligns the global distributions of source and target images, but also matches the local neighbourhood structure between the two domains.
1 code implementation • ICCV 2021 • Kanchana Ranasinghe, Muzammal Naseer, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan
The CE loss encourages features of a class to have a higher projection score on the true class-vector compared to the negative classes.
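The statement about projection scores can be checked with a small numeric sketch. It treats each logit as the dot product of a feature with a class vector and applies the standard softmax cross-entropy; the two-class setup, the vectors, and the function name are illustrative assumptions rather than the paper's code.

```python
import math


def cross_entropy(feature, class_vectors, true_class):
    """Softmax cross-entropy where each logit is the projection (dot product)
    of the feature onto one class vector."""
    logits = [sum(f * w for f, w in zip(feature, vec)) for vec in class_vectors]
    # Numerically stable log-sum-exp over all class logits.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[true_class]


# Two orthogonal class vectors (hypothetical values).
class_vectors = [[1.0, 0.0], [0.0, 1.0]]

# A feature aligned with the true class vector gives a low loss ...
aligned = cross_entropy([2.0, 0.0], class_vectors, true_class=0)
# ... while a feature aligned with a negative class gives a high loss.
misaligned = cross_entropy([0.0, 2.0], class_vectors, true_class=0)
print(aligned < misaligned)
```

Minimizing this loss therefore pushes the true-class projection above the projections on the negative classes, which is the behaviour the sentence describes.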
2 code implementations • CVPR 2021 • K J Joseph, Salman Khan, Fahad Shahbaz Khan, Vineeth N Balasubramanian
Humans have a natural instinct to identify unknown object instances in their environments.
1 code implementation • CVPR 2021 • Mamshad Nayeem Rizve, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
Equivariance or invariance has been employed standalone in the previous works; however, to the best of our knowledge, they have not been used jointly.
2 code implementations • 23 Feb 2021 • Gurkirt Singh, Stephen Akrigg, Manuele Di Maio, Valentina Fontana, Reza Javanmard Alitappeh, Suman Saha, Kossar Jeddisaravi, Farzad Yousefi, Jacob Culley, Tom Nicholson, Jordan Omokeowa, Salman Khan, Stanislao Grazioso, Andrew Bradley, Giuseppe Di Gironimo, Fabio Cuzzolin
We also report the performance on the ROAD tasks of Slowfast and YOLOv5 detectors, as well as that of the winners of the ICCV2021 ROAD challenge, which highlight the challenges faced by situation awareness in autonomous driving.
no code implementations • 6 Feb 2021 • Sameera Ramasinghe, Kasun Fernando, Salman Khan, Nick Barnes
Modeling real-world distributions can often be challenging due to sample data that are subjected to perturbations, e.g., instrumentation errors, or added random noise.
7 code implementations • CVPR 2021 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
At each stage, we introduce a novel per-pixel adaptive design that leverages in-situ supervised attention to reweight the local features.
Ranked #4 on Deblurring on RSBlur
1 code implementation • 27 Jan 2021 • Akshita Gupta, Sanath Narayan, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Joost Van de Weijer
Nevertheless, computing reliable attention maps for unseen classes during inference in a multi-label setting is still a challenge.
Ranked #8 on Multi-label zero-shot learning on NUS-WIDE
no code implementations • 4 Jan 2021 • Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah
Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems.
1 code implementation • NeurIPS 2021 • Sameera Ramasinghe, Moshiur Farazi, Salman Khan, Nick Barnes, Stephen Gould
Conditional GANs (cGAN), in their rudimentary form, suffer from critical drawbacks such as the lack of diversity in generated outputs and distortion between the latent and output manifolds.
no code implementations • 19 Oct 2020 • Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah
This demonstrates their ability to acquire transferable knowledge, a capability that is central to human learning.
2 code implementations • 19 Oct 2020 • Nasir Hayat, Munawar Hayat, Shafin Rahman, Salman Khan, Syed Waqas Zamir, Fahad Shahbaz Khan
The existing zero-shot detection approaches project visual features to the semantic domain for seen objects, hoping to map unseen objects to their corresponding semantics during inference.
Ranked #1 on Zero-Shot Object Detection on ImageNet Detection
Generalized Zero-Shot Object Detection, Zero-Shot Object Detection
no code implementations • NeurIPS Workshop SVRHM 2020 • Salman Khan, Alexander Wong, Bryan P. Tripp
Under difficult viewing conditions, the brain's visual system uses a variety of modulatory techniques to augment its core feed-forward signals.
no code implementations • ICLR 2021 • Sameera Ramasinghe, Kanchana Ranasinghe, Salman Khan, Nick Barnes, Stephen Gould
Although deep learning has achieved appealing results on several machine learning tasks, most of the models are deterministic at inference, limiting their application to single-modal settings.
no code implementations • 5 Oct 2020 • Moshiur Farazi, Salman Khan, Nick Barnes
Humans explain inter-object relationships with semantic labels that demonstrate a high-level understanding required to perform complex Vision-Language tasks such as Visual Question Answering (VQA).
1 code implementation • 28 Sep 2020 • Taimur Hassan, Samet Akcay, Mohammed Bennamoun, Salman Khan, Naoufel Werghi
Detecting baggage threats is one of the most difficult tasks, even for expert officers.
no code implementations • 25 Sep 2020 • Pengxu Wei, Hannan Lu, Radu Timofte, Liang Lin, WangMeng Zuo, Zhihong Pan, Baopu Li, Teng Xi, Yanwen Fan, Gang Zhang, Jingtuo Liu, Junyu Han, Errui Ding, Tangxin Xie, Liang Cao, Yan Zou, Yi Shen, Jialiang Zhang, Yu Jia, Kaihua Cheng, Chenhuan Wu, Yue Lin, Cen Liu, Yunbo Peng, Xueyi Zou, Zhipeng Luo, Yuehan Yao, Zhenyu Xu, Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Tongtong Zhao, Shanshan Zhao, Yoseob Han, Byung-Hoon Kim, JaeHyun Baek, HaoNing Wu, Dejia Xu, Bo Zhou, Wei Guan, Xiaobo Li, Chen Ye, Hao Li, Yukai Shi, Zhijing Yang, Xiaojun Yang, Haoyu Zhong, Xin Li, Xin Jin, Yaojun Wu, Yingxue Pang, Sen Liu, Zhi-Song Liu, Li-Wen Wang, Chu-Tak Li, Marie-Paule Cani, Wan-Chi Siu, Yuanbo Zhou, Rao Muhammad Umer, Christian Micheloni, Xiaofeng Cong, Rajat Gupta, Keon-Hee Ahn, Jun-Hyuk Kim, Jun-Ho Choi, Jong-Seok Lee, Feras Almasri, Thomas Vandamme, Olivier Debeir
This paper introduces the real image Super-Resolution (SR) challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2020.
1 code implementation • 29 Jul 2020 • Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli
In contrast to existing adversarial training methods that only use class-boundary information (e.g., using a cross-entropy loss), we propose to exploit additional information from the feature space to craft stronger adversaries that are in turn used to learn a robust model.
1 code implementation • 17 Jun 2020 • Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah
Our experiments show that, even in the first stage, self-supervision can outperform current state-of-the-art methods, with further gains achieved by our second stage distillation process.
Ranked #12 on Few-Shot Image Classification on FC100 5-way (5-shot)
2 code implementations • CVPR 2020 • Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems, e.g., for classification, segmentation and object detection.
no code implementations • 14 Apr 2020 • Taimur Hassan, Samet Akcay, Mohammed Bennamoun, Salman Khan, Naoufel Werghi
In the last two decades, baggage scanning has globally become one of the prime aviation security concerns.
1 code implementation • CVPR 2020 • Yaxing Wang, Salman Khan, Abel Gonzalez-Garcia, Joost Van de Weijer, Fahad Shahbaz Khan
In this work, we go one step further and reduce the amount of required labeled data also from the source domain during training.
1 code implementation • CVPR 2020 • Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah
In this paper, we hypothesize this problem can be avoided by learning a set of generalized parameters, that are neither specific to old nor new tasks.
2 code implementations • 17 Mar 2020 • K J Joseph, Jathushan Rajasegaran, Salman Khan, Fahad Shahbaz Khan, Vineeth N Balasubramanian
In a real-world setting, object instances from new classes can be continuously encountered by object detectors.
8 code implementations • CVPR 2020 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
This is mainly because additive white Gaussian noise (AWGN) is not adequate for modeling real camera noise, which is signal-dependent and heavily transformed by the camera imaging pipeline.
Ranked #9 on Image Denoising on DND (using extra training data)
no code implementations • 16 Mar 2020 • Shafin Rahman, Salman Khan, Nick Barnes, Fahad Shahbaz Khan
Any-shot detection offers unique challenges compared to conventional novel object detection, such as a high imbalance between unseen, few-shot and seen object classes, susceptibility to forgetting base training while learning novel classes, and distinguishing novel classes from the background.
12 code implementations • ECCV 2020 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
With the goal of recovering high-quality image content from its degraded version, image restoration enjoys numerous applications, such as in surveillance, computational photography, medical imaging, and remote sensing.
Ranked #4 on Image Denoising on DND
no code implementations • 17 Jan 2020 • Akif Quddus Khan, Salman Khan
Farm animal behavior analysis is a crucial task for industrial farming.
no code implementations • 14 Dec 2019 • Guolei Sun, Hisham Cholakkal, Salman Khan, Fahad Shahbaz Khan, Ling Shao
The main requisite for the fine-grained recognition task is to focus on the subtle discriminative details that make the subordinate classes different from each other.
Ranked #14 on Fine-Grained Image Classification on Stanford Dogs
1 code implementation • 13 Dec 2019 • Hisham Cholakkal, Guolei Sun, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Luc van Gool
Our RLC framework further reduces the annotation cost arising from large numbers of object categories in a dataset by only using lower-count supervision for a subset of categories and class-labels for the remaining ones.
Image Classification
Image-level Supervised Instance Segmentation
1 code implementation • 4 Dec 2019 • Sameera Ramasinghe, Salman Khan, Nick Barnes, Stephen Gould
Point-clouds are a popular choice for vision and graphics tasks due to their accurate shape description and direct acquisition from range-scanners.
no code implementations • 30 Nov 2019 • Sameera Ramasinghe, Salman Khan, Nick Barnes, Stephen Gould
In this work, we propose a novel 'volumetric convolution' operation that can effectively model and convolve arbitrary functions in $\mathbb{B}^3$.
1 code implementation • CVPR 2020 • Muhammad Haris Khan, John McDonagh, Salman Khan, Muhammad Shahabuddin, Aditya Arora, Fahad Shahbaz Khan, Ling Shao, Georgios Tzimiropoulos
Several studies show that animal needs are often expressed through their faces.
no code implementations • 24 Aug 2019 • Sameera Ramasinghe, Salman Khan, Nick Barnes, Stephen Gould
Existing networks directly learn feature representations on 3D point clouds for shape analysis.
no code implementations • 20 Jun 2019 • Qiuxia Lai, Salman Khan, Yongwei Nie, Jianbing Shen, Hanqiu Sun, Ling Shao
With three example computer vision tasks, diverse representative backbones, famous architectures, corresponding real human gaze data, and systematically conducted large-scale quantitative studies, we quantify the consistency between artificial attention and human visual attention and offer novel insights into existing artificial attention mechanisms by giving preliminary answers to several key questions related to human and artificial attention mechanisms.
1 code implementation • 4 Jun 2019 • Guodong Ding, Salman Khan, Zhenmin Tang, Jian Zhang, Fatih Porikli
With this insight, we design a novel Dispersion-based Clustering (DBC) approach which can discover the underlying patterns in data.
Ranked #15 on Unsupervised Person Re-Identification on Market-1501
1 code implementation • 3 Jun 2019 • Jathushan Rajasegaran, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Ming-Hsuan Yang
In a conventional supervised learning setting, a machine learning model has access to examples of all object classes that are desired to be recognized during the inference stage.
3 code implementations • 30 May 2019 • Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai
Compared to existing small-scale aerial image based instance segmentation datasets, iSAID contains 15$\times$ the number of object categories and 5$\times$ the number of instances.
Ranked #1 on Object Detection on iSAID
2 code implementations • 16 Apr 2019 • Saeed Anwar, Salman Khan, Nick Barnes
Deep convolutional network-based super-resolution is a fast-growing field with numerous practical applications.
no code implementations • 11 Apr 2019 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Fahad Shahbaz Khan, Ling Shao
In low-light conditions, a conventional camera imaging pipeline produces sub-optimal images that are usually dark and noisy due to a low photon count and low signal-to-noise ratio (SNR).
1 code implementation • ICCV 2019 • Aamir Mustafa, Salman Khan, Munawar Hayat, Roland Goecke, Jianbing Shen, Ling Shao
Deep neural networks are vulnerable to adversarial attacks, which can fool them by adding minuscule perturbations to the input images.
Ranked #5 on Adversarial Defense on CIFAR-10
1 code implementation • 23 Jan 2019 • Munawar Hayat, Salman Khan, Waqas Zamir, Jianbing Shen, Ling Shao
Real-world object classes appear in imbalanced ratios.
no code implementations • CVPR 2019 • Salman Khan, Munawar Hayat, Waqas Zamir, Jianbing Shen, Ling Shao
Rare classes tend to get a concentrated representation in the classification space which hampers the generalization of learned boundaries to new test examples.
no code implementations • ICLR 2019 • Sameera Ramasinghe, Salman Khan, Nick Barnes
Convolution is an efficient technique to obtain abstract feature representations using hierarchical layers in deep networks.
3 code implementations • 22 Nov 2018 • Shafin Rahman, Salman Khan, Nick Barnes
This setting gives rise to the need for correct alignment between visual and semantic concepts, so that the unseen objects can be identified using only their semantic attributes.
Ranked #4 on Zero-Shot Object Detection on PASCAL VOC'07
no code implementations • 15 Oct 2018 • Sameera Ramasinghe, C. D. Athuralya, Salman Khan
The recently proposed Capsule Network is a brain-inspired architecture that brings a new paradigm to deep learning by modelling input domain variations through vector-based representations.
no code implementations • 18 Jul 2018 • Mohammed Hassanin, Salman Khan, Murat Tahtali
Nowadays, robots are dominating the manufacturing, entertainment and healthcare industries.
no code implementations • 16 May 2018 • Guodong Ding, Shanshan Zhang, Salman Khan, Zhenmin Tang, Jian Zhang, Fatih Porikli
Our approach measures the affinity of unlabeled samples with the underlying clusters of labeled data samples using the intermediate feature representations from deep networks.
1 code implementation • 16 Mar 2018 • Shafin Rahman, Salman Khan
In line with the success of deep learning on traditional recognition problems, several end-to-end deep models for zero-shot recognition have been proposed in the literature.
1 code implementation • 16 Mar 2018 • Shafin Rahman, Salman Khan, Fatih Porikli
We hypothesize that this setting is ill-suited for real-world applications where unseen objects appear only as a part of a complex scene, warranting both the 'recognition' and 'localization' of an unseen category.
no code implementations • 23 Nov 2017 • Salman Khan, Munawar Hayat, Fatih Porikli
We cast the proposed approach in the form of regular Convolutional Neural Network (CNN) weight layers using a decorrelation transform with fixed basis functions.
no code implementations • 20 Nov 2017 • Guodong Ding, Salman Khan, Zhenmin Tang, Fatih Porikli
Person re-identification aims at establishing the identity of a pedestrian from a gallery that contains images of multiple people obtained from a multi-camera system.