1 code implementation • 8 Jun 2023 • Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan
Conversation agents fueled by Large Language Models (LLMs) are providing a new way to interact with visual data.
Ranked #1 on Video-based Generative Performance Benchmarking (Temporal Understanding) on VideoInstruct
Video-based Generative Performance Benchmarking (Consistency)
Video-based Generative Performance Benchmarking (Contextual Understanding)
2 code implementations • ICCV 2023 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
Using our proposed efficient additive attention, we build a series of models called "SwiftFormer" that achieve state-of-the-art performance in both accuracy and mobile inference speed.
2 code implementations • 8 Dec 2022 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks.
1 code implementation • CVPR 2023 • Hanoona Rasheed, Muhammad Uzair Khattak, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan
Since training on a similar scale for videos is infeasible, recent approaches focus on the effective transfer of image-based CLIP to the video domain.
2 code implementations • CVPR 2023 • Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan
Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks.
Ranked #1 on Prompt Engineering on FGVC-Aircraft
1 code implementation • 7 Jul 2022 • Hanoona Rasheed, Muhammad Maaz, Muhammad Uzair Khattak, Salman Khan, Fahad Shahbaz Khan
Two popular forms of weak supervision used in open-vocabulary detection (OVD) are a pretrained CLIP model and image-level supervision.
Ranked #1 on Open Vocabulary Object Detection on OpenImages-v4
Open Vocabulary Attribute Detection
Zero-Shot Object Detection
1 code implementation • 22 Nov 2021 • Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang
This has been a long-standing question in computer vision.
Ranked #1 on Class-agnostic Object Detection on COCO