1 code implementation • 27 Jan 2025 • Muhammad Maaz, Timothy C. Y. Chan
We introduce the problem of formally verifying properties of Markov processes where the parameters are the output of machine learning models.
1 code implementation • 13 Jun 2024 • Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Khan
Building on the advances of language models, Large Multimodal Models (LMMs) have contributed significant improvements in video understanding.
Ranked #1 on VCGBench-Diverse on VideoInstruct
1 code implementation • 22 Feb 2024 • Muhammad Maaz, Hanoona Rasheed, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Tim Baldwin, Michael Felsberg, Fahad S. Khan
PALO offers visual reasoning capabilities in 10 major languages, including English, Chinese, Hindi, Spanish, French, Arabic, Bengali, Russian, Urdu, and Japanese, that span a total of ~5B people (65% of the world population).
1 code implementation • 22 Nov 2023 • Shehan Munasinghe, Rusiru Thushara, Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Mubarak Shah, Fahad Khan
Extending image-based Large Multimodal Models (LMMs) to videos is challenging due to the inherent complexity of video data.
1 code implementation • CVPR 2024 • Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Eric Xing, Ming-Hsuan Yang, Fahad S. Khan
In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks.
1 code implementation • 16 Jun 2023 • Muhammad Maaz, Rui Qiao, Yiheng Zhou, Renxian Zhang
We conduct numerous experiments on well-known NLP data sets and rigorously explore the performance of different score functions.
2 code implementations • 8 Jun 2023 • Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan
Conversation agents fueled by Large Language Models (LLMs) are providing a new way to interact with visual data.
Ranked #5 on Question Answering on NExT-QA (Open-ended VideoQA)
VCGBench-Diverse
Video-based Generative Performance Benchmarking (Consistency)
5 code implementations • ICCV 2023 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
Using our proposed efficient additive attention, we build a series of models called "SwiftFormer" which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed.
2 code implementations • 8 Dec 2022 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks.
1 code implementation • CVPR 2023 • Hanoona Rasheed, Muhammad Uzair Khattak, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan
Since training on a similar scale for videos is infeasible, recent approaches focus on the effective transfer of image-based CLIP to the video domain.
3 code implementations • CVPR 2023 • Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan
Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks.
Ranked #3 on Prompt Engineering on Food-101
1 code implementation • 7 Jul 2022 • Hanoona Rasheed, Muhammad Maaz, Muhammad Uzair Khattak, Salman Khan, Fahad Shahbaz Khan
Two popular forms of weak-supervision used in open-vocabulary detection (OVD) include pretrained CLIP model and image-level supervision.
Ranked #1 on Open Vocabulary Object Detection on OpenImages-v4
8 code implementations • 21 Jun 2022 • Muhammad Maaz, Abdelrahman Shaker, Hisham Cholakkal, Salman Khan, Syed Waqas Zamir, Rao Muhammad Anwer, Fahad Shahbaz Khan
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K, outperforming MobileViT by an absolute 2.2% with a 28% reduction in FLOPs.
Ranked #29 on Semantic Segmentation on PASCAL VOC 2012 test
1 code implementation • 22 Nov 2021 • Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang
This has been a long-standing question in computer vision.
1 code implementation • 18 May 2021 • Muhammad Maaz, Hanoona Abdul Rasheed, Dhanalaxmi Gaddam
The deconstruction learning forces the model to focus on local object parts, while reconstruction learning helps in learning the correlation between the parts.
Fine-Grained Visual Categorization
Representation Learning
no code implementations • 22 Aug 2019 • Muhammad Maaz
This shows that machine learning has the potential to significantly revolutionize the abstract screening process in healthcare systematic reviews.