Search Results for author: Hanoona Rasheed

Found 9 papers, 9 papers with code

PALO: A Polyglot Large Multimodal Model for 5B People

1 code implementation 22 Feb 2024 Muhammad Maaz, Hanoona Rasheed, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Tim Baldwin, Michael Felsberg, Fahad S. Khan

PALO offers visual reasoning capabilities in 10 major languages, including English, Chinese, Hindi, Spanish, French, Arabic, Bengali, Russian, Urdu, and Japanese, that span a total of ~5B people (65% of the world population).

Language Modelling, Large Language Model +1

GLaMM: Pixel Grounding Large Multimodal Model

1 code implementation 6 Nov 2023 Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Eric Xing, Ming-Hsuan Yang, Fahad S. Khan

In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks.

Conversational Question Answering, Image Captioning +5

SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

2 code implementations ICCV 2023 Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Using our proposed efficient additive attention, we build a series of models called "SwiftFormer" which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed.

Fine-tuned CLIP Models are Efficient Video Learners

1 code implementation CVPR 2023 Hanoona Rasheed, Muhammad Uzair Khattak, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan

Since training on a similar scale for videos is infeasible, recent approaches focus on the effective transfer of image-based CLIP to the video domain.

MaPLe: Multi-modal Prompt Learning

2 code implementations CVPR 2023 Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan

Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks.

Prompt Engineering
