Search Results for author: Fahad S. Khan

Found 3 papers, 2 papers with code

PALO: A Polyglot Large Multimodal Model for 5B People

1 code implementation · 22 Feb 2024 · Muhammad Maaz, Hanoona Rasheed, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Tim Baldwin, Michael Felsberg, Fahad S. Khan

PALO offers visual reasoning capabilities in 10 major languages (English, Chinese, Hindi, Spanish, French, Arabic, Bengali, Russian, Urdu, and Japanese), which together cover ~5B people (65% of the world population).

Tasks: Language Modelling, Large Language Model (+1 more)

GLaMM: Pixel Grounding Large Multimodal Model

1 code implementation · 6 Nov 2023 · Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Eric Xing, Ming-Hsuan Yang, Fahad S. Khan

In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks.

Tasks: Conversational Question Answering, Image Captioning (+5 more)
