LLaVA-Bench is a dataset created to evaluate the capability of large multimodal models (LMMs) on more challenging tasks and their generalizability to novel domains. It consists of a diverse set of 24 images with 60 questions in total, covering indoor and outdoor scenes, memes, paintings, sketches, etc.; each image is paired with a highly detailed, manually curated description and a carefully selected set of questions. The dataset is part of the LLaVA project, which aims to develop multimodal chatbots that follow human intents to complete various daily-life visual tasks in the wild.
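The one-image-to-many-questions layout described above can be sketched as follows. This is a minimal illustration, not the dataset's actual schema: the field names (`image`, `caption`, `question`) and the sample records are hypothetical assumptions for demonstration only.

```python
# Hypothetical sketch of LLaVA-Bench-style records: each image carries a
# manually curated caption and several associated questions.
# Field names and sample values are illustrative, not the real schema.
from collections import defaultdict

records = [
    {"image": "001.jpg", "caption": "An indoor scene ...", "question": "What is on the table?"},
    {"image": "001.jpg", "caption": "An indoor scene ...", "question": "Describe the room in detail."},
    {"image": "002.jpg", "caption": "A meme ...", "question": "Why might this image be humorous?"},
]

def questions_per_image(records):
    """Group questions by image, mirroring the one-image-many-questions layout."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["image"]].append(rec["question"])
    return dict(grouped)

grouped = questions_per_image(records)
print(len(grouped))                            # distinct images in this toy sample
print(sum(len(qs) for qs in grouped.values())) # total questions in this toy sample
```

In the real benchmark the same grouping would yield 24 images and 60 questions overall.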