2 dataset results for face recog AND Visual Question Answering (VQA)

…PaLI-X, BLIP2, etc.) face challenges in answering visual information-seeking questions, but fine-tuning on the InfoSeek dataset elicits models to use fine-grained knowledge that was learned during their…

17 PAPERS • 2 BENCHMARKS

BenchLMM (BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models)

…prompting LMMs to predict the style first, based on which we propose a versatile and training-free method for improving LMMs; 4) An intelligent LMM is expected to interpret the causes of its errors when facing…

9 PAPERS • 1 BENCHMARK
