…Our findings reveal that state-of-the-art pre-trained multi-modal models (e.g., PaLI-X and BLIP2) struggle to answer visual information-seeking questions, but fine-tuning on the InfoSeek dataset…
17 PAPERS • 2 BENCHMARKS
…(bachelor_of_arts, juris_doctor).
204 PAPERS • 3 BENCHMARKS