2 code implementations • 6 Mar 2024 • Deepanway Ghosal, Vernon Toh Yan Han, Chia Yew Ken, Soujanya Poria
We present a new dataset, AlgoPuzzleVQA, designed to challenge and evaluate the capabilities of multimodal language models in solving algorithmic puzzles that require visual understanding, language understanding, and complex algorithmic reasoning.
Ranked #1 on Multimodal Reasoning on AlgoPuzzleVQA