no code implementations • 23 May 2025 • Alex L. Zhang, Thomas L. Griffiths, Karthik R. Narasimhan, Ofir Press
To this end, we introduce VideoGameBench, a benchmark consisting of 10 popular video games from the 1990s that VLMs directly interact with in real time.
no code implementations • 20 May 2025 • Devansh Bhardwaj, Arjun Beniwal, Shreyas Chaudhari, Ashwin Kalyan, Tanmay Rajpurohit, Karthik R. Narasimhan, Ameet Deshpande, Vishvak Murahari
AI agents have become increasingly adept at complex tasks such as coding, reasoning, and multimodal understanding.
3 code implementations • 4 Oct 2024 • John Yang, Carlos E. Jimenez, Alex L. Zhang, Kilian Lieret, Joyce Yang, Xindi Wu, Ori Press, Niklas Muennighoff, Gabriel Synnaeve, Karthik R. Narasimhan, Diyi Yang, Sida I. Wang, Ofir Press
Therefore, we propose SWE-bench Multimodal (SWE-bench M) to evaluate systems on their ability to fix bugs in visual, user-facing JavaScript software.
4 code implementations • NeurIPS 2016 • Tejas D. Kulkarni, Karthik R. Narasimhan, Ardavan Saeedi, Joshua B. Tenenbaum
Learning goal-directed behavior in environments with sparse feedback is a major challenge for reinforcement learning algorithms.