Search Results for author: Karthik R. Narasimhan

Found 4 papers, 2 papers with code

VideoGameBench: Can Vision-Language Models complete popular video games?

No code implementations · 23 May 2025 · Alex L. Zhang, Thomas L. Griffiths, Karthik R. Narasimhan, Ofir Press

To this end, we introduce VideoGameBench, a benchmark consisting of 10 popular video games from the 1990s that VLMs directly interact with in real-time.

Math

Agent Context Protocols Enhance Collective Inference

No code implementations · 20 May 2025 · Devansh Bhardwaj, Arjun Beniwal, Shreyas Chaudhari, Ashwin Kalyan, Tanmay Rajpurohit, Karthik R. Narasimhan, Ameet Deshpande, Vishvak Murahari

AI agents have become increasingly adept at complex tasks such as coding, reasoning, and multimodal understanding.

SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?

3 code implementations · 4 Oct 2024 · John Yang, Carlos E. Jimenez, Alex L. Zhang, Kilian Lieret, Joyce Yang, Xindi Wu, Ori Press, Niklas Muennighoff, Gabriel Synnaeve, Karthik R. Narasimhan, Diyi Yang, Sida I. Wang, Ofir Press

Therefore, we propose SWE-bench Multimodal (SWE-bench M), to evaluate systems on their ability to fix bugs in visual, user-facing JavaScript software.

Data Visualization

Cannot find the paper you are looking for? You can Submit a new open access paper.