Search Results for author: Arjun Panickssery

Found 4 papers, 2 papers with code

Analyzing Probabilistic Methods for Evaluating Agent Capabilities

no code implementations • 24 Sep 2024 • Axel Højmark, Govind Pimpale, Arjun Panickssery, Marius Hobbhahn, Jérémy Scheurer

To enhance the accuracy of capability estimates of AI agents on difficult tasks, we suggest future work should leverage the rich literature on Monte Carlo Estimators.

AI Agent

Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs

1 code implementation • 4 Jul 2024 • Sara Price, Arjun Panickssery, Sam Bowman, Asa Cooper Stickland

Backdoors are hidden behaviors that are only triggered once an AI system has been deployed.

LLM Evaluators Recognize and Favor Their Own Generations

no code implementations • 15 Apr 2024 • Arjun Panickssery, Samuel R. Bowman, Shi Feng

Self-evaluation using large language models (LLMs) has proven valuable not only in benchmarking but also in methods like reward modeling, constitutional AI, and self-refinement.

Benchmarking
