Search Results for author: Jonah Brown-Cohen

Found 5 papers, 2 papers with code

Scalable AI Safety via Doubly-Efficient Debate

1 code implementation 23 Nov 2023 Jonah Brown-Cohen, Geoffrey Irving, Georgios Piliouras

The emergence of pre-trained AI systems with powerful capabilities across a diverse and ever-increasing set of complex domains has raised a critical challenge for AI safety as tasks can become too complicated for humans to judge directly.

Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models

no code implementations 26 Oct 2023 Dingli Yu, Simran Kaur, Arushi Gupta, Jonah Brown-Cohen, Anirudh Goyal, Sanjeev Arora

The paper develops a methodology for (a) designing and administering such an evaluation, and (b) automatic grading (plus spot-checking by humans) of the results using GPT-4 as well as the open LLaMA-2 70B model.

Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions

no code implementations 9 Jun 2023 Ezgi Korkmaz, Jonah Brown-Cohen

Learning in MDPs with highly complex state representations is currently possible due to multiple advancements in reinforcement learning algorithm design.

Adversarial Attack, Atari Games

Faster Algorithms and Constant Lower Bounds for the Worst-Case Expected Error

1 code implementation NeurIPS 2021 Jonah Brown-Cohen

Chen, Valiant and Valiant show that, when data values are $\ell_{\infty}$-normalized, there is a polynomial time algorithm to compute an estimator for the mean with worst-case expected error that is within a factor $\frac{\pi}{2}$ of the optimum within the natural class of semilinear estimators.

Detecting Worst-case Corruptions via Loss Landscape Curvature in Deep Reinforcement Learning

no code implementations 29 Sep 2021 Ezgi Korkmaz, Jonah Brown-Cohen

The non-robustness of neural network policies to adversarial examples poses a challenge for deep reinforcement learning.

reinforcement-learning, Reinforcement Learning (RL)
