Search Results for author: Jonah Brown-Cohen

Found 5 papers, 2 papers with code

Scalable AI Safety via Doubly-Efficient Debate

1 code implementation • 23 Nov 2023 • Jonah Brown-Cohen, Geoffrey Irving, Georgios Piliouras

The emergence of pre-trained AI systems with powerful capabilities across a diverse and ever-increasing set of complex domains has raised a critical challenge for AI safety as tasks can become too complicated for humans to judge directly.


Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models

no code implementations • 26 Oct 2023 • Dingli Yu, Simran Kaur, Arushi Gupta, Jonah Brown-Cohen, Anirudh Goyal, Sanjeev Arora

The paper develops a methodology for (a) designing and administering such an evaluation, and (b) automatic grading (plus spot-checking by humans) of the results using GPT-4 as well as the open LLaMA-2 70B model.


Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions

no code implementations • 9 Jun 2023 • Ezgi Korkmaz, Jonah Brown-Cohen

Learning in MDPs with highly complex state representations is currently possible due to multiple advancements in reinforcement learning algorithm design.

Tasks: Adversarial Attack, Atari Games, +1 more


Faster Algorithms and Constant Lower Bounds for the Worst-Case Expected Error

1 code implementation • NeurIPS 2021 • Jonah Brown-Cohen

Chen, Valiant and Valiant show that, when data values are $\ell_{\infty}$-normalized, there is a polynomial time algorithm to compute an estimator for the mean with worst-case expected error that is within a factor $\frac{\pi}{2}$ of the optimum within the natural class of semilinear estimators.


Detecting Worst-case Corruptions via Loss Landscape Curvature in Deep Reinforcement Learning

no code implementations • 29 Sep 2021 • Ezgi Korkmaz, Jonah Brown-Cohen

The non-robustness of neural network policies to adversarial examples poses a challenge for deep reinforcement learning.

Tasks: reinforcement-learning, Reinforcement Learning (RL)

