Search Results for author: Max Kaufmann

Found 5 papers, 3 papers with code

Visibility into AI Agents

no code implementations · 23 Jan 2024 · Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, Markus Anderljung

Increased delegation of commercial, scientific, governmental, and personal activities to AI agents -- systems capable of pursuing complex goals with limited supervision -- may exacerbate existing societal risks and introduce new risks.

Informativeness

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

2 code implementations · 21 Sep 2023 · Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans

If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A".

Data Augmentation Sentence

Testing Robustness Against Unforeseen Adversaries

3 code implementations · 21 Aug 2019 · Max Kaufmann, Daniel Kang, Yi Sun, Steven Basart, Xuwang Yin, Mantas Mazeika, Akul Arora, Adam Dziedzic, Franziska Boenisch, Tom Brown, Jacob Steinhardt, Dan Hendrycks

To narrow in on this discrepancy between research and reality, we introduce ImageNet-UA, a framework for evaluating model robustness against a range of unforeseen adversaries, including eighteen new non-L_p attacks.

Adversarial Defense Adversarial Robustness