Search Results for author: Tom Tseng

Found 4 papers, 1 paper with code

Effects of Scale on Language Model Robustness

no code implementations · 25 Jul 2024 · Nikolaus Howe, Ian McKenzie, Oskar Hollinsworth, Michał Zajac, Tom Tseng, Aaron Tucker, Pierre-Luc Bacon, Adam Gleave

Even with the advantage conferred by scale, undefended models remain easy to attack in absolute terms, and we thus turn our attention to explicitly training models for adversarial robustness, which we show to be a much more compute-efficient defense than scaling model size alone.

Adversarial Robustness, Language Modeling, +1 more

Can Go AIs be adversarially robust?

no code implementations · 18 Jun 2024 · Tom Tseng, Euan McLean, Kellin Pelrine, Tony T. Wang, Adam Gleave

Prior work found that superhuman Go AIs can be defeated by simple adversarial strategies, especially "cyclic" attacks.

Diversity

Adversarial Policies Beat Superhuman Go AIs

2 code implementations · 1 Nov 2022 · Tony T. Wang, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell

The core vulnerability uncovered by our attack persists even in KataGo agents adversarially trained to defend against our attack.
