1 code implementation • 28 Nov 2023 • Aleksandar Makelov, Georg Lange, Neel Nanda
We demonstrate this phenomenon in a distilled mathematical example and in two real-world domains (the indirect object identification task and factual recall), and we present evidence for its prevalence in practice.
no code implementations • 19 Jul 2023 • Alaa Khaddaj, Guillaume Leclerc, Aleksandar Makelov, Kristian Georgiev, Hadi Salman, Andrew Ilyas, Aleksander Madry
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
57 code implementations • ICLR 2018 • Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu
Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal.