no code implementations • 13 Jan 2025 • Blake Bullwinkel, Amanda Minnich, Shiven Chawla, Gary Lopez, Martin Pouliot, Whitney Maxwell, Joris de Gruyter, Katherine Pratt, Saphir Qi, Nina Chikanov, Roman Lutz, Raja Sekhar Rao Dheekonda, Bolor-Erdene Jagdagdorj, Eugenia Kim, Justin Song, Keegan Hines, Daniel Jones, Giorgio Severi, Richard Lundeen, Sam Vaughan, Victoria Westerhoff, Pete Bryan, Ram Shankar Siva Kumar, Yonatan Zunger, Chang Kawaguchi, Mark Russinovich
AI red teaming is not safety benchmarking 4.
1 code implementation • 1 Oct 2024 • Gary D. Lopez Munoz, Amanda J. Minnich, Roman Lutz, Richard Lundeen, Raja Sekhar Rao Dheekonda, Nina Chikanov, Bolor-Erdene Jagdagdorj, Martin Pouliot, Shiven Chawla, Whitney Maxwell, Blake Bullwinkel, Katherine Pratt, Joris de Gruyter, Charlotte Siska, Pete Bryan, Tori Westerhoff, Chang Kawaguchi, Christian Seifert, Ram Shankar Siva Kumar, Yonatan Zunger
To meet this need, we introduce the Python Risk Identification Toolkit (PyRIT), an open-source framework designed to enhance red teaming efforts in GenAI systems.
no code implementations • 18 Jul 2024 • Emman Haider, Daniel Perez-Becker, Thomas Portet, Piyush Madan, Amit Garg, Atabak Ashfaq, David Majercak, Wen Wen, Dongwoo Kim, ZiYi Yang, Jianwen Zhang, Hiteshi Sharma, Blake Bullwinkel, Martin Pouliot, Amanda Minnich, Shiven Chawla, Solianna Herrera, Shahed Warreth, Maggie Engler, Gary Lopez, Nina Chikanov, Raja Sekhar Rao Dheekonda, Bolor-Erdene Jagdagdorj, Roman Lutz, Richard Lundeen, Tori Westerhoff, Pete Bryan, Christian Seifert, Ram Shankar Siva Kumar, Andrew Berkley, Alex Kessler
Recent innovations in language model training have demonstrated that it is possible to create highly performant models that are small enough to run on a smartphone.
no code implementations • 10 Jul 2024 • Alice Qian Zhang, Ryland Shaw, Jacy Reese Anthis, Ashlee Milton, Emily Tseng, Jina Suh, Lama Ahmad, Ram Shankar Siva Kumar, Julian Posada, Benjamin Shestakofsky, Sarah T. Roberts, Mary L. Gray
Rapid progress in general-purpose AI has sparked significant interest in "red teaming," a practice of adversarial testing originating in military and cybersecurity applications.
no code implementations • 23 May 2023 • Micah Musser, Andrew Lohn, James X. Dempsey, Jonathan Spring, Ram Shankar Siva Kumar, Brenda Leong, Christina Liaghati, Cindy Martinez, Crystal D. Grant, Daniel Rohrer, Heather Frase, Jonathan Elliott, John Bansemer, Mikel Rodriguez, Mitt Regan, Rumman Chowdhury, Stefan Hermanek
In July 2022, the Center for Security and Emerging Technology (CSET) at Georgetown University and the Program on Geopolitics, Technology, and Governance at the Stanford Cyber Policy Center convened a workshop of experts to examine the relationship between vulnerabilities in artificial intelligence systems and more traditional types of software vulnerabilities.
no code implementations • ICML Workshop AML 2021 • Kendra Albert, Maggie Delano, Bogdan Kulynych, Ram Shankar Siva Kumar
In this paper, we review the broader impact statements that adversarial ML researchers wrote as part of their NeurIPS 2020 papers and assess the assumptions that authors have about the goals of their work.
no code implementations • 3 Dec 2020 • Kendra Albert, Maggie Delano, Jonathon Penney, Afsaneh Rigot, Ram Shankar Siva Kumar
This paper critically assesses the adequacy and representativeness of physical domain testing for various adversarial machine learning (ML) attacks against computer vision systems involving human subjects.
Computers and Society
no code implementations • 29 Jun 2020 • Ram Shankar Siva Kumar, Jonathon Penney, Bruce Schneier, Kendra Albert
Adversarial Machine Learning is booming with ML researchers increasingly targeting commercial ML systems such as those used in Facebook, Tesla, Microsoft, IBM, Google to demonstrate vulnerabilities.
no code implementations • 4 Feb 2020 • Ram Shankar Siva Kumar, Magnus Nyström, John Lambert, Andrew Marshall, Mario Goertzel, Andi Comissoneru, Matt Swann, Sharon Xia
Based on interviews with 28 organizations, we found that industry practitioners are not equipped with tactical and strategic tools to protect, detect and respond to attacks on their Machine Learning (ML) systems.
no code implementations • 1 Feb 2020 • Kendra Albert, Jonathon Penney, Bruce Schneier, Ram Shankar Siva Kumar
In this paper, we draw on insights from science and technology studies, anthropology, and human rights literature, to inform how defenses against adversarial attacks can be used to suppress dissent and limit attempts to investigate machine learning systems.
2 code implementations • 25 Nov 2019 • Ram Shankar Siva Kumar, David O Brien, Kendra Albert, Salomé Viljöen, Jeffrey Snover
In the last two years, more than 200 papers have been written on how machine learning (ML) systems can fail because of adversarial attacks on the algorithms and data; this number balloons if we were to incorporate papers covering non-adversarial failure modes.
no code implementations • 25 Oct 2018 • Ram Shankar Siva Kumar, David R. O'Brien, Kendra Albert, Salome Vilojen
When machine learning systems fail because of adversarial manipulation, how should society expect the law to respond?
no code implementations • 17 Nov 2017 • Nathan Wiebe, Ram Shankar Siva Kumar
Finally, we provide a private form of $k$--means clustering that can be used to prevent an all powerful adversary from learning more than a small fraction of a bit from any user.
no code implementations • 20 Sep 2017 • Ram Shankar Siva Kumar, Andrew Wicker, Matt Swann
Operationalizing machine learning based security detections is extremely challenging, especially in a continuously evolving cloud environment.