no code implementations • ICML 2020 • Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Jacob Steinhardt, Aleksander Madry
Dataset replication is a useful tool for assessing whether models have overfit to a specific validation set or the exact circumstances under which it was generated.
1 code implementation • 5 Feb 2025 • Joshua Vendrow, Edward Vendrow, Sara Beery, Aleksander Madry
We evaluate a wide range of models on these platinum benchmarks and find that, indeed, frontier LLMs still exhibit failures on simple tasks such as elementary-level math word problems.
no code implementations • 21 Dec 2024 • OpenAI, :, Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, Alex Iftimie, Alex Karpenko, Alex Tachard Passos, Alexander Neitz, Alexander Prokofiev, Alexander Wei, Allison Tam, Ally Bennett, Ananya Kumar, Andre Saraiva, Andrea Vallone, Andrew Duberstein, Andrew Kondrich, Andrey Mishchenko, Andy Applebaum, Angela Jiang, Ashvin Nair, Barret Zoph, Behrooz Ghorbani, Ben Rossen, Benjamin Sokolowsky, Boaz Barak, Bob McGrew, Borys Minaiev, Botao Hao, Bowen Baker, Brandon Houghton, Brandon McKinzie, Brydon Eastman, Camillo Lugaresi, Cary Bassin, Cary Hudson, Chak Ming Li, Charles de Bourcy, Chelsea Voss, Chen Shen, Chong Zhang, Chris Koch, Chris Orsinger, Christopher Hesse, Claudia Fischer, Clive Chan, Dan Roberts, Daniel Kappler, Daniel Levy, Daniel Selsam, David Dohan, David Farhi, David Mely, David Robinson, Dimitris Tsipras, Doug Li, Dragos Oprica, Eben Freeman, Eddie Zhang, Edmund Wong, Elizabeth Proehl, Enoch Cheung, Eric Mitchell, Eric Wallace, Erik Ritter, Evan Mays, Fan Wang, Felipe Petroski Such, Filippo Raso, Florencia Leoni, Foivos Tsimpourlas, Francis Song, Fred von Lohmann, Freddie Sulit, Geoff Salmon, Giambattista Parascandolo, Gildas Chabot, Grace Zhao, Greg Brockman, Guillaume Leclerc, Hadi Salman, Haiming Bao, Hao Sheng, Hart Andrin, Hessam Bagherinezhad, Hongyu Ren, Hunter Lightman, Hyung Won Chung, Ian Kivlichan, Ian O'Connell, Ian Osband, Ignasi Clavera Gilaberte, Ilge Akkaya, Ilya Kostrikov, Ilya Sutskever, Irina Kofman, Jakub Pachocki, James Lennon, Jason Wei, Jean Harb, Jerry Twore, Jiacheng Feng, Jiahui Yu, Jiayi Weng, Jie Tang, Jieqi Yu, Joaquin Quiñonero Candela, Joe Palermo, Joel Parish, Johannes Heidecke, John Hallman, John Rizzo, Jonathan Gordon, Jonathan Uesato, Jonathan Ward, Joost Huizinga, Julie Wang, Kai Chen, Kai Xiao, Karan Singhal, Karina Nguyen, Karl Cobbe, Katy Shi, Kayla Wood, Kendra Rimbach, Keren Gu-Lemberg, Keren GuLemberg, Kevin Liu, Kevin Lu, Kevin Stone, Kevin Yu, Lama Ahmad, Lauren Yang, Leo Liu, Leon Maksin, Leyton Ho, Liam Fedus, Lilian Weng, Linden Li, Lindsay McCallum, Lindsey Held, Lorenz Kuhn, Lukas Kondraciuk, Lukasz Kaiser, Luke Metz, Madelaine Boyd, Maja Trebacz, Manas Joglekar, Mark Chen, Marko Tintor, Mason Meyer, Matt Jones, Matt Kaufer, Max Schwarzer, Meghan Shah, Mehmet Yatbaz, Melody Guan, Mengyuan Xu, Mengyuan Yan, Mia Glaese, Mianna Chen, Michael Lampe, Michael Malek, Michele Wang, Michelle Fradin, Mike McClay, Mikhail Pavlov, Miles Wang, Mingxuan Wang, Mira Murati, Mo Bavarian, Mostafa Rohaninejad, Nat McAleese, Neil Chowdhury, Nick Ryder, Nikolas Tezak, Noam Brown, Ofir Nachum, Oleg Boiko, Oleg Murk, Olivia Watkins, Patrick Chao, Paul Ashbourne, Pavel Izmailov, Peter Zhokhov, Rachel Dias, Rahul Arora, Randall Lin, Rapha Gontijo Lopes, Raz Gaon, Reah Miyara, Reimar Leike, Renny Hwang, Rhythm Garg, Robin Brown, Roshan James, Rui Shu, Ryan Cheu, Ryan Greene, Saachi Jain, Sam Altman, Sam Toizer, Sam Toyer, Samuel Miserendino, Sandhini Agarwal, Santiago Hernandez, Sasha Baker, Scott McKinney, Scottie Yan, Shengjia Zhao, Shengli Hu, Shibani Santurkar, Shraman Ray Chaudhuri, Shuyuan Zhang, Siyuan Fu, Spencer Papay, Steph Lin, Suchir Balaji, Suvansh Sanjeev, Szymon Sidor, Tal Broda, Aidan Clark, Tao Wang, Taylor Gordon, Ted Sanders, Tejal Patwardhan, Thibault Sottiaux, Thomas Degry, Thomas Dimson, Tianhao Zheng, Timur Garipov, Tom Stasi, Trapit Bansal, Trevor Creech, Troy Peterson, Tyna Eloundou, Valerie Qi, 
Vineet Kosaraju, Vinnie Monaco, Vitchyr Pong, Vlad Fomenko, Weiyi Zheng, Wenda Zhou, Wes McCabe, Wojciech Zaremba, Yann Dubois, Yinghai Lu, Yining Chen, Young Cha, Yu Bai, Yuchen He, Yuchen Zhang, Yunyun Wang, Zheng Shao, Zhuohan Li
The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought.
no code implementations • 30 Oct 2024 • Kristian Georgiev, Roy Rinberg, Sung Min Park, Shivam Garg, Andrew Ilyas, Aleksander Madry, Seth Neel
This perspective naturally suggests a reduction from the unlearning problem to that of data attribution, where the goal is to predict the effect of changing the training set on a model's outputs.
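To make that reduction concrete, here is a minimal, purely illustrative sketch (not the paper's implementation): assuming a precomputed attribution matrix whose entries estimate each training example's contribution to each test output, the effect of deleting a set of training points is predicted by subtracting their attributed contributions. All names and shapes below are hypothetical.

```python
# Illustrative reduction from unlearning to data attribution:
# A[i, j] is an (assumed precomputed) estimate of how much training example j
# contributes to the model's output on test example i.
import numpy as np

def predict_outputs_after_deletion(A, base_outputs, deleted_indices):
    """Predict model outputs after 'forgetting' the given training examples.

    A               : (n_test, n_train) attribution matrix
    base_outputs    : (n_test,) outputs of the model trained on the full set
    deleted_indices : indices of training examples to remove
    """
    # Removing example j is predicted to change output i by -A[i, j].
    delta = A[:, deleted_indices].sum(axis=1)
    return base_outputs - delta

# Toy usage with random numbers (purely illustrative).
rng = np.random.default_rng(0)
A = rng.normal(scale=0.01, size=(5, 100))
base = rng.normal(size=5)
print(predict_outputs_after_deletion(A, base, deleted_indices=[3, 17, 42]))
```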
1 code implementation • 1 Sep 2024 • Benjamin Cohen-Wang, Harshay Shah, Kristian Georgiev, Aleksander Madry
How do language models use information provided as context when generating a response?
no code implementations • 24 Jun 2024 • Saachi Jain, Kimia Hamidieh, Kristian Georgiev, Andrew Ilyas, Marzyeh Ghassemi, Aleksander Madry
Machine learning models can fail on subgroups that are underrepresented during training.
no code implementations • 9 May 2024 • Sarah H. Cen, Andrew Ilyas, Jennifer Allen, Hannah Li, Aleksander Madry
Although this assumption is convenient, it fails to capture user strategization: that users may attempt to shape their future recommendations by adapting their behavior to the recommendation algorithm.
1 code implementation • 17 Apr 2024 • Harshay Shah, Andrew Ilyas, Aleksander Madry
The goal of component modeling is to decompose an ML model's prediction in terms of its components: simple functions (e.g., convolution filters, attention heads) that are the "building blocks" of model computation.
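As a rough illustration of this idea (a sketch under stated assumptions, not the paper's algorithm), one could ablate random subsets of components, record the model's output, and fit a linear surrogate from ablation masks to outputs; `evaluate_with_ablation` below is a hypothetical hook whose implementation depends on the architecture.

```python
# Sketch of component attribution via random ablations and a linear surrogate.
import numpy as np
from sklearn.linear_model import Ridge

def fit_component_attributions(evaluate_with_ablation, n_components,
                               n_samples=1000, ablate_frac=0.05, seed=0):
    rng = np.random.default_rng(seed)
    # True entries mark components that are ablated in a given trial.
    masks = rng.random((n_samples, n_components)) < ablate_frac
    outputs = np.array([evaluate_with_ablation(m) for m in masks])
    surrogate = Ridge(alpha=1.0).fit(masks.astype(float), outputs)
    return surrogate.coef_  # coefficient k ~ estimated effect of ablating component k

# Toy usage with a synthetic "model" whose output drops by a fixed amount per ablated component.
true_effects = np.linspace(0.0, 0.1, 32)
dummy_eval = lambda mask: 1.0 - float(mask @ true_effects)
print(fit_component_attributions(dummy_eval, n_components=32)[:5])
```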
1 code implementation • 29 Feb 2024 • Benjamin Cohen-Wang, Joshua Vendrow, Aleksander Madry
In particular, we focus on two possible failure modes of models under distribution shift: poor extrapolation (e.g., they cannot generalize to a different domain) and biases in the training data (e.g., they rely on spurious features).
1 code implementation • 23 Jan 2024 • Logan Engstrom, Axel Feldmann, Aleksander Madry
When selecting data for training large-scale models, standard practice is to filter for examples that match human notions of data quality.
no code implementations • 29 Dec 2023 • Sarah H. Cen, Andrew Ilyas, Aleksander Madry
The developers of these algorithms commonly adopt the assumption that the data-generating process is exogenous: that is, how a user reacts to a given prompt (e.g., a recommendation or hiring suggestion) depends on the prompt and not on the algorithm that generated it.
1 code implementation • 11 Dec 2023 • Kristian Georgiev, Joshua Vendrow, Hadi Salman, Sung Min Park, Aleksander Madry
Then, we provide a method for computing these attributions efficiently.
no code implementations • 19 Jul 2023 • Alaa Khaddaj, Guillaume Leclerc, Aleksandar Makelov, Kristian Georgiev, Hadi Salman, Andrew Ilyas, Aleksander Madry
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
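For intuition, here is a minimal sketch of how poisoned training examples are commonly constructed in the backdoor literature; the trigger pattern and array shapes are illustrative choices, not this paper's specific attack.

```python
# Construct backdoor examples: stamp a small trigger onto clean images and
# relabel them with an attacker-chosen target class.
import numpy as np

def make_backdoor_examples(images, target_label, trigger_size=3):
    """images: (n, H, W, C) float array with values in [0, 1]."""
    poisoned = images.copy()
    # Place a white square trigger in the bottom-right corner of each image.
    poisoned[:, -trigger_size:, -trigger_size:, :] = 1.0
    labels = np.full(len(images), target_label, dtype=np.int64)
    return poisoned, labels
```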
2 code implementations • CVPR 2023 • Guillaume Leclerc, Andrew Ilyas, Logan Engstrom, Sung Min Park, Hadi Salman, Aleksander Madry
For example, we are able to train an ImageNet ResNet-50 model to 75% accuracy in only 20 minutes on a single machine.
no code implementations • 20 Apr 2023 • Sarah H. Cen, Aleksander Madry, Devavrat Shah
In particular, we introduce the notion of a baseline feed: the content that a user would see without filtering (e.g., on Twitter, this could be the chronological timeline).
2 code implementations • 24 Mar 2023 • Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, Aleksander Madry
That is, computationally tractable methods can struggle with accurately attributing model predictions in non-convex settings (e.g., in the context of deep neural networks), while methods that are effective in such regimes require training thousands of models, which makes them impractical for large models or datasets.
1 code implementation • 15 Feb 2023 • Joshua Vendrow, Saachi Jain, Logan Engstrom, Aleksander Madry
In this work, we introduce the notion of a dataset interface: a framework that, given an input dataset and a user-specified shift, returns instances from that input distribution that exhibit the desired shift.
1 code implementation • 13 Feb 2023 • Hadi Salman, Alaa Khaddaj, Guillaume Leclerc, Andrew Ilyas, Aleksander Madry
We present an approach to mitigating the risks of malicious image editing posed by large diffusion models.
1 code implementation • 22 Nov 2022 • Harshay Shah, Sung Min Park, Andrew Ilyas, Aleksander Madry
We study the problem of (learning) algorithm comparison, where the goal is to find differences between models trained with two different learning algorithms.
1 code implementation • CVPR 2023 • Saachi Jain, Hadi Salman, Alaa Khaddaj, Eric Wong, Sung Min Park, Aleksander Madry
It is commonly believed that, in transfer learning, including more pre-training data translates into better performance.
1 code implementation • 6 Jul 2022 • Hadi Salman, Saachi Jain, Andrew Ilyas, Logan Engstrom, Eric Wong, Aleksander Madry
Using transfer learning to adapt a pre-trained "source model" to a downstream "target task" can dramatically increase performance with seemingly no downside.
1 code implementation • 29 Jun 2022 • Saachi Jain, Hannah Lawrence, Ankur Moitra, Aleksander Madry
Moreover, by combining our framework with off-the-shelf diffusion models, we can generate images that are especially challenging for the analyzed model; these images can then be used for synthetic data augmentation that helps remedy the model's failure modes.
no code implementations • 19 Jun 2022 • Chong Guo, Michael J. Lee, Guillaume Leclerc, Joel Dapello, Yug Rao, Aleksander Madry, James J. DiCarlo
Visual systems of primates are the gold standard of robust perception.
1 code implementation • ICLR 2022 • Saachi Jain, Hadi Salman, Eric Wong, Pengchuan Zhang, Vibhav Vineet, Sai Vemprala, Aleksander Madry
Missingness, or the absence of features from an input, is a concept fundamental to many model debugging tools.
1 code implementation • 1 Feb 2022 • Andrew Ilyas, Sung Min Park, Logan Engstrom, Guillaume Leclerc, Aleksander Madry
We present a conceptual framework, datamodeling, for analyzing the behavior of a model class in terms of the training data.
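A compressed sketch of the general datamodeling recipe follows, under the assumption of a hypothetical (and in practice expensive) `train_and_evaluate` routine; the paper uses far larger numbers of subset-trained models and a specific output function of interest.

```python
# Datamodeling sketch: train on random subsets, record an output of interest,
# and fit a sparse linear model from subset-membership indicators to that output.
import numpy as np
from sklearn.linear_model import Lasso

def fit_datamodel(train_and_evaluate, n_train, n_subsets=500, alpha=0.5, seed=0):
    rng = np.random.default_rng(seed)
    masks = rng.random((n_subsets, n_train)) < alpha   # which training examples are included
    outputs = np.array([train_and_evaluate(np.flatnonzero(m)) for m in masks])
    datamodel = Lasso(alpha=1e-3).fit(masks.astype(float), outputs)
    return datamodel.coef_                              # per-training-example weights
```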
no code implementations • 31 Dec 2021 • Sung Min Park, Kuo-An Wei, Kai Xiao, Jerry Li, Aleksander Madry
We identify properties of universal adversarial perturbations (UAPs) that distinguish them from standard adversarial perturbations.
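For context, here is a minimal PyTorch sketch of what makes a universal perturbation different from a standard one: a single delta is optimized over many inputs and projected onto an l_inf ball. Hyperparameters are illustrative, not this paper's setup.

```python
# Compute a single perturbation shared across all inputs in a data loader.
import torch
import torch.nn.functional as F

def universal_perturbation(model, loader, eps=8/255, step=1/255, epochs=5):
    delta = None
    for _ in range(epochs):
        for x, y in loader:
            if delta is None:
                delta = torch.zeros_like(x[0], requires_grad=True)
            loss = F.cross_entropy(model(x + delta), y)
            grad, = torch.autograd.grad(loss, delta)
            with torch.no_grad():
                delta += step * grad.sign()   # one perturbation, updated over all inputs
                delta.clamp_(-eps, eps)       # keep it within the l_inf budget
    return delta.detach()
```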
1 code implementation • NeurIPS 2021 • Shibani Santurkar, Dimitris Tsipras, Mahalaxmi Elango, David Bau, Antonio Torralba, Aleksander Madry
We present a methodology for modifying the behavior of a classifier by directly rewriting its prediction rules.
1 code implementation • 15 Oct 2021 • Saachi Jain, Dimitris Tsipras, Aleksander Madry
To improve model generalization, model designers often restrict the features that their models use, either implicitly or explicitly.
1 code implementation • 7 Jun 2021 • Guillaume Leclerc, Hadi Salman, Andrew Ilyas, Sai Vemprala, Logan Engstrom, Vibhav Vineet, Kai Xiao, Pengchuan Zhang, Shibani Santurkar, Greg Yang, Ashish Kapoor, Aleksander Madry
We introduce 3DB: an extendable, unified framework for testing and debugging vision models using photorealistic simulation.
no code implementations • 1 Jan 2021 • Sung Min Park, Kuo-An Wei, Kai Yuanqing Xiao, Jerry Li, Aleksander Madry
We study universal adversarial perturbations and demonstrate that the above picture is more nuanced.
2 code implementations • NeurIPS 2021 • Hadi Salman, Andrew Ilyas, Logan Engstrom, Sai Vemprala, Aleksander Madry, Ashish Kapoor
We study a class of realistic computer vision settings wherein one can influence the design of the objects being recognized.
no code implementations • 18 Dec 2020 • Micah Goldblum, Dimitris Tsipras, Chulin Xie, Xinyun Chen, Avi Schwarzschild, Dawn Song, Aleksander Madry, Bo Li, Tom Goldstein
As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance.
2 code implementations • ICLR 2021 • Shibani Santurkar, Dimitris Tsipras, Aleksander Madry
We develop a methodology for assessing the robustness of models to subpopulation shift: specifically, their ability to generalize to novel data subpopulations that were not observed during training.
2 code implementations • NeurIPS 2020 • Hadi Salman, Andrew Ilyas, Logan Engstrom, Ashish Kapoor, Aleksander Madry
Typically, better pre-trained models yield better transfer results, suggesting that initial accuracy is a key aspect of transfer learning performance.
Ranked #8 on Object Recognition (shape bias benchmark)
1 code implementation • ICLR 2021 • Kai Xiao, Logan Engstrom, Andrew Ilyas, Aleksander Madry
We assess the tendency of state-of-the-art object recognition models to depend on signals from image backgrounds.
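A hedged sketch of the kind of background-reliance evaluation this line describes: paste object foregrounds onto unrelated backgrounds and compare accuracy. The `predict` interface and the foreground masks are assumptions for illustration, not the released benchmark.

```python
# Measure accuracy drop when backgrounds are swapped between images.
import numpy as np

def background_swap_accuracy(predict, images, foreground_masks, labels, seed=0):
    """predict: function mapping an (n, H, W, C) array to predicted labels.
    foreground_masks: (n, H, W) binary masks, 1 = object pixel."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(images))                   # donor images for backgrounds
    masks = foreground_masks[..., None].astype(float)     # (n, H, W, 1)
    swapped = masks * images + (1 - masks) * images[perm]
    original_acc = np.mean(predict(images) == labels)
    swapped_acc = np.mean(predict(swapped) == labels)
    return original_acc, swapped_acc
```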
3 code implementations • 25 May 2020 • Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry
We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms: Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO).
1 code implementation • ICML 2020 • Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Andrew Ilyas, Aleksander Madry
Building rich machine learning datasets in a scalable manner often necessitates a crowd-sourced data collection pipeline.
1 code implementation • 19 May 2020 • Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Jacob Steinhardt, Aleksander Madry
We study ImageNet-v2, a replication of the ImageNet dataset on which models exhibit a significant (11-14%) drop in accuracy, even after controlling for a standard human-in-the-loop measure of data quality.
2 code implementations • ICLR 2020 • Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry
We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms, Proximal Policy Optimization and Trust Region Policy Optimization.
no code implementations • 24 Feb 2020 • Guillaume Leclerc, Aleksander Madry
Learning rate schedule has a major impact on the performance of deep learning models.
4 code implementations • NeurIPS 2020 • Florian Tramer, Nicholas Carlini, Wieland Brendel, Aleksander Madry
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples.
2 code implementations • 5 Dec 2019 • Alexander Turner, Dimitris Tsipras, Aleksander Madry
While such attacks are very effective, they crucially rely on the adversary injecting arbitrary inputs that are, often blatantly, mislabeled.
1 code implementation • NeurIPS 2019 • Shibani Santurkar, Dimitris Tsipras, Brandon Tran, Andrew Ilyas, Logan Engstrom, Aleksander Madry
We show that the basic classification framework alone can be used to tackle some of the most challenging tasks in image synthesis.
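For intuition, a minimal PyTorch sketch of class-conditional synthesis with a classifier: gradient ascent on the target-class score starting from a seed image. Step sizes and iteration counts are illustrative; the paper's results rely on the classifier being adversarially robust for the outputs to look natural.

```python
# Synthesize an image of a target class by ascending the classifier's score.
import torch

def synthesize(model, seed_image, target_class, steps=60, lr=0.5):
    x = seed_image.clone().detach().requires_grad_(True)
    for _ in range(steps):
        score = model(x.unsqueeze(0))[0, target_class]
        grad, = torch.autograd.grad(score, x)
        with torch.no_grad():
            x += lr * grad / (grad.norm() + 1e-8)   # normalized gradient ascent
            x.clamp_(0, 1)                          # keep pixels in valid range
    return x.detach()
```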
5 code implementations • 3 Jun 2019 • Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Brandon Tran, Aleksander Madry
In this work, we show that robust optimization can be re-cast as a tool for enforcing priors on the features learned by deep neural networks.
4 code implementations • NeurIPS 2019 • Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madry
Adversarial examples have attracted significant attention in machine learning, but the reasons for their existence and pervasiveness remain unclear.
no code implementations • ICLR 2019 • Alexander Turner, Dimitris Tsipras, Aleksander Madry
Deep neural networks have been recently demonstrated to be vulnerable to backdoor attacks.
4 code implementations • 18 Feb 2019 • Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, Aleksander Madry, Alexey Kurakin
Correctly evaluating defenses against adversarial examples has proven to be extremely difficult.
no code implementations • ICLR 2020 • Andrew Ilyas, Logan Engstrom, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry
We study how the behavior of deep policy gradient algorithms reflects the conceptual framework motivating their development.
1 code implementation • NeurIPS 2018 • Brandon Tran, Jerry Li, Aleksander Madry
In this paper, we identify a new property of all known backdoor attacks, which we call "spectral signatures".
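A compact sketch of the spectral-signature computation: score each example in a class by its squared projection onto the top singular vector of the centered representation matrix, and flag outliers as suspected backdoor points. The cutoff below is an illustrative choice.

```python
# Spectral-signature-style outlier scores over learned representations.
import numpy as np

def spectral_signature_scores(features):
    """features: (n, d) array of learned representations for one class."""
    centered = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_direction = vt[0]                      # top right-singular vector
    return (centered @ top_direction) ** 2     # per-example outlier scores

def flag_suspicious(features, remove_frac=0.05):
    scores = spectral_signature_scores(features)
    cutoff = np.quantile(scores, 1 - remove_frac)
    return np.flatnonzero(scores > cutoff)
```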
1 code implementation • ICLR 2019 • Kai Y. Xiao, Vincent Tjeng, Nur Muhammad Shafiullah, Aleksander Madry
We explore the concept of co-design in the context of neural network verification.
3 code implementations • ICLR 2019 • Andrew Ilyas, Logan Engstrom, Aleksander Madry
We study the problem of generating adversarial examples in a black-box setting in which only loss-oracle access to a model is available.
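For background, here is a sketch of the kind of loss-oracle-only gradient estimator (NES-style finite differences) that such black-box attacks build on; the paper itself goes further by exploiting time- and data-dependent priors. `loss_fn` is a hypothetical oracle returning only the scalar loss.

```python
# Estimate a gradient using only loss-value queries.
import numpy as np

def estimate_gradient(loss_fn, x, n_queries=50, sigma=0.01, seed=0):
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(x)
    for _ in range(n_queries):
        u = rng.normal(size=x.shape)
        # Antithetic sampling: two loss queries per random direction.
        grad += (loss_fn(x + sigma * u) - loss_fn(x - sigma * u)) * u
    return grad / (2 * sigma * n_queries)
```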
8 code implementations • ICLR 2019 • Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, Aleksander Madry
We show that there may exist an inherent tension between the goal of adversarial robustness and that of standard generalization.
11 code implementations • NeurIPS 2018 • Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry
Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs).
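For reference, a minimal NumPy sketch of the BatchNorm transformation whose effect the paper analyzes (training-time statistics only; running averages and backpropagation omitted).

```python
# BatchNorm forward pass: normalize per feature over the batch, then rescale and shift.
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """x: (batch, features); gamma, beta: (features,) learnable parameters."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```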
no code implementations • ICLR 2018 • Jerry Li, Aleksander Madry, John Peebles, Ludwig Schmidt
This suggests that this use of a first-order approximation of the discriminator, a de facto standard in existing GAN training dynamics, might be one of the factors that makes GAN training so challenging in practice.
no code implementations • ICLR 2018 • Shibani Santurkar, Ludwig Schmidt, Aleksander Madry
A fundamental, and still largely unanswered, question in the context of Generative Adversarial Networks (GANs) is whether GANs are actually able to capture the key characteristics of the datasets they are trained on.
2 code implementations • 7 Dec 2017 • Logan Engstrom, Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, Aleksander Madry
The study of adversarial robustness has so far largely focused on perturbations bounded in p-norms.
no code implementations • ICML 2018 • Jerry Li, Aleksander Madry, John Peebles, Ludwig Schmidt
While Generative Adversarial Networks (GANs) have demonstrated promising performance on multiple vision tasks, their learning dynamics are not yet well understood, both in theory and in practice.
59 code implementations • ICLR 2018 • Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu
Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal.
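Since this entry is the adversarial-training paper itself, here is a condensed PyTorch sketch of PGD-based adversarial training in the spirit of its min-max formulation; epsilon, step size, and iteration count are illustrative l_inf settings, not prescriptions from the paper.

```python
# PGD inner maximization plus one step of training on the worst-case inputs.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step=2/255, iters=10):
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += step * grad.sign()                   # ascend the loss
            delta.clamp_(-eps, eps)                       # project onto the l_inf ball
            delta.copy_((x + delta).clamp(0, 1) - x)      # keep images in valid range
    return delta.detach()

def adversarial_training_step(model, optimizer, x, y):
    delta = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta), y)           # train on adversarial inputs
    loss.backward()
    optimizer.step()
    return loss.item()
```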