1 code implementation • 20 Mar 2025 • Vishisht Rao, Aounon Kumar, Himabindu Lakkaraju, Nihar B. Shah
We find a high success rate in the embedding of our watermarks in LLM-generated reviews across models.
no code implementations • 31 Dec 2024 • Martin Pawelczyk, Lillian Sun, Zhenting Qi, Aounon Kumar, Himabindu Lakkaraju
A key phenomenon known as weak-to-strong generalization - where a strong model trained on a weak model's outputs surpasses the weak model in task performance - has gained significant attention.
1 code implementation • 11 Apr 2024 • Aounon Kumar, Himabindu Lakkaraju
We demonstrate that adding a strategic text sequence (STS) -- a carefully crafted message -- to a product's information page can significantly increase its likelihood of being listed as the LLM's top recommendation.
1 code implementation • 6 Mar 2024 • Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju
As large language models (LLMs) develop increasingly sophisticated capabilities and find applications in medical settings, it becomes important to assess their medical safety due to their far-reaching implications for personal and public health, patient safety, and human rights.
1 code implementation • 29 Sep 2023 • Mehrdad Saberi, Vinu Sankar Sadasivan, Keivan Rezaei, Aounon Kumar, Atoosa Chegini, Wenxiao Wang, Soheil Feizi
Moreover, we show that watermarking methods are vulnerable to spoofing attacks where the attacker aims to have real images identified as watermarked ones, damaging the reputation of the developers.
1 code implementation • 6 Sep 2023 • Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju
We defend against three attack modes: i) adversarial suffix, where an adversarial sequence is appended at the end of a harmful prompt; ii) adversarial insertion, where the adversarial sequence is inserted anywhere in the middle of the prompt; and iii) adversarial infusion, where adversarial tokens are inserted at arbitrary positions in the prompt, not necessarily as a contiguous block.
no code implementations • 28 Mar 2023 • Aounon Kumar, Vinu Sankar Sadasivan, Soheil Feizi
Robustness certificates based on the assumption of independent input samples are not directly applicable in such scenarios.
1 code implementation • 17 Mar 2023 • Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, Soheil Feizi
Large Language Models (LLMs) perform impressively well in various applications.
1 code implementation • 28 Jan 2022 • Aounon Kumar, Alexander Levine, Tom Goldstein, Soheil Feizi
Certified robustness in machine learning has primarily focused on adversarial perturbations of the input with a fixed attack budget for each point in the data distribution.
no code implementations • ICLR 2022 • Aounon Kumar, Alexander Levine, Soheil Feizi
Prior works in provable robustness in RL seek to certify the behaviour of the victim policy at every time-step against a non-adaptive adversary using methods developed for the static setting.
1 code implementation • NeurIPS 2021 • Aounon Kumar, Tom Goldstein
We extend the scope of certifiable robustness to problems with more general and structured outputs like sets, images, language, etc.
no code implementations • NeurIPS 2020 • Ping-Yeh Chiang, Michael Curry, Ahmed Abdelkader, Aounon Kumar, John Dickerson, Tom Goldstein
Despite the vulnerability of object detectors to adversarial attacks, very few defenses are known to date.
1 code implementation • 20 Oct 2020 • Alexander Levine, Aounon Kumar, Thomas Goldstein, Soheil Feizi
In this work, we show that there also exists a universal curvature-like bound for Gaussian random smoothing: given the exact value and gradient of a smoothed function, we compute a lower bound on the distance of a point to its closest adversarial example, called the Second-order Smoothing (SoS) robustness certificate.
no code implementations • NeurIPS 2020 • Aounon Kumar, Alexander Levine, Soheil Feizi, Tom Goldstein
It uses the probabilities of predicting the top two most-likely classes around an input point under a smoothing distribution to generate a certified radius for a classifier's prediction.
1 code implementation • 7 Jul 2020 • Ping-Yeh Chiang, Michael J. Curry, Ahmed Abdelkader, Aounon Kumar, John Dickerson, Tom Goldstein
While adversarial training can improve the empirical robustness of image classifiers, a direct extension to object detection is very expensive.
1 code implementation • ICML 2020 • Aounon Kumar, Alexander Levine, Tom Goldstein, Soheil Feizi
Notably, for $p \geq 2$, this dependence on $d$ is no better than that of the $\ell_p$-radius that can be certified using isotropic Gaussian smoothing, essentially putting a matching lower bound on the robustness radius.