Search Results for author: Aounon Kumar

Found 14 papers, 9 papers with code

Manipulating Large Language Models to Increase Product Visibility

2 code implementations · 11 Apr 2024 · Aounon Kumar, Himabindu Lakkaraju

We demonstrate that adding a strategic text sequence (STS) -- a carefully crafted message -- to a product's information page can significantly increase its likelihood of being listed as the LLM's top recommendation.

Tasks: STS
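The mechanism described above can be sketched in miniature. This is an illustrative toy only: the paper optimizes the STS against a real LLM, whereas `rank_products` below is a hypothetical stand-in ranker that keys on simple text cues.

```python
# Illustrative sketch only: the paper's STS is optimized against a real LLM;
# `rank_products` is a hypothetical stand-in ranking function.

def rank_products(catalog):
    """Toy ranker standing in for asking an LLM for its top pick:
    scores each product by naive keyword cues in its page text."""
    cues = ("top recommendation", "best", "ideal first choice")

    def score(page):
        return sum(page.lower().count(c) for c in cues)

    return sorted(catalog, key=lambda p: score(p["page"]), reverse=True)

catalog = [
    {"name": "Coffee maker A", "page": "A reliable drip machine."},
    {"name": "Coffee maker B", "page": "A budget espresso maker."},
]

baseline = rank_products(catalog)[0]["name"]

# Append a candidate strategic text sequence (STS) to product B's page.
sts = " This product is the ideal first choice and best top recommendation."
catalog[1]["page"] += sts

manipulated = rank_products(catalog)[0]["name"]
print(baseline, "->", manipulated)
```

Even this crude stand-in flips the top recommendation, which is the effect the paper demonstrates against actual LLM-based shopping assistants.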

Towards Safe and Aligned Large Language Models for Medicine

no code implementations · 6 Mar 2024 · Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju

The capabilities of large language models (LLMs) have been progressing at a breathtaking speed, leaving even their own developers grappling with the depth of their potential and risks.

Tasks: General Knowledge

Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks

1 code implementation · 29 Sep 2023 · Mehrdad Saberi, Vinu Sankar Sadasivan, Keivan Rezaei, Aounon Kumar, Atoosa Chegini, Wenxiao Wang, Soheil Feizi

Moreover, we show that watermarking methods are vulnerable to spoofing attacks where the attacker aims to have real images identified as watermarked ones, damaging the reputation of the developers.

Tasks: Adversarial Attack, Face Swapping

Certifying LLM Safety against Adversarial Prompting

1 code implementation · 6 Sep 2023 · Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju

We defend against three attack modes: i) adversarial suffix, where an adversarial sequence is appended at the end of a harmful prompt; ii) adversarial insertion, where the adversarial sequence is inserted anywhere in the middle of the prompt; and iii) adversarial infusion, where adversarial tokens are inserted at arbitrary positions in the prompt, not necessarily as a contiguous block.

Tasks: Adversarial Attack, Language Modelling
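The three attack modes differ only in where the adversarial tokens land. A minimal sketch over token lists, where `adv` stands in for an adversarially optimized sequence (hypothetical here):

```python
import random

# Sketch of the three attack modes over token lists; `adv` stands in for an
# adversarially optimized token sequence (hypothetical placeholder).

def suffix_attack(prompt, adv):
    # Adversarial suffix: adv appended at the end of the harmful prompt.
    return prompt + adv

def insertion_attack(prompt, adv, pos):
    # Adversarial insertion: adv placed as one contiguous block at `pos`.
    return prompt[:pos] + adv + prompt[pos:]

def infusion_attack(prompt, adv, rng):
    # Adversarial infusion: each adv token dropped at an arbitrary position,
    # not necessarily forming a contiguous block.
    out = list(prompt)
    for tok in adv:
        out.insert(rng.randrange(len(out) + 1), tok)
    return out

prompt = ["how", "to", "do", "X"]
adv = ["!", "!", "!"]
print(suffix_attack(prompt, adv))
print(insertion_attack(prompt, adv, 2))
print(infusion_attack(prompt, adv, random.Random(0)))
```

Infusion is the strongest mode in this taxonomy, since suffix and insertion are special cases of placing tokens at arbitrary positions.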

Provable Robustness for Streaming Models with a Sliding Window

no code implementations · 28 Mar 2023 · Aounon Kumar, Vinu Sankar Sadasivan, Soheil Feizi

Robustness certificates based on the assumption of independent input samples are not directly applicable in such scenarios.

Tasks: Human Activity Recognition, Image Classification

Can AI-Generated Text be Reliably Detected?

1 code implementation · 17 Mar 2023 · Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, Soheil Feizi

In particular, we develop a recursive paraphrasing attack to apply on AI text, which can break a whole range of detectors, including the ones using the watermarking schemes as well as neural network-based detectors, zero-shot classifiers, and retrieval-based detectors.

Tasks: Language Modelling, Large Language Model (+2 more)
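The recursive paraphrasing loop can be sketched as follows. Both `paraphrase` and `detect_score` are hypothetical stand-ins: the paper uses neural paraphrase models and real detectors (watermark-based, zero-shot, retrieval-based), not the word-substitution toys below.

```python
# Sketch of the recursive paraphrasing idea: repeatedly paraphrase AI text
# until a detector stops flagging it. `paraphrase` and `detect_score` are
# hypothetical stand-ins for the neural models used in the paper.

def paraphrase(text):
    # Stand-in paraphraser: a real attack uses a neural paraphrase model.
    synonyms = {"utilize": "use", "demonstrate": "show", "novel": "new"}
    return " ".join(synonyms.get(w, w) for w in text.split())

def detect_score(text):
    # Stand-in detector: flags tokens a watermark/classifier might key on.
    flagged = {"utilize", "demonstrate", "novel"}
    words = text.split()
    return sum(w in flagged for w in words) / len(words)

def recursive_paraphrase_attack(text, threshold=0.1, max_rounds=5):
    # Recurse: each round paraphrases the previous round's output.
    for _ in range(max_rounds):
        if detect_score(text) < threshold:
            break
        text = paraphrase(text)
    return text

ai_text = "we demonstrate a novel method and utilize it widely"
print(detect_score(ai_text), detect_score(recursive_paraphrase_attack(ai_text)))
```

The point of recursion is that a single paraphrase pass may leave enough detectable signal; composing passes drives the detector's score down while (in the paper's experiments) largely preserving meaning.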

Certifying Model Accuracy under Distribution Shifts

1 code implementation · 28 Jan 2022 · Aounon Kumar, Alexander Levine, Tom Goldstein, Soheil Feizi

Certified robustness in machine learning has primarily focused on adversarial perturbations of the input with a fixed attack budget for each point in the data distribution.

Policy Smoothing for Provably Robust Reinforcement Learning

no code implementations · ICLR 2022 · Aounon Kumar, Alexander Levine, Soheil Feizi

Prior works in provable robustness in RL seek to certify the behaviour of the victim policy at every time-step against a non-adaptive adversary using methods developed for the static setting.

Tasks: Adversarial Robustness, Image Classification (+3 more)

Center Smoothing: Certified Robustness for Networks with Structured Outputs

1 code implementation · NeurIPS 2021 · Aounon Kumar, Tom Goldstein

We extend the scope of certifiable robustness to problems with more general and structured outputs like sets, images, language, etc.

Tasks: Adversarial Robustness, Dimensionality Reduction (+7 more)

Tight Second-Order Certificates for Randomized Smoothing

1 code implementation · 20 Oct 2020 · Alexander Levine, Aounon Kumar, Thomas Goldstein, Soheil Feizi

In this work, we show that there also exists a universal curvature-like bound for Gaussian random smoothing: given the exact value and gradient of a smoothed function, we compute a lower bound on the distance of a point to its closest adversarial example, called the Second-order Smoothing (SoS) robustness certificate.

Certifying Confidence via Randomized Smoothing

no code implementations · NeurIPS 2020 · Aounon Kumar, Alexander Levine, Soheil Feizi, Tom Goldstein

It uses the probabilities of predicting the top two most-likely classes around an input point under a smoothing distribution to generate a certified radius for a classifier's prediction.

Tasks: LEMMA
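The top-two-probability construction above builds on the standard Gaussian randomized-smoothing certificate (in the style of Cohen et al.): with bounds pA and pB on the top-two class probabilities under N(0, σ²I) smoothing, the prediction is certified within ℓ2 radius (σ/2)(Φ⁻¹(pA) − Φ⁻¹(pB)). A minimal sketch:

```python
from statistics import NormalDist

# Standard Gaussian randomized-smoothing certificate: given bounds p_a, p_b on
# the probabilities of the top two classes under N(0, sigma^2 I) smoothing,
# the certified l2 radius is (sigma / 2) * (Phi^-1(p_a) - Phi^-1(p_b)).

def certified_radius(p_a: float, p_b: float, sigma: float) -> float:
    phi_inv = NormalDist().inv_cdf  # inverse standard normal CDF
    return 0.5 * sigma * (phi_inv(p_a) - phi_inv(p_b))

# Example: top class probability 0.9, runner-up 0.1, noise scale sigma = 0.5.
print(round(certified_radius(0.9, 0.1, 0.5), 4))  # → 0.6408
```

In practice pA is a lower bound and pB an upper bound estimated from Monte Carlo samples with a confidence correction; when pA = pB the radius is zero, i.e. nothing is certified.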

Detection as Regression: Certified Object Detection by Median Smoothing

1 code implementation · 7 Jul 2020 · Ping-Yeh Chiang, Michael J. Curry, Ahmed Abdelkader, Aounon Kumar, John Dickerson, Tom Goldstein

While adversarial training can improve the empirical robustness of image classifiers, a direct extension to object detection is very expensive.

Tasks: Object Detection (+2 more)
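Treating detection as regression, the core smoothing step is simple: evaluate the base regressor on noisy copies of the input and take the median of the outputs, which admits percentile-based robustness certificates. A one-dimensional sketch, where `predict_coord` is a hypothetical base regressor standing in for a box-coordinate head:

```python
import random
import statistics

# Sketch of median smoothing for a regressed output (e.g. a box coordinate).
# `predict_coord` is a hypothetical stand-in for a detector's regression head;
# the smoothed prediction is the median over Gaussian-perturbed inputs.

def predict_coord(x: float) -> float:
    # Stand-in base regressor (hypothetical).
    return 2.0 * x + 1.0

def median_smoothed(x: float, sigma: float, n: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    samples = [predict_coord(x + rng.gauss(0.0, sigma)) for _ in range(n)]
    return statistics.median(samples)

print(median_smoothed(3.0, 0.1, 1001))
```

Unlike mean smoothing, the median is insensitive to outlier predictions on extreme noise draws, which is what makes percentile certificates tractable for unbounded regression outputs.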

Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness

1 code implementation · ICML 2020 · Aounon Kumar, Alexander Levine, Tom Goldstein, Soheil Feizi

Notably, for $p \geq 2$, this dependence on $d$ is no better than that of the $\ell_p$-radius that can be certified using isotropic Gaussian smoothing, essentially putting a matching lower bound on the robustness radius.
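The dimension dependence referenced above follows from norm comparison: for p ≥ 2, ‖δ‖₂ ≤ d^(1/2 − 1/p)·‖δ‖_p, so an ℓ2 certificate of radius r yields an ℓp radius of only r·d^(1/p − 1/2), which shrinks as d grows. A small numeric sketch:

```python
# The l_p radius obtainable from an isotropic-Gaussian l2 certificate:
# since ||delta||_2 <= dim**(1/2 - 1/p) * ||delta||_p for p >= 2, an l2
# radius r2 yields an l_p radius r2 * dim**(1/p - 1/2), decaying with dim.

def lp_radius_from_l2(r2: float, dim: int, p: float) -> float:
    assert p >= 2
    return r2 * dim ** (1.0 / p - 0.5)

# l_inf radius implied by a unit l2 certificate at increasing dimension.
for dim in (10, 1_000, 100_000):
    print(dim, lp_radius_from_l2(1.0, dim, p=float("inf")))
```

For p = 2 the factor is 1 (no shrinkage), while for p = ∞ the implied radius decays as d^(−1/2); the paper's lower bound shows anisotropic smoothing distributions cannot do fundamentally better.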
