no code implementations • 6 Feb 2025 • Sihui Dai, Christian Cianfarani, Arjun Bhagoji, Vikash Sehwag, Prateek Mittal
We present theoretical results which show that the gap in a model's robustness against different attacks is bounded by how far each attack perturbs a sample in the model's logit space, suggesting that regularizing with respect to this logit space distance can help maintain robustness against previous attacks.
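A minimal sketch of this regularization idea, assuming a PyTorch classifier `model` and a precomputed adversarial batch `x_adv` (all placeholder names, not the authors' code): the robust task loss is augmented with the squared logit-space distance between clean and perturbed inputs.

```python
# Hedged sketch: augment the robust loss with the squared logit-space
# distance between clean and adversarially perturbed inputs.
# `model`, `x`, `y`, and `x_adv` are assumed placeholders.
import torch
import torch.nn.functional as F

def logit_distance_loss(model, x, y, x_adv, reg_weight=1.0):
    logits_clean = model(x)
    logits_adv = model(x_adv)
    task_loss = F.cross_entropy(logits_adv, y)
    # Penalize how far the attack moves the sample in logit space
    logit_gap = (logits_adv - logits_clean).pow(2).sum(dim=1).mean()
    return task_loss + reg_weight * logit_gap
```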
no code implementations • 22 Oct 2024 • David Schneider, Sina Sajadmanesh, Vikash Sehwag, Saquib Sarfraz, Rainer Stiefelhagen, Lingjuan Lyu, Vivek Sharma
Prevalent methods tackling this problem use differential privacy (DP) or obfuscation techniques to protect the privacy of individuals.
no code implementations • 16 Oct 2024 • Jie Ren, Kangrui Chen, Chen Chen, Vikash Sehwag, Yue Xing, Jiliang Tang, Lingjuan Lyu
Existing methods, such as sample-level Membership Inference Attacks (MIA) and distribution-based dataset inference, distinguish member data (data used for training) from non-member data by leveraging the common observation that models tend to memorize and show greater confidence in member data.
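As a rough illustration of the sample-level baseline mentioned here, a loss-thresholding membership inference attack can be sketched as follows; the threshold and names are illustrative, not the paper's method.

```python
# Hedged sketch of loss-thresholding membership inference: samples on
# which the model is unusually confident (low loss) are predicted to
# be training members. The threshold is illustrative.
import torch
import torch.nn.functional as F

@torch.no_grad()
def membership_scores(model, x, y):
    logits = model(x)
    # Lower per-sample loss -> higher membership score
    return -F.cross_entropy(logits, y, reduction="none")

def predict_members(scores, threshold=0.0):
    return scores > threshold  # True = predicted training member
```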
1 code implementation • 22 Jul 2024 • Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, Lingjuan Lyu
As scaling laws push the performance of generative AI, they simultaneously concentrate the development of these models among actors with large computational resources.
1 code implementation • 7 Jun 2024 • Zhenting Wang, Chen Chen, Vikash Sehwag, Minzhou Pan, Lingjuan Lyu
To mitigate such IP infringement, we also propose a defense method.
no code implementations • 29 May 2024 • Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal
The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security.
1 code implementation • 22 May 2024 • Zhenting Wang, Vikash Sehwag, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas, Shiqing Ma
To study this problem, we design a latent-inversion-based method called LatentTracer to trace the generated images of the inspected model by checking whether the examined images can be well reconstructed from an inverted latent input.
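A minimal sketch of latent inversion under simplifying assumptions (a differentiable `generator` stands in for the inspected model's decoder; this is not the paper's exact procedure, and the reconstruction threshold is illustrative):

```python
# Hedged sketch of latent inversion: optimize a latent z so the
# generator reproduces the examined image, then threshold the
# reconstruction error. `generator` is an assumed placeholder.
import torch

def invert_latent(generator, image, latent_dim, steps=500, lr=0.05):
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (generator(z) - image).pow(2).mean()
        loss.backward()
        opt.step()
    return z.detach(), loss.item()

def likely_generated(recon_error, threshold=1e-3):
    # Images a model itself produced tend to be near-perfectly
    # reconstructable from some latent input.
    return recon_error < threshold
```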
3 code implementations • 28 Mar 2024 • Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong
To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (2) a jailbreaking dataset comprising 100 behaviors -- both original and sourced from prior work (Zou et al., 2023; Mazeika et al., 2023, 2024) -- which align with OpenAI's usage policies; (3) a standardized evaluation framework at https://github.com/JailbreakBench/jailbreakbench that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard at https://jailbreakbench.github.io/ that tracks the performance of attacks and defenses for various LLMs.
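A hedged usage sketch, assuming the pip-installable `jailbreakbench` package exposes `read_artifact` as described in the repository README; the method and model names below are illustrative and should be checked against the current API.

```python
# Hedged sketch of loading a jailbreak artifact; exact function and
# argument names are assumptions to verify against the project README.
import jailbreakbench as jbb

artifact = jbb.read_artifact(method="PAIR", model_name="vicuna-13b-v1.5")
print(artifact.jailbreaks[0])  # one adversarial prompt ("jailbreak artifact")
```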
no code implementations • 23 Mar 2024 • Minzhou Pan, Zhenting Wang, Xin Dong, Vikash Sehwag, Lingjuan Lyu, Xue Lin
In this paper, we propose WaterMark Detection (WMD), the first invisible watermark detection method under a black-box and annotation-free setting.
1 code implementation • 20 Dec 2023 • Edoardo Debenedetti, Zishen Wan, Maksym Andriushchenko, Vikash Sehwag, Kshitij Bhardwaj, Bhavya Kailkhura
Finally, we make our benchmarking framework, built on top of the timm library (Wightman, 2019), publicly available to facilitate future analysis in efficient robust deep learning.
no code implementations • 21 Feb 2023 • Sihui Dai, Saeed Mahloujifar, Chong Xiang, Vikash Sehwag, Pin-Yu Chen, Prateek Mittal
Using our framework, we present the first leaderboard, MultiRobustBench, for multi-attack evaluation, which captures performance across attack types and attack strengths.
1 code implementation • 30 Jan 2023 • Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace
Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images.
no code implementations • 29 Jan 2023 • Tong Wu, Feiran Jia, Xiangyu Qi, Jiachen T. Wang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal
Recently, test-time adaptation (TTA) has been proposed as a promising solution for addressing distribution shifts.
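For context, a minimal entropy-minimization TTA step in the style of Tent is sketched below (not this paper's own method; the paper studies the risks such shared adaptation introduces). The `optimizer` would typically cover only normalization parameters.

```python
# Hedged sketch of entropy-minimization test-time adaptation: update a
# small set of parameters to minimize prediction entropy on each
# incoming test batch.
import torch

def tta_step(model, x, optimizer):
    logits = model(x)
    probs = logits.softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return logits.detach()
```

Because every sample in the batch influences this shared update, adaptation of this kind opens the door to the adversarial risks the paper examines.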
no code implementations • 8 Dec 2022 • Ashwinee Panda, Xinyu Tang, Saeed Mahloujifar, Vikash Sehwag, Prateek Mittal
An open problem in differentially private deep learning is hyperparameter optimization (HPO).
1 code implementation • 15 Sep 2022 • Edoardo Debenedetti, Vikash Sehwag, Prateek Mittal
Additionally, investigating the sources of our models' robustness, we show that our recipe makes it easier to generate strong attacks during training, and that this leads to better robustness at test time.
no code implementations • 22 Jul 2022 • Tong Wu, Tianhao Wang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal
Our attack can be easily deployed in the real world since it only requires rotating the object, as we show in both image classification and object detection applications.
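As a rough 2D stand-in for the physical setting (the paper rotates objects in 3D; this simplified, hypothetical sketch only sweeps image-plane rotations), one can measure which viewpoints flip a model's prediction:

```python
# Hedged sketch: sweep rotation angles and record those that change
# the model's prediction. A 2D simplification of the paper's setup.
import torch
import torchvision.transforms.functional as TF

@torch.no_grad()
def adversarial_angles(model, image, true_label, angles=range(0, 360, 5)):
    flips = []
    for angle in angles:
        rotated = TF.rotate(image, float(angle))
        pred = model(rotated.unsqueeze(0)).argmax(dim=1).item()
        if pred != true_label:
            flips.append(angle)
    return flips
```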
1 code implementation • 20 Jun 2022 • Christian Cianfarani, Arjun Nitin Bhagoji, Vikash Sehwag, Ben Y. Zhao, Prateek Mittal, Haitao Zheng
Representation learning, i.e., the generation of representations useful for downstream applications, is a task of fundamental importance that underlies much of the success of deep neural networks (DNNs).
no code implementations • CVPR 2022 • Vikash Sehwag, Caner Hazirbas, Albert Gordo, Firat Ozgenel, Cristian Canton Ferrer
We observe that uniform sampling from diffusion models predominantly samples from high-density regions of the data manifold.
2 code implementations • ICLR 2022 • Vikash Sehwag, Saeed Mahloujifar, Tinashe Handina, Sihui Dai, Chong Xiang, Mung Chiang, Prateek Mittal
We circumvent this challenge by using additional data from proxy distributions learned by advanced generative models.
1 code implementation • 16 Apr 2021 • Arjun Nitin Bhagoji, Daniel Cullina, Vikash Sehwag, Prateek Mittal
In particular, it is critical to determine classifier-agnostic bounds on the training loss to establish when learning is possible.
3 code implementations • ICLR 2021 • Vikash Sehwag, Mung Chiang, Prateek Mittal
We demonstrate that SSD outperforms most existing detectors based on unlabeled data by a large margin.
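A simplified sketch of an SSD-style score, assuming a feature extractor trained with self-supervision on the unlabeled in-distribution data (details such as the paper's cluster-wise statistics are omitted): fit a Gaussian to in-distribution features, then score test points by Mahalanobis distance.

```python
# Hedged sketch of an SSD-style outlier score: larger Mahalanobis
# distance from the in-distribution feature statistics means more
# likely out-of-distribution. Feature extraction is assumed done.
import numpy as np

def fit_gaussian(train_feats):
    mean = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False)
    precision = np.linalg.pinv(cov)
    return mean, precision

def ssd_score(test_feats, mean, precision):
    diff = test_feats - mean
    # Squared Mahalanobis distance per test sample
    return np.einsum("nd,dk,nk->n", diff, precision, diff)
```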
1 code implementation • 19 Oct 2020 • Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, Matthias Hein
As a research community, we still lack a systematic understanding of progress on adversarial robustness, which often makes it hard to identify the most promising ideas for training robust models.
no code implementations • 26 Jul 2020 • Hung T. Nguyen, Vikash Sehwag, Seyyedali Hosseinalipour, Christopher G. Brinton, Mung Chiang, H. Vincent Poor
In this paper, we propose a fast-convergent federated learning algorithm, called FOLB, which performs intelligent sampling of devices in each round of model training to optimize the expected convergence speed.
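A loose, hypothetical sketch of alignment-weighted device selection in the spirit of FOLB (not the authors' exact sampling scheme): prefer devices whose local gradients align with the current global update direction.

```python
# Hedged sketch: sample clients with probability proportional to the
# alignment of their local gradient with the global gradient.
import numpy as np

def sample_clients(local_grads, global_grad, num_select, rng=None):
    rng = rng or np.random.default_rng()
    align = np.array([max(1e-8, float(g @ global_grad)) for g in local_grads])
    probs = align / align.sum()
    return rng.choice(len(local_grads), size=num_select, replace=False, p=probs)
```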
no code implementations • 8 Jul 2020 • Liwei Song, Vikash Sehwag, Arjun Nitin Bhagoji, Prateek Mittal
With our evaluation across 6 OOD detectors, we find that the choice of in-distribution data, model architecture, and OOD data have a strong impact on OOD detection performance, inducing false positive rates in excess of 70%.
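The false-positive-rate metric referenced here can be computed as in this small sketch (score convention assumed: higher means more in-distribution):

```python
# Hedged sketch of FPR at a fixed TPR: pick the score threshold that
# keeps the target fraction of in-distribution samples, then measure
# how many OOD samples still pass it.
import numpy as np

def fpr_at_tpr(in_scores, out_scores, tpr=0.95):
    threshold = np.quantile(in_scores, 1.0 - tpr)
    return float((out_scores >= threshold).mean())
```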
no code implementations • 24 Jun 2020 • Vikash Sehwag, Rajvardhan Oak, Mung Chiang, Prateek Mittal
With increasing expressive power, deep neural networks have significantly improved the state-of-the-art on image classification datasets, such as ImageNet.
2 code implementations • 17 May 2020 • Chong Xiang, Arjun Nitin Bhagoji, Vikash Sehwag, Prateek Mittal
In this paper, we propose a general defense framework called PatchGuard that can achieve high provable robustness against localized adversarial patches while maintaining high clean accuracy.
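A heavily simplified, hypothetical sketch of the robust-masking idea behind PatchGuard: with small receptive fields, a patch can only corrupt a bounded window of local features, so masking the highest-activation window before pooling limits its influence.

```python
# Hedged, simplified sketch of PatchGuard-style robust masking: zero
# out the single highest-sum window in each per-class local feature
# map (where a patch could hide) before global pooling.
import numpy as np

def robust_masked_logits(feature_map, window=2):
    # feature_map: (H, W, num_classes) local logits from a model whose
    # receptive field is smaller than the masking window
    h, w, c = feature_map.shape
    logits = []
    for cls in range(c):
        fmap = feature_map[:, :, cls]
        best, best_ij = -np.inf, (0, 0)
        for i in range(h - window + 1):
            for j in range(w - window + 1):
                s = fmap[i:i + window, j:j + window].sum()
                if s > best:
                    best, best_ij = s, (i, j)
        masked = fmap.copy()
        i, j = best_ij
        masked[i:i + window, j:j + window] = 0.0
        logits.append(masked.sum())
    return np.array(logits)
```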
4 code implementations • NeurIPS 2020 • Vikash Sehwag, Shiqi Wang, Prateek Mittal, Suman Jana
We demonstrate that our approach, titled HYDRA, achieves compressed networks with state-of-the-art benign and robust accuracy simultaneously.
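A compact sketch of the importance-score pruning idea behind HYDRA, using a straight-through top-k mask (a simplified stand-in for the paper's implementation; the scores would be trained with the robust task loss):

```python
# Hedged sketch of HYDRA-style pruning: learn a score per weight and
# keep only the top-k fraction in the forward pass; a straight-through
# estimator lets gradients reach the scores.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMask(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores, keep_ratio):
        k = max(1, int(scores.numel() * keep_ratio))
        threshold = scores.abs().flatten().kthvalue(scores.numel() - k + 1).values
        return (scores.abs() >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: pass gradients to the scores unchanged
        return grad_output, None

class PrunedLinear(nn.Module):
    def __init__(self, in_features, out_features, keep_ratio=0.1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.scores = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.keep_ratio = keep_ratio

    def forward(self, x):
        mask = TopKMask.apply(self.scores, self.keep_ratio)
        return F.linear(x, self.weight * mask)
```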
no code implementations • 14 Jun 2019 • Vikash Sehwag, Shiqi Wang, Prateek Mittal, Suman Jana
In this work, we rigorously study the extension of network pruning strategies to preserve both benign accuracy and robustness of a network.
no code implementations • 5 May 2019 • Vikash Sehwag, Arjun Nitin Bhagoji, Liwei Song, Chawin Sitawarin, Daniel Cullina, Mung Chiang, Prateek Mittal
A large body of recent work has investigated the phenomenon of evasion attacks using adversarial examples for deep learning systems, where the addition of norm-bounded perturbations to the test inputs leads to incorrect output classification.
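The canonical example of such an attack is L-infinity PGD, sketched here in its standard formulation (not specific to this paper):

```python
# Hedged sketch of a standard L-infinity PGD evasion attack: take
# signed-gradient ascent steps on the loss, projecting back into the
# epsilon-ball around the clean input after each step.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            # Project back into the epsilon-ball and valid pixel range
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```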