no code implementations • 10 Sep 2024 • Hoang Anh Just, Mahavir Dabas, Lifu Huang, Ming Jin, Ruoxi Jia
This approach allows the model to gain a deeper understanding of the problem's context and identify the most effective solution path during the inference stage.
1 code implementation • 29 Jul 2024 • Feiyang Kang, Yifan Sun, Bingbing Wen, Si Chen, Dawn Song, Rafid Mahmood, Ruoxi Jia
Domain reweighting is an emerging research area aimed at adjusting the relative weights of different data sources to improve the effectiveness and efficiency of language model pre-training.
no code implementations • 28 Jul 2024 • Mengmeng Wu, Zhihong Liu, Xiang Li, Ruoxi Jia, Xiangyu Chang
As data plays an increasingly pivotal role in decision-making, the emergence of data markets underscores the growing importance of data valuation.
1 code implementation • 19 Jul 2024 • Hoang Anh Just, Ming Jin, Anit Sahu, Huy Phan, Ruoxi Jia
Reinforcement learning from human feedback plays a crucial role in aligning language models with human preferences, traditionally represented through comparisons between pairs or sets of responses within a given context.
no code implementations • 11 Jul 2024 • Yi Zeng, Yu Yang, Andy Zhou, Jeffrey Ziwei Tan, Yuheng Tu, Yifan Mai, Kevin Klyman, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li
However, existing public benchmarks often define safety categories based on prior literature, intuition, or common sense, so their category sets are disjointed from the risks specified in recent regulations and policies, making it challenging to evaluate and compare FMs across these benchmarks.
no code implementations • 25 Jun 2024 • Jianfeng He, Runing Yang, Linlin Yu, Changbin Li, Ruoxi Jia, Feng Chen, Ming Jin, Chang-Tien Lu
Text summarization, a key natural language generation (NLG) task, is vital in various domains.
no code implementations • 25 Jun 2024 • Yi Zeng, Kevin Klyman, Andy Zhou, Yu Yang, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li
We present a comprehensive AI risk taxonomy derived from eight government policies from the European Union, United States, and China and 16 company policies worldwide, making a significant step towards establishing a unified language for generative AI safety evaluation.
1 code implementation • 24 Jun 2024 • Yi Zeng, Weiyu Sun, Tran Ngoc Huynh, Dawn Song, Bo Li, Ruoxi Jia
Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of unsafe behaviors while evading detection during normal interactions.
no code implementations • 20 Jun 2024 • Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal
First, existing methods often use coarse-grained taxonomies of unsafe topics and over-represent some fine-grained topics.
no code implementations • 16 Jun 2024 • Jiachen T. Wang, Prateek Mittal, Dawn Song, Ruoxi Jia
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
no code implementations • 11 Jun 2024 • Yi Zeng, Xuelin Yang, Li Chen, Cristian Canton Ferrer, Ming Jin, Michael I. Jordan, Ruoxi Jia
To address issues of group-level fairness in machine learning, it is natural to adjust model parameters based on specific fairness objectives over a validation set annotated with sensitive attributes.
1 code implementation • 6 Jun 2024 • Minzhou Pan, Yi Zeng, Xue Lin, Ning Yu, Cho-Jui Hsieh, Peter Henderson, Ruoxi Jia
In this study, we investigate the vulnerability of image watermarks to diffusion-model-based image editing, a challenge exacerbated by the computational cost of accessing gradient information and the closed-source nature of many diffusion models.
no code implementations • 29 May 2024 • Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal
The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security.
no code implementations • 21 May 2024 • Bilgehan Sel, Priya Shanmugasundaram, Mohammad Kachuee, Kun Zhou, Ruoxi Jia, Ming Jin
Large Language Models (LLMs) have shown remarkable capabilities in tasks such as summarization, arithmetic reasoning, and question answering.
no code implementations • 6 May 2024 • Jiachen T. Wang, Tianji Yang, James Zou, Yongchan Kwon, Ruoxi Jia
Data Shapley provides a principled approach to data valuation and plays a crucial role in data-centric machine learning (ML) research.
no code implementations • 5 May 2024 • Feiyang Kang, Hoang Anh Just, Yifan Sun, Himanshu Jahagirdar, Yuanzhi Zhang, Rongxing Du, Anit Kumar Sahu, Ruoxi Jia
The goal is to minimize the need for costly domain-specific data for subsequent fine-tuning while achieving desired performance levels.
no code implementations • 22 Apr 2024 • Si Chen, Feiyang Kang, Ning Yu, Ruoxi Jia
Existing approaches to fact tracing rely on assessing the similarity between each training sample and the query along a certain dimension, such as lexical similarity, gradient, or embedding space.
1 code implementation • 19 Mar 2024 • Zhuowen Yuan, Zidi Xiong, Yi Zeng, Ning Yu, Ruoxi Jia, Dawn Song, Bo Li
The innovative use of constrained optimization and a fusion-based guardrail approach represents a significant step forward in developing more secure and reliable LLMs, setting a new standard for content moderation frameworks in the face of evolving digital threats.
1 code implementation • 15 Mar 2024 • Chenguang Wang, Ruoxi Jia, Xin Liu, Dawn Song
We show that CLIP leads to a significant robustness drop compared to supervised ImageNet models on our benchmark, especially under synthetic distribution shift and adversarial attacks.
no code implementations • 7 Mar 2024 • Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson
Independent evaluation and red teaming are critical for identifying the risks posed by generative AI systems.
1 code implementation • CVPR 2024 • Myeongseob Ko, Feiyang Kang, Weiyan Shi, Ming Jin, Zhou Yu, Ruoxi Jia
Inspired by this, we introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point.
no code implementations • 20 Jan 2024 • Jiachen T. Wang, Prateek Mittal, Ruoxi Jia
This work aims to address an open problem in the data valuation literature concerning the efficient computation of Data Shapley for the weighted $K$-nearest-neighbor algorithm (WKNN-Shapley).
1 code implementation • 12 Jan 2024 • Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, Weiyan Shi
This paper introduces a new perspective on jailbreaking LLMs by treating them as human-like communicators, exploring this overlooked intersection between everyday language interaction and AI safety.
no code implementations • 22 Nov 2023 • Lingjiao Chen, Bilge Acun, Newsha Ardalani, Yifan Sun, Feiyang Kang, Hanrui Lyu, Yongchan Kwon, Ruoxi Jia, Carole-Jean Wu, Matei Zaharia, James Zou
As Machine Learning (ML) systems continue to grow, the demand for relevant and comprehensive datasets becomes imperative.
no code implementations • 25 Oct 2023 • Zixin Ding, Si Chen, Ruoxi Jia, Yuxin Chen
To address these limitations, we propose a novel approach for active learning, which aims to select batches of unlabeled instances through a learned surrogate model for data acquisition.
1 code implementation • 5 Oct 2023 • Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, Peter Henderson
Optimizing large language models (LLMs) for downstream use cases often involves the customization of pre-trained LLMs through further fine-tuning.
1 code implementation • ICCV 2023 • Myeongseob Ko, Ming Jin, Chenguang Wang, Ruoxi Jia
Furthermore, our enhanced attacks outperform the baseline across multiple models and datasets, with the weakly supervised attack demonstrating an average-case performance improvement of $17\%$ and being at least $7$X more effective at low false-positive rates.
no code implementations • 30 Aug 2023 • Jiachen T. Wang, Yuqing Zhu, Yu-Xiang Wang, Ruoxi Jia, Prateek Mittal
Data valuation aims to quantify the usefulness of individual data sources in training machine learning (ML) models, and is a critical aspect of data-centric ML research.
no code implementations • 20 Aug 2023 • Bilgehan Sel, Ahmad Al-Tawaha, Vanshaj Khattar, Ruoxi Jia, Ming Jin
Current literature, aiming to surpass the "Chain-of-Thought" approach, often resorts to external modi operandi involving halting, modifying, and then resuming the generation process to boost Large Language Models' (LLMs) reasoning capacities.
1 code implementation • 18 Jun 2023 • Zhihong Liu, Hoang Anh Just, Xiangyu Chang, Xi Chen, Ruoxi Jia
Data valuation -- quantifying the contribution of individual data sources to certain predictive behaviors of a model -- is of great importance to enhancing the transparency of machine learning and designing incentive systems for data sharing.
1 code implementation • 4 Jun 2023 • Junyuan Hong, Yi Zeng, Shuyang Yu, Lingjuan Lyu, Ruoxi Jia, Jiayu Zhou
Data-free knowledge distillation (KD) helps transfer knowledge from a pre-trained model (known as the teacher model) to a smaller model (known as the student model) without access to the original training data used for training the teacher model.
Backdoor Defense • Data-free Knowledge Distillation
1 code implementation • 28 Apr 2023 • Hoang Anh Just, Feiyang Kang, Jiachen T. Wang, Yi Zeng, Myeongseob Ko, Ming Jin, Ruoxi Jia
(1) We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets.
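As a rough illustration of this idea (not the paper's exact formulation), the sketch below computes a per-class optimal-transport cost between training and validation features and averages it into a single proxy score; the function name and the uniform-weight setup are our own simplifications, and it assumes the POT library is available.

```python
# Minimal sketch, assuming feature vectors `*_feats` and integer labels `*_labels`;
# a per-class optimal-transport proxy for training/validation mismatch (lower is better).
import numpy as np
import ot  # Python Optimal Transport (pip install pot)


def classwise_wasserstein_proxy(train_feats, train_labels, val_feats, val_labels):
    """Average exact OT cost between training and validation features, class by class."""
    costs = []
    for c in np.unique(val_labels):
        Xt = train_feats[train_labels == c]
        Xv = val_feats[val_labels == c]
        if len(Xt) == 0 or len(Xv) == 0:
            continue  # class missing on one side; skipped in this toy version
        M = ot.dist(Xt, Xv)                      # pairwise squared-Euclidean cost matrix
        a = np.full(len(Xt), 1.0 / len(Xt))      # uniform mass on training points
        b = np.full(len(Xv), 1.0 / len(Xv))      # uniform mass on validation points
        costs.append(ot.emd2(a, b, M))           # exact optimal-transport cost
    return float(np.mean(costs))


# toy usage with random features
rng = np.random.default_rng(0)
tr_x, tr_y = rng.normal(size=(200, 16)), rng.integers(0, 5, 200)
va_x, va_y = rng.normal(size=(50, 16)), rng.integers(0, 5, 50)
print(classwise_wasserstein_proxy(tr_x, tr_y, va_x, va_y))
```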
no code implementations • 17 Apr 2023 • Jiachen T. Wang, Saeed Mahloujifar, Tong Wu, Ruoxi Jia, Prateek Mittal
In this paper, we propose a new differential privacy paradigm called estimate-verify-release (EVR), which addresses the challenge of providing a strict upper bound on the privacy parameter in DP compositions by converting an estimate of the privacy parameter into a formal guarantee.
1 code implementation • 9 Apr 2023 • Jiachen T. Wang, Ruoxi Jia
In this note, we revisit the work of Jia et al. (2019) and propose a more natural and interpretable utility function that better reflects the performance of KNN models.
no code implementations • 22 Feb 2023 • Jiachen T. Wang, Ruoxi Jia
Our analysis and insights contribute to a better understanding of the challenges in developing efficient SV estimation algorithms for data valuation.
1 code implementation • 22 Feb 2023 • Minzhou Pan, Yi Zeng, Lingjuan Lyu, Xue Lin, Ruoxi Jia
However, we lack a thorough understanding of the applicability of existing detection methods across a variety of learning settings.
no code implementations • 2 Dec 2022 • Ming Jin, Vanshaj Khattar, Harshal Kaushik, Bilgehan Sel, Ruoxi Jia
We study the expressibility and learnability of convex optimization solution functions and their multi-layer architectural extension.
no code implementations • 30 Oct 2022 • Mengmeng Wu, Ruoxi Jia, Changle Lin, Wei Huang, Xiangyu Chang
Data valuation, especially quantifying data value in algorithmic prediction and decision-making, is a fundamental problem in data trading scenarios.
1 code implementation • 12 Oct 2022 • Yi Zeng, Minzhou Pan, Himanshu Jahagirdar, Ming Jin, Lingjuan Lyu, Ruoxi Jia
Most poisoning defenses presume access to a set of clean data (or base set).
no code implementations • 16 Sep 2022 • Jiachen T. Wang, Saeed Mahloujifar, Shouda Wang, Ruoxi Jia, Prateek Mittal
As an application of our analysis, we show that PTR and our theoretical results can be used to design differentially private variants of Byzantine-robust training algorithms that use robust statistics for gradient aggregation.
no code implementations • 14 Jun 2022 • Si Chen, Yi Zeng, Jiachen T. Wang, Won Park, Xun Chen, Lingjuan Lyu, Zhuoqing Mao, Ruoxi Jia
Our work is the first to provide a thorough understanding of leveraging model inversion for effective backdoor removal by addressing key questions about reconstructed samples' properties, perceptual similarity, and the potential presence of backdoor triggers.
2 code implementations • 30 May 2022 • Jiachen T. Wang, Ruoxi Jia
To address this challenge, we introduce the concept of safety margin, which measures the robustness of a data value notion.
1 code implementation • 15 Apr 2022 • Weiyan Shi, Ryan Shea, Si Chen, Chiyuan Zhang, Ruoxi Jia, Zhou Yu
Utilizing the fact that sensitive information in language data tends to be sparse, Shi et al. (2021) formalized a DP notion extension called Selective Differential Privacy (SDP) to protect only the sensitive tokens defined by a policy function.
2 code implementations • 11 Apr 2022 • Yi Zeng, Minzhou Pan, Hoang Anh Just, Lingjuan Lyu, Meikang Qiu, Ruoxi Jia
With poisoning equal to or less than 0.5% of the target-class data and 0.05% of the training set, we can train a model to classify test examples from arbitrary classes into the target class when the examples are patched with a backdoor trigger.
Ranked #1 on Clean-label Backdoor Attack (0.05%) on Tiny ImageNet
1 code implementation • CVPR 2022 • Mostafa Kahla, Si Chen, Hoang Anh Just, Ruoxi Jia
In this paper, we introduce an algorithm, Boundary-Repelling Model Inversion (BREP-MI), to invert private training data using only the target model's predicted labels.
1 code implementation • 24 Nov 2021 • Yingyan Zeng, Jiachen T. Wang, Si Chen, Hoang Anh Just, Ran Jin, Ruoxi Jia
In this work, we propose ModelPred, a framework that helps to understand the impact of changes in training data on a trained model.
2 code implementations • ICLR 2022 • Yi Zeng, Si Chen, Won Park, Z. Morley Mao, Ming Jin, Ruoxi Jia
In particular, its performance is more robust to variations in triggers, attack settings, poison ratio, and clean data size.
no code implementations • 29 Sep 2021 • Tianhao Wang, Yi Zeng, Ming Jin, Ruoxi Jia
In this paper, we focus on the problem of identifying bad training data when the underlying cause is unknown in advance.
1 code implementation • NAACL 2022 • Weiyan Shi, Aiqi Cui, Evan Li, Ruoxi Jia, Zhou Yu
Given that the private information in natural language is sparse (for example, the bulk of an email might not carry personally identifiable information), we propose a new privacy notion, selective differential privacy, to provide rigorous privacy guarantees on the sensitive portion of the data to improve model utility.
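To make the notion of a policy function concrete, here is an illustrative sketch only: a toy function that flags which tokens should receive privacy protection. The regex patterns and the function name are our own examples, not the paper's implementation.

```python
# Toy "policy function" in the spirit of selective differential privacy:
# return 1 for tokens treated as sensitive (protected), 0 for public tokens.
import re

SENSITIVE_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),            # email addresses
    re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),  # US-style phone numbers
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),              # SSN-like patterns
]


def policy_function(tokens):
    """Return a 0/1 mask over tokens: 1 = sensitive, 0 = public."""
    return [int(any(p.search(tok) for p in SENSITIVE_PATTERNS)) for tok in tokens]


tokens = "please email alice@example.com before 5pm".split()
print(policy_function(tokens))  # -> [0, 0, 1, 0, 0]
```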
no code implementations • 14 Jul 2021 • Si Chen, Tianhao Wang, Ruoxi Jia
Our algorithm does not rely on any feedback from annotators in the target domain and hence, can be used to perform zero-round active learning or warm-start existing multi-round active learning strategies.
1 code implementation • 13 Jul 2021 • Tianhao Wang, Yu Yang, Ruoxi Jia
The Shapley value (SV) and Least core (LC) are classic methods in cooperative game theory for cost/profit sharing problems.
no code implementations • 10 Jun 2021 • Tianhao Wang, Yi Zeng, Ming Jin, Ruoxi Jia
High-quality data is critical to train performant Machine Learning (ML) models, highlighting the importance of Data Quality Management (DQM).
no code implementations • 23 Apr 2021 • Tianhao Wang, Si Chen, Ruoxi Jia
In this work, we initiate the study of one-round active learning, which aims to select a subset of unlabeled data points that, once labeled, yields the highest model performance, using only the information from the initially labeled data points.
1 code implementation • ICCV 2021 • Yi Zeng, Won Park, Z. Morley Mao, Ruoxi Jia
Acknowledging previous attacks' weaknesses, we propose a practical way to create smooth backdoor triggers without high-frequency artifacts and study their detectability.
2 code implementations • 2 Mar 2021 • Wenxiao Wang, Tianhao Wang, Lun Wang, Nanqing Luo, Pan Zhou, Dawn Song, Ruoxi Jia
Deep learning techniques have achieved remarkable performance in wide-ranging tasks.
no code implementations • 1 Jan 2021 • Lun Wang, Ruoxi Jia, Dawn Song
We provide a complete analysis of the privacy guarantee, communication cost, and convergence rate of D2P-Fed.
2 code implementations • ICCV 2021 • Si Chen, Mostafa Kahla, Ruoxi Jia, Guo-Jun Qi
We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data.
2 code implementations • ICLR 2021 • Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Ranked #3 on Natural Language Inference on ANLI test (using extra training data)
no code implementations • 14 Sep 2020 • Tianhao Wang, Johannes Rausch, Ce Zhang, Ruoxi Jia, Dawn Song
The federated SV preserves the desirable properties of the canonical SV while it can be calculated without incurring extra communication cost and is also able to capture the effect of participation order on data value.
2 code implementations • 11 Sep 2020 • Tianhao Wang, Yuheng Zhang, Ruoxi Jia
This paper studies defense mechanisms against model inversion (MI) attacks -- a type of privacy attack aimed at inferring information about the training data distribution given access to a target machine learning model.
no code implementations • 22 Jun 2020 • Lun Wang, Ruoxi Jia, Dawn Song
In this paper, we propose the discrete Gaussian based differentially private federated learning (D2P-Fed), a unified scheme to achieve both differential privacy (DP) and communication efficiency in federated learning (FL).
1 code implementation • CVPR 2021 • Ruoxi Jia, Fan Wu, Xuehui Sun, Jiacen Xu, David Dao, Bhavya Kailkhura, Ce Zhang, Bo Li, Dawn Song
Quantifying the importance of each training point to a learning task is a fundamental problem in machine learning, and the estimated importance scores have been leveraged to guide a range of data workflows such as data summarization and domain adaptation.
1 code implementation • 17 Nov 2019 • Xinyun Chen, Wenxiao Wang, Chris Bender, Yiming Ding, Ruoxi Jia, Bo Li, Dawn Song
The experimental results demonstrate that our fine-tuning based watermark removal attacks could pose real threats to the copyright of pre-trained models, and thus highlight the importance of further investigating the watermarking problem and proposing more robust watermark embedding schemes against the attacks.
1 code implementation • CVPR 2020 • Yuheng Zhang, Ruoxi Jia, Hengzhi Pei, Wenxiao Wang, Bo Li, Dawn Song
This paper studies model-inversion attacks, in which the access to a model is abused to infer information about the training data.
no code implementations • ICLR 2020 • Min Du, Ruoxi Jia, Dawn Song
In this paper, we demonstrate that applying differential privacy can improve the utility of outlier detection and novelty detection, with an extension to detect poisoning samples in backdoor attacks.
no code implementations • 25 Sep 2019 • Ruoxi Jia, Xuehui Sun, Jiacen Xu, Ce Zhang, Bo Li, Dawn Song
Existing approximation algorithms, although achieving great improvements over the exact algorithm, rely on retraining models multiple times and thus remain limited when applied to larger-scale learning tasks and real-world datasets.
3 code implementations • 22 Aug 2019 • Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Bo Li, Ce Zhang, Costas J. Spanos, Dawn Song
The most surprising result is that for unweighted $K$NN classifiers and regressors, the Shapley value of all $N$ data points can be computed, exactly, in $O(N\log N)$ time -- an exponential improvement on computational complexity!
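A minimal sketch of this result, as we recall the recursion from Jia et al. (2019) for an unweighted $K$NN classifier and a single test point: sort training points by distance to the test point (the $O(N\log N)$ step), then compute all $N$ Shapley values in one backward sweep. The function name and the toy data are our own.

```python
# Hedged sketch: exact Shapley values for an unweighted KNN classifier, one test point.
import numpy as np


def knn_shapley_single_test(X_train, y_train, x_test, y_test, K=5):
    """Return one Shapley value per training point for a single test point."""
    N = len(y_train)
    order = np.argsort(np.linalg.norm(X_train - x_test, axis=1))  # O(N log N) sort by distance
    match = (y_train[order] == y_test).astype(float)              # 1 if label agrees with test label
    s = np.zeros(N)
    s[N - 1] = match[N - 1] / N                                   # farthest point
    for i in range(N - 2, -1, -1):                                # sweep from far to near
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / K * min(K, i + 1) / (i + 1)
    values = np.zeros(N)
    values[order] = s                                             # map back to original indexing
    return values


# toy usage
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 8)), rng.integers(0, 3, 100)
print(knn_shapley_single_test(X, y, X[0], y[0], K=5)[:5])
```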
1 code implementation • 27 Feb 2019 • Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gurel, Bo Li, Ce Zhang, Dawn Song, Costas Spanos
In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory.
no code implementations • 23 Oct 2018 • Jingkang Wang, Ruoxi Jia, Gerald Friedland, Bo Li, Costas Spanos
Despite the great success achieved in machine learning (ML), adversarial examples have caused concerns with regard to its trustworthiness: a small perturbation of an input results in an arbitrary failure of an otherwise seemingly well-trained ML model.
1 code implementation • 10 Jul 2018 • Gerald Friedland, Jingkang Wang, Ruoxi Jia, Bo Li
This paper proposes a fundamental answer to a frequently asked question in multimedia computing and machine learning: Do artifacts from perceptual compression contribute to error in the machine learning process and if so, how much?
no code implementations • 22 Jun 2014 • Ming Jin, Han Zou, Kevin Weekly, Ruoxi Jia, Alexandre M. Bayen, Costas J. Spanos
We present results from a set of experiments in this pilot study to investigate the causal influence of user activity on various environmental parameters monitored by occupant carried multi-purpose sensors.