1 code implementation • 16 Feb 2024 • Alexis Chevalier, Jiayi Geng, Alexander Wettig, Howard Chen, Sebastian Mizera, Toni Annala, Max Jameson Aragon, Arturo Rodríguez Fanlo, Simon Frieder, Simon Machado, Akshara Prabhakar, Ellie Thieu, Jiachen T. Wang, ZiRui Wang, Xindi Wu, Mengzhou Xia, Wenhan Jia, Jiatong Yu, Jun-Jie Zhu, Zhiyong Jason Ren, Sanjeev Arora, Danqi Chen
We use TutorChat to fine-tune Llemma models with 7B and 34B parameters.
no code implementations • 20 Jan 2024 • Jiachen T. Wang, Prateek Mittal, Ruoxi Jia
This work aims to address an open problem in the data valuation literature concerning the efficient computation of Data Shapley for the weighted $K$-nearest-neighbor algorithm (WKNN-Shapley).
1 code implementation • 27 Nov 2023 • Junyuan Hong, Jiachen T. Wang, Chenhui Zhang, Zhangheng Li, Bo Li, Zhangyang Wang
To ensure that the prompts do not leak private information, we introduce the first private prompt generation mechanism, based on a differentially private (DP) ensemble of in-context learning with private demonstrations.
no code implementations • 30 Aug 2023 • Jiachen T. Wang, Yuqing Zhu, Yu-Xiang Wang, Ruoxi Jia, Prateek Mittal
Data valuation aims to quantify the usefulness of individual data sources in training machine learning (ML) models, and is a critical aspect of data-centric ML research.
no code implementations • 23 Aug 2023 • Tinghao Xie, Xiangyu Qi, Ping He, Yiming Li, Jiachen T. Wang, Prateek Mittal
We present a novel defense against backdoor attacks on Deep Neural Networks (DNNs), wherein adversaries covertly implant malicious behaviors (backdoors) into DNNs.
no code implementations • 2 May 2023 • Tong Wu, Ashwinee Panda, Jiachen T. Wang, Prateek Mittal
Based on the general paradigm of DP-ICL, we instantiate several techniques showing how to privatize ICL for text classification and language generation.
1 code implementation • 28 Apr 2023 • Hoang Anh Just, Feiyang Kang, Jiachen T. Wang, Yi Zeng, Myeongseob Ko, Ming Jin, Ruoxi Jia
(1) We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets.
no code implementations • 17 Apr 2023 • Jiachen T. Wang, Saeed Mahloujifar, Tong Wu, Ruoxi Jia, Prateek Mittal
In this paper, we propose a new differential privacy paradigm called estimate-verify-release (EVR), which addresses the challenges of providing a strict upper bound for the privacy parameter in DP compositions by converting an estimate of the privacy parameter into a formal guarantee.
1 code implementation • 9 Apr 2023 • Jiachen T. Wang, Ruoxi Jia
In this note, we revisit the work of Jia et al. (2019) and propose a more natural and interpretable utility function that better reflects the performance of KNN models.
no code implementations • 22 Feb 2023 • Jiachen T. Wang, Ruoxi Jia
Our analysis and insights contribute to a better understanding of the challenges in developing efficient SV estimation algorithms for data valuation.
no code implementations • 29 Jan 2023 • Tong Wu, Feiran Jia, Xiangyu Qi, Jiachen T. Wang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal
Recently, test-time adaptation (TTA) has been proposed as a promising solution for addressing distribution shifts.
no code implementations • 16 Sep 2022 • Jiachen T. Wang, Saeed Mahloujifar, Shouda Wang, Ruoxi Jia, Prateek Mittal
As an application of our analysis, we show that PTR and our theoretical results can be used to design differentially private variants of Byzantine-robust training algorithms that use robust statistics for gradient aggregation.
no code implementations • 14 Jun 2022 • Si Chen, Yi Zeng, Jiachen T. Wang, Won Park, Xun Chen, Lingjuan Lyu, Zhuoqing Mao, Ruoxi Jia
Our work is the first to provide a thorough understanding of leveraging model inversion for effective backdoor removal by addressing key questions about reconstructed samples' properties, perceptual similarity, and the potential presence of backdoor triggers.
2 code implementations • 30 May 2022 • Jiachen T. Wang, Ruoxi Jia
To address this challenge, we introduce the concept of safety margin, which measures the robustness of a data value notion.
2 code implementations • 26 May 2022 • Xiangyu Qi, Tinghao Xie, Jiachen T. Wang, Tong Wu, Saeed Mahloujifar, Prateek Mittal
First, we uncover a post-hoc workflow underlying most prior work, where defenders passively allow the attack to proceed and then leverage the characteristics of the post-attacked model to uncover poison samples.
1 code implementation • 24 Nov 2021 • Yingyan Zeng, Jiachen T. Wang, Si Chen, Hoang Anh Just, Ran Jin, Ruoxi Jia
In this work, we propose ModelPred, a framework that helps to understand the impact of changes in training data on a trained model.