1 code implementation • 4 Dec 2024 • Xiaojun Xu, Jinghan Jia, Yuanshun Yao, Yang Liu, Hang Li
To embed our multi-bit watermark, we use two paraphrasers alternately to encode the pre-defined binary code at the sentence level.
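A minimal sketch of the alternating-paraphraser encoding (the two paraphrasers below are identity placeholders and the paired detector is omitted; this is not the authors' code):

```python
# Sketch: encode one bit per sentence by choosing which paraphraser rewrites it.
# paraphrase_0 / paraphrase_1 stand in for two distinct paraphrasing models.

def paraphrase_0(sentence: str) -> str:
    return sentence  # placeholder: identity paraphraser for bit 0

def paraphrase_1(sentence: str) -> str:
    return sentence  # placeholder: identity paraphraser for bit 1

def embed_bits(sentences, bits):
    """Embed a binary code by paraphrasing sentence i with paraphraser bits[i]."""
    out = []
    for sentence, bit in zip(sentences, bits):
        out.append(paraphrase_1(sentence) if bit else paraphrase_0(sentence))
    return out

watermarked = embed_bits(["First sentence.", "Second sentence."], [1, 0])
```

Decoding would then rely on a detector that classifies which paraphraser produced each sentence, which is not shown here.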
no code implementations • 30 Oct 2024 • Andrew Estornell, Jean-Francois Ton, Yuanshun Yao, Yang Liu
Large language models (LLMs) have demonstrated a remarkable ability to serve as general-purpose tools for various language-based tasks.
1 code implementation • 16 Jun 2024 • Rui Zheng, Hongyi Guo, Zhihan Liu, Xiaoying Zhang, Yuanshun Yao, Xiaojun Xu, Zhaoran Wang, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang, Hang Li, Yang Liu
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
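For reference, the standard two-agent Nash equilibrium condition being invoked (notation ours, not the paper's): neither agent can improve its own objective by a unilateral deviation.

```latex
% Two-agent Nash equilibrium condition for policies (\pi_1^*, \pi_2^*):
J_1(\pi_1^*, \pi_2^*) \ge J_1(\pi_1, \pi_2^*) \quad \forall \pi_1, \qquad
J_2(\pi_1^*, \pi_2^*) \ge J_2(\pi_1^*, \pi_2) \quad \forall \pi_2 .
```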
no code implementations • 11 Jun 2024 • Zonglin Di, Zhaowei Zhu, Jinghan Jia, Jiancheng Liu, Zafar Takhirov, Bo Jiang, Yuanshun Yao, Sijia Liu, Yang Liu
Taking inspiration from the influence of label smoothing on model confidence and differential privacy, we propose a simple gradient-based MU approach that uses an inverse process of label smoothing.
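A speculative sketch of one way to read that summary — gradient ascent on the forget set against label-smoothed targets; the model, optimizer, and data below are toy placeholders, and this is not the authors' implementation:

```python
# Hypothetical unlearning step: ascend the smoothed cross-entropy on forget data.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(20, 5)                      # toy classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
x_forget = torch.randn(32, 20)                      # data to be forgotten
y_forget = torch.randint(0, 5, (32,))

def unlearn_step(x, y, num_classes=5, eps=0.1):
    one_hot = F.one_hot(y, num_classes).float()
    targets = one_hot * (1.0 - eps) + eps / num_classes   # label-smoothed targets
    log_probs = F.log_softmax(model(x), dim=-1)
    ce = -(targets * log_probs).sum(dim=-1).mean()
    loss = -ce                                      # negate: gradient ascent on the forget set
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

for _ in range(10):
    unlearn_step(x_forget, y_forget)
```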
1 code implementation • 13 Mar 2024 • Xiaojun Xu, Yuanshun Yao, Yang Liu
While prior works focus on token-level watermarks that embed signals into the output, we design a model-level watermark that embeds signals into the LLM weights, which can then be detected by a paired detector.
no code implementations • 12 Mar 2024 • Wei Shen, Xiaoying Zhang, Yuanshun Yao, Rui Zheng, Hongyi Guo, Yang Liu
Reinforcement learning from human feedback (RLHF) is the mainstream paradigm used to align large language models (LLMs) with human preferences.
1 code implementation • 20 Feb 2024 • Jinlong Pang, Jialu Wang, Zhaowei Zhu, Yuanshun Yao, Chen Qian, Yang Liu
In this work, we aim to train models that mitigate group fairness disparity without causing harm to model accuracy.
no code implementations • 16 Feb 2024 • Jiaheng Wei, Yuanshun Yao, Jean-Francois Ton, Hongyi Guo, Andrew Estornell, Yang Liu
FEWL leverages answers from off-the-shelf LLMs that serve as a proxy for gold-standard answers.
no code implementations • 13 Feb 2024 • Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, Peter Hase, Yuguang Yao, Chris Yuhao Liu, Xiaojun Xu, Hang Li, Kush R. Varshney, Mohit Bansal, Sanmi Koyejo, Yang Liu
We explore machine unlearning (MU) in the domain of large language models (LLMs), referred to as LLM unlearning.
no code implementations • 6 Jan 2024 • Hongyi Guo, Yuanshun Yao, Wei Shen, Jiaheng Wei, Xiaoying Zhang, Zhaoran Wang, Yang Liu
The key idea is to first retrieve high-quality samples related to the target domain and use them as In-context Learning examples to generate more samples.
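A minimal sketch of the retrieve-then-generate recipe; `embed`, the candidate pool, and the prompt format are placeholders rather than the paper's components:

```python
# Sketch: retrieve the candidates most similar to a target-domain description,
# then build a few-shot prompt that asks an LLM to generate more samples.
import numpy as np

def embed(text: str) -> np.ndarray:
    # placeholder embedding: normalized character-frequency vector
    v = np.zeros(128)
    for ch in text.lower():
        v[ord(ch) % 128] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

def retrieve(candidates, target_description, k=3):
    target = embed(target_description)
    scores = [float(embed(c) @ target) for c in candidates]
    top = np.argsort(scores)[::-1][:k]
    return [candidates[i] for i in top]

def build_prompt(examples, target_description):
    shots = "\n".join(f"Example: {e}" for e in examples)
    return f"{shots}\nWrite a new example for: {target_description}\nExample:"

examples = retrieve(["text about sports", "legal contract clause", "medical note"],
                    "clinical notes for a hospital", k=2)
prompt = build_prompt(examples, "clinical notes for a hospital")
```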
2 code implementations • 14 Oct 2023 • Yuanshun Yao, Xiaojun Xu, Yang Liu
To the best of our knowledge, our work is among the first to explore LLM unlearning.
no code implementations • 9 Oct 2023 • Tongxin Yin, Jean-François Ton, Ruocheng Guo, Yuanshun Yao, Mingyan Liu, Yang Liu
To generalize the abstaining decisions to test samples, we then train a surrogate model to learn the abstaining decisions based on the integer programming (IP) solutions in an end-to-end manner.
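A minimal sketch of the surrogate step, assuming the integer program has already produced a 0/1 abstain decision per training example; the features and decisions below are synthetic placeholders, and the end-to-end training detail is omitted:

```python
# Sketch: fit a classifier that imitates the IP's abstain decisions,
# so they can be applied to unseen test inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))               # training features
abstain_ip = (X_train[:, 0] > 1.0).astype(int)    # stand-in for the IP's decisions

surrogate = LogisticRegression().fit(X_train, abstain_ip)

X_test = rng.normal(size=(10, 5))
abstain_test = surrogate.predict(X_test)          # abstain decisions at test time
```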
1 code implementation • 10 Aug 2023 • Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, Hang Li
However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations.
no code implementations • 30 Jun 2023 • Yuanshun Yao, Yang Liu
Identifying the causes of a model's unfairness is an important yet relatively unexplored task.
1 code implementation • 18 Jan 2023 • Shangyu Xie, Xin Yang, Yuanshun Yao, Tianyi Liu, Taiqing Wang, Jiankai Sun
In this work, we go a step further and study label leakage in the regression setting, where the private labels are continuous values (instead of the discrete labels in classification).
no code implementations • 17 Nov 2022 • Yuanshun Yao, Chong Wang, Hang Li
The key idea is to train a surrogate model to learn the effect of removing a subset of user history on the recommendation.
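One way to picture the surrogate idea (a toy sketch, not the paper's recommender or training procedure): encode "which history items are kept" as a binary mask and fit a simple model from masks to recommendation scores, so its coefficients approximate each history item's influence.

```python
# Sketch: learn a surrogate mapping removal masks -> recommendation score.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_history = 10

def rec_score(mask):
    # placeholder recommender: score of the target item given the kept history
    weights = np.linspace(1.0, 0.1, n_history)
    return float(mask @ weights)

masks = rng.integers(0, 2, size=(500, n_history))   # random removal patterns
scores = np.array([rec_score(m) for m in masks])

surrogate = Ridge().fit(masks, scores)
influence = surrogate.coef_    # approximate per-item effect on the recommendation
```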
1 code implementation • 6 Oct 2022 • Zhaowei Zhu, Yuanshun Yao, Jiankai Sun, Hang Li, Yang Liu
Our theoretical analyses show that directly using proxy models can give a false sense of (un)fairness.
1 code implementation • 25 Aug 2022 • Jiankai Sun, Xin Yang, Yuanshun Yao, Junyuan Xie, Di Wu, Chong Wang
Federated learning (FL) has recently gained significant attention as a privacy-enhancing tool that allows multiple participants to jointly train a machine learning model.
no code implementations • 16 Jun 2022 • Ruihan Wu, Xin Yang, Yuanshun Yao, Jiankai Sun, Tianyi Liu, Kilian Q. Weinberger, Chong Wang
Differentially Private (DP) data release is a promising technique to disseminate data without compromising the privacy of data subjects.
no code implementations • 24 May 2022 • Jiankai Sun, Xin Yang, Yuanshun Yao, Junyuan Xie, Di Wu, Chong Wang
In this work, we propose two evaluation algorithms that can more accurately compute the widely used AUC (area under the curve) metric when using label DP in vFL.
no code implementations • 4 Mar 2022 • Xin Yang, Jiankai Sun, Yuanshun Yao, Junyuan Xie, Chong Wang
Split learning is a distributed training framework that allows multiple parties to jointly train a machine learning model over vertically partitioned data (partitioned by attributes).
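For context, a toy two-party split-learning loop over vertically partitioned features (an illustration of the general setup, not the paper's framework): each party holds a disjoint subset of the feature columns, and only the label party computes the loss.

```python
# Sketch: two bottom models (one per party) feed a top model at the label party.
import torch

torch.manual_seed(0)
x_party_a = torch.randn(16, 4)    # party A's feature columns
x_party_b = torch.randn(16, 6)    # party B's feature columns
y = torch.randint(0, 2, (16,))    # labels held by the label party

bottom_a = torch.nn.Linear(4, 8)  # party A's bottom model
bottom_b = torch.nn.Linear(6, 8)  # party B's bottom model
top = torch.nn.Linear(16, 2)      # label party's top model

params = list(bottom_a.parameters()) + list(bottom_b.parameters()) + list(top.parameters())
opt = torch.optim.SGD(params, lr=0.1)

for _ in range(5):
    h = torch.cat([bottom_a(x_party_a), bottom_b(x_party_b)], dim=1)  # exchanged embeddings
    loss = torch.nn.functional.cross_entropy(top(h), y)
    opt.zero_grad()
    loss.backward()   # gradients w.r.t. the embeddings flow back to each party
    opt.step()
```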
no code implementations • 2 Mar 2022 • Jiankai Sun, Xin Yang, Yuanshun Yao, Chong Wang
As the raw labels often contain highly sensitive information, recent work has proposed methods to effectively prevent label leakage from the backpropagated gradients in vFL.
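For intuition on the threat being defended against (not the paper's defense): with a sigmoid and binary cross-entropy head, the gradient with respect to the logit is proportional to (p - y), so a non-label party can read the private label off the gradient's sign.

```python
# Sketch: the sign of the backpropagated logit gradient reveals the label.
import torch

logits = torch.tensor([2.0, -1.0, 0.5], requires_grad=True)
labels = torch.tensor([1.0, 0.0, 1.0])           # held only by the label party

loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()

# grad = (sigmoid(logit) - y) / n, so grad < 0 exactly when y = 1.
leaked = (logits.grad < 0).float()
print(leaked)                                     # tensor([1., 0., 1.])
```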
no code implementations • 2 Mar 2022 • Yuanshun Yao, Chong Wang, Hang Li
Modern recommender systems face an increasing need to explain their recommendations.
no code implementations • 21 Jul 2021 • Jiankai Sun, Yuanshun Yao, Weihao Gao, Junyuan Xie, Chong Wang
Recently, researchers have studied input leakage problems in Federated Learning (FL), where a malicious party can reconstruct sensitive training inputs provided by users from the shared gradients.
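For context, a compact gradient-matching reconstruction in the spirit of "deep leakage from gradients" (illustrating the attack the paper defends against, with the label assumed known for simplicity; this is not the paper's defense):

```python
# Sketch: optimize a dummy input so its gradients match the shared gradients.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 2)
x_true = torch.randn(1, 8)                 # private input to be reconstructed
y_true = torch.tensor([1])                 # label assumed known for simplicity

loss = torch.nn.functional.cross_entropy(model(x_true), y_true)
true_grads = torch.autograd.grad(loss, model.parameters())   # "shared" gradients

x_dummy = torch.randn(1, 8, requires_grad=True)
opt = torch.optim.Adam([x_dummy], lr=0.1)
for _ in range(300):
    opt.zero_grad()
    dummy_loss = torch.nn.functional.cross_entropy(model(x_dummy), y_true)
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(), create_graph=True)
    match = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    match.backward()
    opt.step()
# after optimization, x_dummy approximates the private input x_true
```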
no code implementations • 10 Jun 2021 • Jiankai Sun, Xin Yang, Yuanshun Yao, Aonan Zhang, Weihao Gao, Junyuan Xie, Chong Wang
In this paper, we propose a vFL framework based on Private Set Union (PSU) that allows each party to keep sensitive membership information to itself.
no code implementations • CVPR 2021 • Emily Wenger, Josephine Passananti, Arjun Bhagoji, Yuanshun Yao, Haitao Zheng, Ben Y. Zhao
A critical question remains unanswered: can backdoor attacks succeed using physical objects as triggers, thus making them a credible threat against deep learning systems in the real world?
no code implementations • 24 May 2019 • Yuanshun Yao, Huiying Li, Hai-Tao Zheng, Ben Y. Zhao
Recent work has proposed the concept of backdoor attacks on deep neural networks (DNNs), where misbehaviors are hidden inside "normal" models, only to be triggered by very specific inputs.
1 code implementation • IEEE Symposium on Security and Privacy (SP) 2019 • Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, Ben Y. Zhao
We identify multiple mitigation techniques via input filters, neuron pruning and unlearning.
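A hedged sketch of the neuron-pruning idea in isolation (the layer, data, and trigger below are placeholders, and the input-filtering and unlearning mitigations are omitted): rank units by how much more they fire on trigger-carrying inputs than on clean inputs, then zero out the most suspicious ones.

```python
# Sketch: prune the units most activated by trigger-stamped inputs.
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(32, 64)            # stand-in for an internal layer
clean = torch.randn(128, 32)
triggered = clean + 0.5                    # stand-in for trigger-stamped inputs

with torch.no_grad():
    gap = layer(triggered).relu().mean(0) - layer(clean).relu().mean(0)
    suspicious = torch.topk(gap, k=8).indices   # most trigger-sensitive units
    layer.weight[suspicious] = 0.0              # prune them
    layer.bias[suspicious] = 0.0
```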
no code implementations • 27 Aug 2017 • Yuanshun Yao, Bimal Viswanath, Jenna Cryan, Hai-Tao Zheng, Ben Y. Zhao
Malicious crowdsourcing forums are gaining traction as a means of spreading misinformation online, but are limited by the costs of hiring and managing human workers.
Cryptography and Security • Social and Information Networks