Search Results for author: Zhenting Wang

Found 13 papers, 9 papers with code

Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?

1 code implementation • 10 Apr 2024 • Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding, Fan Yang, Mengnan Du, Yongfeng Zhang

We employ a probing technique to extract representations from different layers of the model and apply these to classification tasks.
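Below is a minimal sketch of the layer-wise probing idea described in the snippet above, using GPT-2 and scikit-learn; the two-example toy dataset and the choice of last-token features are illustrative assumptions, not the paper's exact setup.

```python
# Layer-wise probing sketch: extract per-layer representations, then fit
# one linear probe per layer. GPT-2 and the toy labels are placeholders.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

texts = ["The capital of France is Paris.", "The capital of France is Rome."]
labels = [1, 0]  # toy task: 1 = true statement, 0 = false statement

layer_feats = None
for text in texts:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states    # tuple: (n_layers+1) x [1, T, D]
    vecs = [h[0, -1].numpy() for h in hidden]  # last-token vector per layer
    if layer_feats is None:
        layer_feats = [[] for _ in vecs]
    for i, v in enumerate(vecs):
        layer_feats[i].append(v)

# Probe accuracy per layer indicates where the concept becomes linearly decodable.
for layer, feats in enumerate(layer_feats):
    probe = LogisticRegression(max_iter=1000).fit(feats, labels)
    print(f"layer {layer:2d}: train accuracy {probe.score(feats, labels):.2f}")
```

In practice one would use a larger labeled probing set with held-out evaluation; this only illustrates the mechanics of extracting and probing per-layer representations.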

Finding needles in a haystack: A Black-Box Approach to Invisible Watermark Detection

no code implementations • 23 Mar 2024 • Minzhou Pan, Zhenting Wang, Xin Dong, Vikash Sehwag, Lingjuan Lyu, Xue Lin

In this paper, we propose WaterMark Detection (WMD), the first invisible watermark detection method under a black-box and annotation-free setting.

Health-LLM: Personalized Retrieval-Augmented Disease Prediction System

1 code implementation • 1 Feb 2024 • Mingyu Jin, Qinkai Yu, Dong Shu, Chong Zhang, Lizhou Fan, Wenyue Hua, Suiyuan Zhu, Yanda Meng, Zhenting Wang, Mengnan Du, Yongfeng Zhang

Compared to traditional health management applications, our system has three main advantages: (1) it integrates health reports and medical knowledge into a large model and poses relevant questions to the large language model for disease prediction; (2) it leverages a retrieval-augmented generation (RAG) mechanism to enhance feature extraction; (3) it incorporates a semi-automated feature-updating framework that can merge and delete features to improve the accuracy of disease prediction.

Disease Prediction · Language Modelling · +3
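Advantage (2) above is a standard retrieval-augmented generation loop. The sketch below shows the retrieval step with a sentence-embedding model; the knowledge snippets, the embedding model choice, and the downstream `ask_llm` call are illustrative assumptions rather than Health-LLM's actual components.

```python
# Minimal RAG retrieval sketch: embed a toy knowledge base, retrieve the
# most similar snippets for a query, and build an augmented prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Toy knowledge base standing in for health reports / medical knowledge.
docs = [
    "Fasting glucose above 126 mg/dL on two tests indicates diabetes.",
    "Blood pressure of 140/90 mmHg or higher is classified as hypertension.",
    "An LDL cholesterol level above 160 mg/dL is considered high.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query, k=2):
    """Return the k snippets most similar to the query (cosine similarity)."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(doc_vecs @ q)[::-1][:k]
    return [docs[i] for i in top]

query = "My fasting glucose was 131 mg/dL. What might that indicate?"
context = "\n".join(retrieve(query))
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
# The augmented prompt would then be sent to the LLM, e.g. ask_llm(prompt).
print(prompt)
```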

DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models

1 code implementation • 6 Jul 2023 • Zhenting Wang, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas, Shiqing Ma

To address this issue, we propose a method for detecting such unauthorized data usage by planting the injected memorization into the text-to-image diffusion models trained on the protected dataset.

Memorization

Alteration-free and Model-agnostic Origin Attribution of Generated Images

no code implementations • 29 May 2023 • Zhenting Wang, Chen Chen, Yi Zeng, Lingjuan Lyu, Shiqing Ma

To overcome this problem, we first develop an alteration-free and model-agnostic origin attribution method via input reverse-engineering on image generation models, i.e., inverting the input of a particular model for a specific image.

Image Generation
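The input reverse-engineering idea can be illustrated with a toy differentiable generator: optimize a latent input so the model reproduces the query image, and treat a low reconstruction error as evidence that the image originated from that model. Everything below (`ToyGenerator`, the step count, the random query image) is a stand-in, not the authors' actual models or decision thresholds.

```python
# Origin-attribution sketch via input inversion on a toy generator.
import torch

class ToyGenerator(torch.nn.Module):
    """Stand-in for a differentiable image generator mapping z -> image."""
    def __init__(self, z_dim=64, img_dim=3 * 32 * 32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(z_dim, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, img_dim), torch.nn.Tanh())

    def forward(self, z):
        return self.net(z)

def invert_input(generator, image, z_dim=64, steps=500, lr=0.05):
    """Gradient-descend on a latent z so generator(z) reconstructs the image."""
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(generator(z), image)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return torch.nn.functional.mse_loss(generator(z), image).item()

G = ToyGenerator()
query = torch.tanh(torch.randn(1, 3 * 32 * 32))  # stand-in query image
# A low reconstruction error suggests the image came from this model;
# in practice the threshold would be calibrated on images of known origin.
print("reconstruction error:", invert_input(G, query))
```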

NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models

1 code implementation • 28 May 2023 • Kai Mei, Zheng Li, Zhenting Wang, Yang Zhang, Shiqing Ma

Such attacks are easily disrupted by retraining on downstream tasks and by different prompting strategies, which limits the transferability of backdoor attacks.

UNICORN: A Unified Backdoor Trigger Inversion Framework

1 code implementation • 5 Apr 2023 • Zhenting Wang, Kai Mei, Juan Zhai, Shiqing Ma

It then proposes a unified framework to invert backdoor triggers, based on the formalization of triggers and the inner behaviors of backdoored models identified in our analysis.

Backdoor Attack
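As a rough illustration of trigger inversion in general (in the style of Neural Cleanse, not UNICORN's unified formalization), one can optimize a mask and pattern so that stamped inputs flip to a target label, regularizing the mask to stay small. The stand-in classifier, input shapes, and hyperparameters below are assumptions for the sketch.

```python
# Generic trigger-inversion sketch: recover a small mask + pattern that
# flips classifications to a chosen target label.
import torch

def invert_trigger(model, images, target_label, steps=300, lam=1e-3):
    """Optimize a mask + pattern so stamped inputs classify as target_label."""
    mask = torch.zeros(1, 1, 32, 32, requires_grad=True)     # where to stamp
    pattern = torch.zeros(1, 3, 32, 32, requires_grad=True)  # what to stamp
    opt = torch.optim.Adam([mask, pattern], lr=0.1)
    targets = torch.full((images.size(0),), target_label)    # int64 targets
    for _ in range(steps):
        m = torch.sigmoid(mask)                               # mask in (0, 1)
        stamped = (1 - m) * images + m * torch.tanh(pattern)  # apply trigger
        loss = torch.nn.functional.cross_entropy(model(stamped), targets)
        loss = loss + lam * m.abs().sum()                     # prefer small triggers
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask).detach(), torch.tanh(pattern).detach()

# Stand-in classifier and data; a real scan would target the suspect model.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
images = torch.rand(8, 3, 32, 32)
mask, pattern = invert_trigger(model, images, target_label=0)
print("inverted trigger size (L1 norm of mask):", mask.sum().item())
```

An abnormally small inverted trigger for one label is the usual signal that a model is backdoored; UNICORN generalizes this by formalizing triggers beyond the patch form assumed here.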

Backdoor Vulnerabilities in Normally Trained Deep Learning Models

no code implementations • 29 Nov 2022 • Guanhong Tao, Zhenting Wang, Siyuan Cheng, Shiqing Ma, Shengwei An, Yingqi Liu, Guangyu Shen, Zhuo Zhang, Yunshu Mao, Xiangyu Zhang

We leverage 20 different types of injected backdoor attacks in the literature as guidance and study their correspondences in normally trained models, which we call natural backdoor vulnerabilities.

Data Poisoning

Training with More Confidence: Mitigating Injected and Natural Backdoors During Training

1 code implementation • 13 Feb 2022 • Zhenting Wang, Hailun Ding, Juan Zhai, Shiqing Ma

By further analyzing the training process and model architectures, we found that piecewise linear functions cause this hyperplane surface.

Backdoor Attack

Complex Backdoor Detection by Symmetric Feature Differencing

1 code implementation • CVPR 2022 • Yingqi Liu, Guangyu Shen, Guanhong Tao, Zhenting Wang, Shiqing Ma, Xiangyu Zhang

Our results on TrojAI competition rounds 2-4, which feature patch backdoors and filter backdoors, show that existing scanners may produce hundreds of false positives (i.e., clean models recognized as trojaned), while our technique removes 78-100% of them with only a small (0-30%) increase in false negatives, leading to a 17-41% overall accuracy improvement.
