Search Results for author: Jingkun Tang

Found 2 papers, 1 paper with code

Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector

1 code implementation • 30 Oct 2024 • Youcheng Huang, Fengbin Zhu, Jingkun Tang, Pan Zhou, Wenqiang Lei, Jiancheng Lv, Tat-Seng Chua

With the new RADAR dataset, we further develop a novel and effective iN-time Embedding-based AdveRSarial Image DEtection (NEARSIDE) method. NEARSIDE exploits a single vector distilled from the hidden states of VLMs, which we call the attacking direction, to distinguish adversarial input images from benign ones.
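The abstract snippet does not say exactly how the attacking direction is computed or applied. The following is a minimal sketch under our own assumptions, not the authors' implementation: the direction is taken as the difference between the mean hidden-state vectors of adversarial and benign inputs, and an input is flagged as adversarial when its projection onto that direction exceeds a threshold placed midway between the two class means. All function names are hypothetical, and the toy two-dimensional vectors stand in for real VLM hidden states.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equally sized vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def attacking_direction(adv_states: Sequence[Sequence[float]],
                        benign_states: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical direction: mean adversarial state minus mean benign state.

    Assumption: a difference of class means; the paper may distill the
    vector differently.
    """
    mu_adv = mean_vector(adv_states)
    mu_benign = mean_vector(benign_states)
    return [a - b for a, b in zip(mu_adv, mu_benign)]


def score(hidden_state: Sequence[float], direction: Sequence[float]) -> float:
    """Projection (dot product) of a hidden state onto the direction."""
    return sum(h * d for h, d in zip(hidden_state, direction))


def midpoint_threshold(adv_states: Sequence[Sequence[float]],
                       benign_states: Sequence[Sequence[float]],
                       direction: Sequence[float]) -> float:
    """Hypothetical threshold: halfway between the projected class means."""
    adv_score = score(mean_vector(adv_states), direction)
    benign_score = score(mean_vector(benign_states), direction)
    return (adv_score + benign_score) / 2


def is_adversarial(hidden_state: Sequence[float],
                   direction: Sequence[float],
                   threshold: float) -> bool:
    """Flag an input whose projection exceeds the threshold."""
    return score(hidden_state, direction) > threshold


# Toy usage with made-up 2-D "hidden states".
adv = [[2.0, 1.0], [3.0, 1.0]]
benign = [[0.0, 1.0], [1.0, 1.0]]
direction = attacking_direction(adv, benign)             # [2.0, 0.0]
threshold = midpoint_threshold(adv, benign, direction)   # 3.0
print(is_adversarial([2.5, 0.0], direction, threshold))  # True
print(is_adversarial([0.2, 5.0], direction, threshold))  # False
```

Because detection under this assumption needs only one dot product per input, it adds almost no cost on top of the forward pass that already produces the hidden states, which is consistent with the "efficient" framing in the title.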

Dishonesty in Helpful and Harmless Alignment

No code implementations • 4 Jun 2024 • Youcheng Huang, Jingkun Tang, Duanyu Feng, Zheng Zhang, Wenqiang Lei, Jiancheng Lv, Anthony G. Cohn

We find that this also induces dishonesty in helpful and harmless alignment, where LLMs tell lies when generating harmless responses.
