Search Results for author: Biao Yi

Found 10 papers, 2 papers with code

Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models

no code implementations • 19 Jun 2025 • Biao Yi, Tiansheng Huang, Sishuo Chen, Tong Li, Zheli Liu, Zhixuan Chu, Yiming Li

The defense is motivated by an intriguing observation (dubbed the probe concatenate effect): concatenating triggered samples with a malicious probe significantly reduces the backdoored LLM's refusal rate toward the probe, while concatenating non-triggered samples has little effect (see the sketch below).

Large Language Model • Safety Alignment
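
A minimal sketch of the detection recipe this observation suggests, in Python. It is an illustration under stated assumptions, not the paper's implementation: the `generate` callable, the malicious probe, the refusal-marker heuristic, and the drop threshold are all hypothetical stand-ins.

```python
# Sketch of a probe-concatenation check: flag an input as trigger-carrying
# if appending it to a malicious probe noticeably lowers the model's
# refusal rate. `generate` is an assumed black-box sampling interface.

REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "i am sorry")

def refusal_rate(generate, prompt, n_samples=8):
    """Fraction of sampled responses that open with a refusal marker."""
    outputs = [generate(prompt) for _ in range(n_samples)]
    return sum(o.lower().startswith(REFUSAL_MARKERS) for o in outputs) / n_samples

def looks_triggered(generate, sample, malicious_probe, drop_threshold=0.5):
    """Flag `sample` if concatenating it with the probe cuts the refusal rate."""
    base = refusal_rate(generate, malicious_probe)
    concatenated = refusal_rate(generate, sample + "\n" + malicious_probe)
    return (base - concatenated) >= drop_threshold
```

In a black-box deployment only the served model's refusal behavior is observable, which is what makes a check of this shape attractive.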

When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques

no code implementations • 22 May 2025 • Jianing Geng, Biao Yi, Zekun Fei, Tongxi Wu, Lihai Nie, Zheli Liu

Jailbreak attacks pose a serious threat to large language models (LLMs) by bypassing built-in safety mechanisms and leading to harmful outputs.

Benchmarking

CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning

no code implementations • 22 May 2025 • Biao Yi, Tiansheng Huang, Baolei Zhang, Tong Li, Lihai Nie, Zheli Liu, Li Shen

Fine-tuning-as-a-service, while commercially successful for Large Language Model (LLM) providers, exposes models to harmful fine-tuning attacks.

Language Modeling • Language Modelling +2

EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation

no code implementations • 8 May 2025 • Biao Yi, Xavier Hu, Yurun Chen, Shengyu Zhang, Hongxia Yang, Fan Wu, Fei Wu

Cloud-based mobile agents powered by (multimodal) large language models ((M)LLMs) offer strong reasoning abilities but suffer from high latency and cost.

Traceback of Poisoning Attacks to Retrieval-Augmented Generation

no code implementations • 30 Apr 2025 • Baolei Zhang, Haoran Xin, Minghong Fang, Zhuqing Liu, Biao Yi, Tong Li, Zheli Liu

This work pioneers the traceback of poisoned texts in RAG systems, providing a practical and promising defense mechanism to enhance their security.

RAG • Retrieval +1

Your Semantic-Independent Watermark is Fragile: A Semantic Perturbation Attack against EaaS Watermark

1 code implementation • 14 Nov 2024 • Zekun Fei, Biao Yi, Jianing Geng, Ruiqi He, Lihai Nie, Zheli Liu

Embedding-as-a-Service (EaaS) has emerged as a successful business pattern but faces significant challenges related to various forms of copyright infringement, particularly API misuse and model extraction attacks.

Model extraction

Prompt-Guided Internal States for Hallucination Detection of Large Language Models

no code implementations • 7 Nov 2024 • Fujie Zhang, Peiqi Yu, Biao Yi, Baolei Zhang, Tong Li, Zheli Liu

By using appropriate prompts to guide the truthfulness-related structure within the LLM's internal states, we make this structure more salient and consistent across texts from different domains (see the probing sketch below).

Domain Generalization • Hallucination
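
A rough sketch of prompt-guided probing of internal states, assuming a HuggingFace causal LM; the guiding prompt, the GPT-2 backbone, the layer choice, and the logistic-regression probe are placeholder assumptions, not the paper's configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
GUIDE = "Is the following statement factually correct?\n"  # assumed guiding prompt

def truthfulness_feature(text, layer=-1):
    """Final-token hidden state at `layer`, with the guiding prefix prepended."""
    ids = tok(GUIDE + text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**ids).hidden_states[layer]
    return hidden[0, -1].numpy()

# Toy labeled pairs (text, 1 = hallucinated); real use needs a labeled dataset.
labeled_pairs = [
    ("The Eiffel Tower is in Paris.", 0),
    ("The Eiffel Tower is in Rome.", 1),
]
X = [truthfulness_feature(text) for text, _ in labeled_pairs]
y = [label for _, label in labeled_pairs]
probe = LogisticRegression(max_iter=1000).fit(X, y)
```

The intuition is that a fixed guiding prefix pushes truthfulness-related directions in the hidden states into a consistent position, so a single linear probe can transfer across domains.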

BadActs: A Universal Backdoor Defense in the Activation Space

1 code implementation • 18 May 2024 • Biao Yi, Sishuo Chen, Yiming Li, Tong Li, Baolei Zhang, Zheli Liu

Backdoor attacks pose an increasingly severe security threat to Deep Neural Networks (DNNs) during their development stage.

backdoor defense

Semantic-Preserving Linguistic Steganography by Pivot Translation and Semantic-Aware Bins Coding

no code implementations • 8 Mar 2022 • Tianyu Yang, Hanzhou Wu, Biao Yi, Guorui Feng, Xinpeng Zhang

In this paper, we propose a novel linguistic steganography (LS) method that modifies a given text by pivoting it between two different languages and embeds secret data by applying a GLS-like information encoding strategy (a bins-coding sketch follows below).

Language Modelling • Linguistic steganography +2
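
A minimal sketch of the GLS-like bins coding the method applies; candidate generation (the pivot translation producing near-synonymous rewrites) is abstracted away, and the candidate list here is a toy assumption.

```python
# GLS-style bins coding: partition the candidates into 2**k bins, and embed
# k secret bits per choice by picking a candidate from the matching bin.

def embed_bits(candidates, bits, k=1):
    """Pick a candidate whose bin index equals the next k secret bits."""
    bin_index = int(bits[:k], 2)
    # Bin i holds every (2**k)-th candidate, starting at offset i.
    bin_members = candidates[bin_index::2 ** k]
    return bin_members[0]  # a real system would also rank by semantic fit

# Toy example: four candidate back-translations, one bit per choice.
cands = ["large", "big", "huge", "sizable"]
print(embed_bits(cands, "1"))  # -> "big" (bin 1 when k = 1)
```

Choosing among semantically close candidates is what lets the method embed data while preserving the meaning of the cover text, in contrast to generation-based schemes that produce stego text from scratch.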

Exploiting Language Model for Efficient Linguistic Steganalysis

no code implementations • 26 Jul 2021 • Biao Yi, Hanzhou Wu, Guorui Feng, Xinpeng Zhang

Such differences can be naturally captured by the language model used to generate the stego texts (see the feature-extraction sketch below).

Language Modeling • Language Modelling +2
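
A hedged sketch of that idea: per-token log-probabilities under a language model (GPT-2 here, as an assumed stand-in for the generator's LM) summarize how natural a text looks, and simple statistics over them can feed a stego-vs-cover classifier.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def token_logprobs(text):
    """Log-probability the LM assigns to each token of `text` (teacher forcing)."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    logp = torch.log_softmax(logits[:, :-1], dim=-1)
    return logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)

# Summary features for a downstream stego-vs-cover classifier.
lp = token_logprobs("The weather is nice today.")
features = [lp.mean().item(), lp.std().item(), lp.min().item()]
```

Because hiding bits perturbs the generator's token distribution, stego texts tend to show systematically different log-probability profiles than cover texts, which is the signal a steganalyzer exploits.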
