Search Results for author: Gelei Deng

Found 21 papers, 8 papers with code

Holmes: Automated Fact Check with Large Language Models

no code implementations6 May 2025 Haoran Ou, Gelei Deng, Xingshuo Han, Jie Zhang, Xinlei He, Han Qiu, Shangwei Guo, Tianwei Zhang

The rise of Internet connectivity has accelerated the spread of disinformation, threatening societal trust, decision-making, and national security.

Fact Checking Retrieval

ai.txt: A Domain-Specific Language for Guiding AI Interactions with the Internet

no code implementations2 May 2025 Yuekang Li, Wei Song, Bangshuo Zhu, Dong Gong, Yi Liu, Gelei Deng, Chunyang Chen, Lei Ma, Jun Sun, Toby Walsh, Jingling Xue

We introduce ai. txt, a novel domain-specific language (DSL) designed to explicitly regulate interactions between AI models, agents, and web content, addressing critical limitations of the widely adopted robots. txt standard.

A Rusty Link in the AI Supply Chain: Detecting Evil Configurations in Model Repositories

no code implementations2 May 2025 Ziqi Ding, QiAn Fu, Junchen Ding, Gelei Deng, Yi Liu, Yuekang Li

Recent advancements in large language models (LLMs) have spurred the development of diverse AI applications from code generation and video editing to text generation; however, AI supply chains such as Hugging Face, which host pretrained models and their associated configuration files contributed by the public, face significant security challenges; in particular, configuration files originally intended to set up models by specifying parameters and initial settings can be exploited to execute unauthorized code, yet research has largely overlooked their security compared to that of the models themselves; in this work, we present the first comprehensive study of malicious configurations on Hugging Face, identifying three attack scenarios (file, website, and repository operations) that expose inherent risks; to address these threats, we introduce CONFIGSCAN, an LLM-based tool that analyzes configuration files in the context of their associated runtime code and critical libraries, effectively detecting suspicious elements with low false positive rates and high accuracy; our extensive evaluation uncovers thousands of suspicious repositories and configuration files, underscoring the urgent need for enhanced security validation in AI model hosting platforms.

Code Generation Text Generation +1

Enhancing Model Defense Against Jailbreaks with Proactive Safety Reasoning

no code implementations31 Jan 2025 Xianglin Yang, Gelei Deng, Jieming Shi, Tianwei Zhang, Jin Song Dong

We propose a novel defense strategy, Safety Chain-of-Thought (SCoT), which harnesses the enhanced \textit{reasoning capabilities} of LLMs for proactive assessment of harmful inputs, rather than simply blocking them.

Blocking Safety Alignment

Indiana Jones: There Are Always Some Useful Ancient Relics

no code implementations27 Jan 2025 Junchen Ding, Jiahao Zhang, Yi Liu, Ziqi Ding, Gelei Deng, Yuekang Li

This paper introduces Indiana Jones, an innovative approach to jailbreaking Large Language Models (LLMs) by leveraging inter-model dialogues and keyword-driven prompts.

Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models

1 code implementation18 Nov 2024 Chenhang Cui, Gelei Deng, An Zhang, Jingnan Zheng, Yicong Li, Lianli Gao, Tianwei Zhang, Tat-Seng Chua

Recent advances in Large Vision-Language Models (LVLMs) have showcased strong reasoning abilities across multiple modalities, achieving significant breakthroughs in various real-world applications.

Response Generation

Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment

no code implementations18 Oct 2024 Chenhang Cui, An Zhang, Yiyang Zhou, Zhaorun Chen, Gelei Deng, Huaxiu Yao, Tat-Seng Chua

The recent advancements in large language models (LLMs) and pre-trained vision models have accelerated the development of vision-language large models (VLLMs), enhancing the interaction between visual and linguistic modalities.

GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models

1 code implementation22 Aug 2024 Kunsheng Tang, Wenbo Zhou, Jie Zhang, Aishan Liu, Gelei Deng, Shuai Li, Peigui Qi, Weiming Zhang, Tianwei Zhang, Nenghai Yu

By offering a realistic assessment and tailored reduction of gender biases, we hope that our GenderCARE can represent a significant step towards achieving fairness and equity in LLMs.

counterfactual Data Augmentation +2

Efficient Detection of Toxic Prompts in Large Language Models

no code implementations21 Aug 2024 Yi Liu, Junzhe Yu, Huijia Sun, Ling Shi, Gelei Deng, Yuqi Chen, Yang Liu

ToxicDetector achieves high accuracy, efficiency, and scalability, making it a practical method for toxic prompt detection in LLMs.

Computational Efficiency

Image-Based Geolocation Using Large Vision-Language Models

no code implementations18 Aug 2024 Yi Liu, Junchen Ding, Gelei Deng, Yuekang Li, Tianwei Zhang, Weisong Sun, Yaowen Zheng, Jingquan Ge, Yang Liu

Furthermore, our study highlights issues related to dataset integrity, leading to the creation of a more robust dataset and a refined framework that leverages LVLMs' cognitive capabilities to improve geolocation precision.

Continuous Embedding Attacks via Clipped Inputs in Jailbreaking Large Language Models

1 code implementation16 Jul 2024 Zihao Xu, Yi Liu, Gelei Deng, Kailong Wang, Yuekang Li, Ling Shi, Stjepan Picek

Security concerns for large language models (LLMs) have recently escalated, focusing on thwarting jailbreaking attempts in discrete prompts.

Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation

1 code implementation20 May 2024 Yuxi Li, Yi Liu, Yuekang Li, Ling Shi, Gelei Deng, Shengquan Chen, Kailong Wang

Large language models (LLMs) have transformed the field of natural language processing, but they remain susceptible to jailbreaking attacks that exploit their capabilities to generate unintended and potentially harmful content.

Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection

no code implementations15 Apr 2024 Yuxi Li, Yi Liu, Gelei Deng, Ying Zhang, Wenjia Song, Ling Shi, Kailong Wang, Yuekang Li, Yang Liu, Haoyu Wang

We present categorizations of the identified glitch tokens and symptoms exhibited by LLMs when interacting with glitch tokens.

A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models

1 code implementation21 Feb 2024 Zihao Xu, Yi Liu, Gelei Deng, Yuekang Li, Stjepan Picek

Large Language Models (LLMS) have increasingly become central to generating content with potential societal impacts.

Digger: Detecting Copyright Content Mis-usage in Large Language Model Training

no code implementations1 Jan 2024 Haodong Li, Gelei Deng, Yi Liu, Kailong Wang, Yuekang Li, Tianwei Zhang, Yang Liu, Guoai Xu, Guosheng Xu, Haoyu Wang

In this paper, we introduce a detailed framework designed to detect and assess the presence of content from potentially copyrighted books within the training datasets of LLMs.

Language Modeling Language Modelling +2

Prompt Injection attack against LLM-integrated Applications

1 code implementation8 Jun 2023 Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, ZiHao Wang, XiaoFeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, Yang Liu

We deploy HouYi on 36 actual LLM-integrated applications and discern 31 applications susceptible to prompt injection.

Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study

2 code implementations23 May 2023 Yi Liu, Gelei Deng, Zhengzi Xu, Yuekang Li, Yaowen Zheng, Ying Zhang, Lida Zhao, Tianwei Zhang, Kailong Wang, Yang Liu

Our study investigates three key research questions: (1) the number of different prompt types that can jailbreak LLMs, (2) the effectiveness of jailbreak prompts in circumventing LLM constraints, and (3) the resilience of ChatGPT against these jailbreak prompts.

Prompt Engineering

Automatic Code Summarization via ChatGPT: How Far Are We?

no code implementations22 May 2023 Weisong Sun, Chunrong Fang, Yudu You, Yun Miao, Yi Liu, Yuekang Li, Gelei Deng, Shenghan Huang, Yuchen Chen, Quanjun Zhang, Hanwei Qian, Yang Liu, Zhenyu Chen

To support software developers in understanding and maintaining programs, various automatic code summarization techniques have been proposed to generate a concise natural language comment for a given code snippet.

Code Summarization

Cannot find the paper you are looking for? You can Submit a new open access paper.