Search Results for author: Tianshuo Cong

Found 7 papers, 4 papers with code

JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models

1 code implementation • 13 Jun 2024 • Delong Ran, JinYuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang

Jailbreak attacks aim to induce Large Language Models (LLMs) to generate harmful responses to forbidden instructions, posing severe misuse threats to LLMs.
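To make the evaluation problem concrete, below is a minimal sketch of a rule-based jailbreak check that flags a response as a successful jailbreak when it contains no obvious refusal phrase. This only illustrates automated jailbreak evaluation in general; it is not the JailbreakEval toolkit's API, and the phrase list is invented.

```python
# Hedged sketch: rule-based jailbreak evaluation.
# A response counts as "jailbroken" if it shows no obvious refusal phrase.
# The phrase list below is illustrative only.

REFUSAL_PATTERNS = [
    "i'm sorry",
    "i cannot",
    "i can't assist",
    "as an ai",
]

def is_jailbroken(response: str) -> bool:
    """Return True if the response contains no known refusal pattern."""
    text = response.lower()
    return not any(pattern in text for pattern in REFUSAL_PATTERNS)

if __name__ == "__main__":
    print(is_jailbroken("I'm sorry, but I can't help with that."))  # False
    print(is_jailbroken("Sure, here is a detailed answer ..."))     # True
```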

Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging

1 code implementation • 8 Apr 2024 • Tianshuo Cong, Delong Ran, Zesen Liu, Xinlei He, JinYuan Liu, Yichen Gong, Qi Li, Anyu Wang, XiaoYun Wang

Model merging is a promising lightweight model empowerment technique that does not rely on expensive computing devices (e.g., GPUs) or require the collection of specific training data.

Language Modelling • Large Language Model • +2
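As a rough illustration of how lightweight model merging can be, the sketch below averages the parameters of fine-tuned checkpoints of the same architecture. It is a generic weight-averaging example, not the specific merging methods or the IP-protection attack setting studied in the paper; the state dicts in the usage example are made up.

```python
# Hedged sketch: merge models by uniform parameter averaging.
# Assumes all state dicts come from the same architecture.

import torch

def merge_state_dicts(state_dicts, weights=None):
    """Average several state dicts parameter-by-parameter."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
    return merged

# Hypothetical usage with two toy "checkpoints":
sd_a = {"linear.weight": torch.ones(2, 2), "linear.bias": torch.zeros(2)}
sd_b = {"linear.weight": 3 * torch.ones(2, 2), "linear.bias": torch.ones(2)}
print(merge_state_dicts([sd_a, sd_b])["linear.weight"])  # tensor of 2.0s
```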

FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts

2 code implementations • 9 Nov 2023 • Yichen Gong, Delong Ran, JinYuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, XiaoYun Wang

Ensuring the safety of artificial intelligence-generated content (AIGC) is a longstanding topic in the artificial intelligence (AI) community, and the safety concerns associated with Large Language Models (LLMs) have been widely investigated.

Optical Character Recognition (OCR) • Safety Alignment
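To make the title concrete, the sketch below renders an instruction as plain text inside an image, which is the general idea of a typographic visual prompt read through a vision-language model's visual channel. The example uses benign text and is not the FigStep prompt construction itself.

```python
# Hedged sketch: a "typographic" visual prompt rendered with Pillow.
# Benign illustrative text only; not the FigStep pipeline.

from PIL import Image, ImageDraw

def text_to_image(text: str, size=(512, 256)) -> Image.Image:
    """Render text onto a blank white image."""
    image = Image.new("RGB", size, color="white")
    draw = ImageDraw.Draw(image)
    draw.multiline_text((20, 20), text, fill="black")
    return image

if __name__ == "__main__":
    img = text_to_image("List the steps to bake bread:\n1.\n2.\n3.")
    img.save("typographic_prompt.png")
```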

SSLGuard: A Watermarking Scheme for Self-supervised Learning Pre-trained Encoders

1 code implementation • 27 Jan 2022 • Tianshuo Cong, Xinlei He, Yang Zhang

Recent research has shown that the copyright of machine learning models is threatened by model stealing attacks, which aim to train a surrogate model that mimics the behavior of a given model.

Self-Supervised Learning
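The model-stealing threat mentioned in the abstract can be sketched as training a surrogate encoder to match a victim encoder's embeddings using query access alone. The toy architectures, random data, and hyperparameters below are placeholders, not the setup evaluated in SSLGuard.

```python
# Hedged sketch: surrogate-encoder training against a black-box victim encoder.
# All models and data here are toy placeholders.

import torch
from torch import nn

victim = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128)).eval()
surrogate = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
optimizer = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

for step in range(100):
    # The attacker only needs the victim's output embeddings for its queries.
    queries = torch.randn(64, 3, 32, 32)
    with torch.no_grad():
        target_embeddings = victim(queries)
    loss = nn.functional.mse_loss(surrogate(queries), target_embeddings)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```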

DO-AutoEncoder: Learning and Intervening Bivariate Causal Mechanisms in Images

no code implementations • 25 Sep 2019 • Tianshuo Cong, Dan Peng, Furui Liu, Zhitang Chen

Our experiments demonstrate that our method correctly identifies the bivariate causal relationship between concepts in images, and that the learned representation enables a do-calculus manipulation of images, generating artificial images that may break physical laws depending on where we intervene on the causal system.

Adversarial Attack • Representation Learning
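A latent-space intervention of the kind described above can be sketched as: encode an image, clamp one latent coordinate to a fixed value, and decode. The tiny autoencoder and the chosen latent dimension below are placeholders, not the DO-AutoEncoder model.

```python
# Hedged sketch: a do(z[dim] = value) intervention in an autoencoder's latent space.
# Architecture and intervened dimension are illustrative placeholders.

import torch
from torch import nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 8))
decoder = nn.Sequential(nn.Linear(8, 28 * 28), nn.Unflatten(1, (1, 28, 28)))

@torch.no_grad()
def intervene(image: torch.Tensor, dim: int, value: float) -> torch.Tensor:
    """Fix one latent coordinate to `value` and decode the result."""
    z = encoder(image)
    z[:, dim] = value
    return decoder(z)

if __name__ == "__main__":
    x = torch.rand(1, 1, 28, 28)               # placeholder image
    x_intervened = intervene(x, dim=0, value=3.0)
    print(x_intervened.shape)                   # torch.Size([1, 1, 28, 28])
```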
