1 code implementation • 22 May 2024 • Xuyang Ge, Fukang Zhu, Wentao Shu, Junxuan Wang, Zhengfu He, Xipeng Qiu
Circuit analysis of a given model behavior is a central task in mechanistic interpretability.
1 code implementation • 18 Mar 2024 • Weikang Zhou, Xiao Wang, Limao Xiong, Han Xia, Yingshuang Gu, Mingxu Chai, Fukang Zhu, Caishuang Huang, Shihan Dou, Zhiheng Xi, Rui Zheng, Songyang Gao, Yicheng Zou, Hang Yan, Yifan Le, Ruohui Wang, Lijun Li, Jing Shao, Tao Gui, Qi Zhang, Xuanjing Huang
This paper introduces EasyJailbreak, a unified framework simplifying the construction and evaluation of jailbreak attacks against LLMs.
no code implementations • 17 Jan 2023 • Bing Su, Fukang Zhu, Ke Zhu
For the log-SHE model, the spatial near-epoch dependence (NED) property is investigated, and a systematic statistical inference procedure is provided, including the maximum likelihood and generalized method of moments estimators; the Wald, Lagrange multiplier, and likelihood-ratio-type D tests for model parameter constraints; and the overidentification test for model diagnostic checking.