no code implementations • 15 Apr 2025 • Jirui Yang, Zheyu Lin, Shuhan Yang, Zhihui Lu, Xin Du
To address these challenges, we propose Concept Enhancement Engineering (CEE), a novel defense framework that leverages representation engineering to enhance the safety of embodied LLMs by dynamically steering their internal activations.
no code implementations • ACL 2021 • Yadong Xi, Xiaoxi Mao, Le Li, Lei Lin, Yanjiang Chen, Shuhan Yang, Xuhan Chen, Kailun Tao, Zhi Li, Gongzheng Li, Lin Jiang, Siyan Liu, Zeng Zhao, Minlie Huang, Changjie Fan, Zhipeng Hu
Equipped with GPT-2 and the latest GPT-3, AI Dungeon is widely seen as a showcase of the powerful text generation capabilities of large-scale pre-trained language models, and as a glimpse of what future games could be.