no code implementations • 3 Nov 2023 • Boyang Zhang, Xinyue Shen, Wai Man Si, Zeyang Sha, Zeyuan Chen, Ahmed Salem, Yun Shen, Michael Backes, Yang Zhang
Moderating offensive, hateful, and toxic language has long been an important yet challenging problem for the safe use of NLP systems.
no code implementations • 7 Aug 2023 • Wai Man Si, Michael Backes, Yang Zhang
In this paper, we discover a new attack strategy against LLM APIs, namely the prompt abstraction attack.
no code implementations • 12 May 2023 • Wai Man Si, Michael Backes, Yang Zhang, Ahmed Salem
In this work, we broaden the scope of this attack to include text generation and classification models, hence showing its broader applicability.
no code implementations • 7 Sep 2022 • Wai Man Si, Michael Backes, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Savvas Zannettou, Yang Zhang
We show that publicly available chatbots are prone to providing toxic responses when fed toxic queries.
no code implementations • SIGDIAL (ACL) 2021 • Wai Man Si, Prithviraj Ammanabrolu, Mark O. Riedl
This paper explores character-driven story continuation, in which the story emerges through characters' first- and second-person narration as well as dialogue. This requires models to select language that is consistent with each character's persona and their relationships with other characters while following and advancing the story.