Search Results for author: Wenjie Jacky Mo

Found 3 papers, 2 papers with code

ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails

1 code implementation · 19 Feb 2025 · Xiaofei Wen, Wenxuan Zhou, Wenjie Jacky Mo, Muhao Chen

Ensuring the safety of large language models (LLMs) is critical as they are deployed in real-world applications.

Computational Efficiency Pass Classification

Rethinking Backdoor Detection Evaluation for Language Models

no code implementations · 31 Aug 2024 · Jun Yan, Wenjie Jacky Mo, Xiang Ren, Robin Jia

Backdoor detection methods aim to detect whether a released model contains a backdoor, so that practitioners can avoid such vulnerabilities.
