Search Results for author: Jackie CK Cheung

Found 2 papers, 0 papers with code

From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards

No code implementations · 20 Mar 2024 · Khaoula Chehbouni, Megha Roshan, Emmanuel Ma, Futian Andrew Wei, Afaf Taik, Jackie CK Cheung, Golnoosh Farnadi

Despite growing efforts to develop safety safeguards, such as supervised safety-oriented fine-tuning and safe reinforcement learning from human feedback, multiple concerns remain about the safety of these models and the biases ingrained in them.

Safe Reinforcement Learning

Unsupervised Layer-wise Score Aggregation for Textual OOD Detection

No code implementations · 20 Feb 2023 · Maxime Darrin, Guillaume Staerman, Eduardo Dadalto Câmara Gomes, Jackie CK Cheung, Pablo Piantanida, Pierre Colombo

More importantly, we show that the usual choice (the last layer) is rarely the best one for OOD detection and that far better results could be achieved if the best layer were picked.

Feature Selection, Out-of-Distribution (OOD) Detection
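The snippet above does not spell out how per-layer scores are combined, so the following is only a minimal sketch of the general idea of layer-wise OOD scoring without labels. It assumes per-layer feature vectors have already been extracted, scores each layer by Euclidean distance to the in-distribution centroid (a hypothetical simplification; Mahalanobis distance is a common alternative), turns each layer's score into an empirical p-value against the reference set, and aggregates by taking the minimum p-value. This is an illustrative stand-in, not the authors' method, and every function name here is hypothetical.

```python
import math
from statistics import fmean
from typing import List, Sequence

Vector = Sequence[float]


def layer_centroid(features: List[Vector]) -> List[float]:
    """Mean feature vector of the in-distribution reference set at one layer."""
    dim = len(features[0])
    return [fmean(f[i] for f in features) for i in range(dim)]


def distance_score(x: Vector, centroid: Vector) -> float:
    """Hypothetical per-layer OOD score: Euclidean distance to the reference centroid."""
    return math.dist(x, centroid)


def empirical_tail(score: float, reference_scores: List[float]) -> float:
    """Empirical p-value: share of reference scores at least as large (small = unusual)."""
    hits = sum(1 for r in reference_scores if r >= score)
    return (hits + 1) / (len(reference_scores) + 1)


def aggregate_ood_score(x_layers: List[Vector], ref_layers: List[List[Vector]]) -> float:
    """Combine per-layer evidence without labels.

    x_layers[l]   -- feature vector of the test input at layer l
    ref_layers[l] -- in-distribution feature vectors at layer l
    Returns 1 - (minimum p-value over layers); higher means more likely OOD.
    """
    p_values = []
    for x, ref in zip(x_layers, ref_layers):
        c = layer_centroid(ref)
        ref_scores = [distance_score(r, c) for r in ref]
        p_values.append(empirical_tail(distance_score(x, c), ref_scores))
    return 1.0 - min(p_values)


# Toy usage: two layers sharing the same reference features.
ref = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
reference_layers = [ref, ref]
in_dist = [[0.5, 0.5], [0.5, 0.5]]
shifted_early = [[5.0, 5.0], [0.5, 0.5]]  # unusual only at the first (non-final) layer

print(aggregate_ood_score(in_dist, reference_layers))
print(aggregate_ood_score(shifted_early, reference_layers))
# Scoring with the last layer alone misses the shift seen at the first layer.
print(aggregate_ood_score(shifted_early[-1:], reference_layers[-1:]))
```

In this toy run the shifted input gets a high aggregated score (0.8) because its first layer is far from the reference data, while a last-layer-only score treats it like the in-distribution input (0.0). This mirrors, in miniature, the point that the last layer is not always the most informative one.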
