Search Results for author: Aradhana Sinha

Found 3 papers, 0 papers with code

Improving Few-shot Generalization of Safety Classifiers via Data Augmented Parameter-Efficient Fine-Tuning

no code implementations25 Oct 2023 Ananth Balashankar, Xiao Ma, Aradhana Sinha, Ahmad Beirami, Yao Qin, Jilin Chen, Alex Beutel

As large language models (LLMs) are widely adopted, new safety issues and policies emerge, to which existing safety classifiers do not generalize well.

Data Augmentation Few-Shot Learning +1

Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks

no code implementations25 Oct 2023 Aradhana Sinha, Ananth Balashankar, Ahmad Beirami, Thi Avrahami, Jilin Chen, Alex Beutel

We demonstrate the advantages of this system on the ANLI and hate speech detection benchmark datasets - both collected via an iterative, adversarial human-and-model-in-the-loop procedure.

Hate Speech Detection

Cannot find the paper you are looking for? You can Submit a new open access paper.