Do-Not-Answer is an open-source dataset for evaluating safeguards in large language models and for developing safer open-source LLMs at low cost. The dataset is curated and filtered to consist only of prompts that responsible language models should not answer. We annotate and assess the responses of six popular LLMs to these instructions.
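As a minimal sketch of how one might load and inspect the prompts, assuming the dataset is published on the Hugging Face Hub (the `LibrAI/do-not-answer` id, split name, and record layout below are assumptions, not confirmed by this page):

```python
# Minimal sketch: load the Do-Not-Answer prompts and peek at a few.
# Assumes the dataset is hosted on the Hugging Face Hub as
# "LibrAI/do-not-answer"; the id and split name are assumptions.
from datasets import load_dataset

dataset = load_dataset("LibrAI/do-not-answer", split="train")

# Each record holds an instruction that a responsible model should refuse.
for example in dataset.select(range(3)):
    print(example)
```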