Most existing dialogue systems fail to respond properly to potentially unsafe user utterances, either ignoring them or passively agreeing with them.
13 PAPERS • 1 BENCHMARK
The Innodata Red Teaming Prompts dataset aims to rigorously assess models’ factuality and safety. Because it was created manually and covers a broad range of scenarios, it enables a comprehensive examination of LLM performance.
1 PAPER • 1 BENCHMARK