She had Cobalt Blue Eyes: Prompt Testing to Create Aligned and Sustainable Language Models

20 Oct 2023 · Veronica Chatrath, Oluwanifemi Bamgbose, Shaina Raza ·

As the use of large language models (LLMs) increases within society, as does the risk of their misuse. Appropriate safeguards must be in place to ensure LLM outputs uphold the ethical standards of society, highlighting the positive role that artificial intelligence technologies can have. Recent events indicate ethical concerns around conventionally trained LLMs, leading to overall unsafe user experiences. This motivates our research question: how do we ensure LLM alignment? In this work, we introduce a test suite of unique prompts to foster the development of aligned LLMs that are fair, safe, and robust. We show that prompting LLMs at every step of the development pipeline, including data curation, pre-training, and fine-tuning, will result in an overall more responsible model. Our test suite evaluates outputs from four state-of-the-art language models: GPT-3.5, GPT-4, OPT, and LLaMA-2. The assessment presented in this paper highlights a gap between societal alignment and the capabilities of current LLMs. Additionally, implementing a test suite such as ours lowers the environmental overhead of making models safe and fair.

PDF Abstract

Code

Add Remove Mark official

No code implementations yet. Submit your code now

Tasks

Add Remove

Fairness

Datasets

ETHICS

Results from the Paper

Edit

Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods

Add Remove

Absolute Position Encodings • Adam • Attention Dropout • BPE • Cosine Annealing • Dense Connections • Dropout • Fixed Factorized Attention • GELU • GPT-3 • GPT-4 • Label Smoothing • Layer Normalization • Linear Layer • Linear Warmup With Cosine Annealing • Multi-Head Attention • OPT • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Strided Attention • Transformer • Weight Decay

Edit Social Preview

She had Cobalt Blue Eyes: Prompt Testing to Create Aligned and Sustainable Language Models

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Edit

Methods

Add Remove