TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Average	hendrycks2020ethics	ALBERT-xxlarge	Accuracy (Test)	0.71	# 1
Average	hendrycks2020ethics	RoBERTa-large	Accuracy (Test)	0.68	# 2
Average	hendrycks2020ethics	BERT-large	Accuracy (Test)	0.561	# 3
Average	hendrycks2020ethics	BERT-base	Accuracy (Test)	0.516	# 4
Average	hendrycks2020ethics	GPT-3 (few-shot)	Accuracy (Test)	0.368	# 5
Average	hendrycks2020ethics	Random Baseline	Accuracy (Test)	0.242	# 6
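
The GLOBAL RANK column orders entries by descending test accuracy. The Python sketch below reproduces that ordering from the model rows above. It also includes a helper for an equal-weight mean over sub-task accuracies. That helper is only an assumption about how the "Average" score is formed, since this page does not state the weighting. The function names `rank_models` and `macro_average` are hypothetical and are not part of the hendrycks/ethics code.

```python
from statistics import fmean
from typing import Dict, List, Tuple


def macro_average(task_accuracies: Dict[str, float]) -> float:
    """Unweighted mean of per-task accuracies, each a fraction in [0, 1].

    Assumption: the leaderboard's "Average" is an equal-weight mean over
    the benchmark's sub-tasks; this page does not state the weighting.
    """
    if not task_accuracies:
        raise ValueError("at least one task accuracy is required")
    return fmean(task_accuracies.values())


def rank_models(scores: Dict[str, float]) -> List[Tuple[int, str, float]]:
    """Return (rank, model, score) tuples; rank 1 is the highest score.

    sorted() is stable, so tied scores keep their input order but still
    receive distinct ranks.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # Accuracy (Test) values copied from the leaderboard table above,
    # deliberately listed out of order.
    reported = {
        "BERT-base": 0.516,
        "GPT-3 (few-shot)": 0.368,
        "ALBERT-xxlarge": 0.71,
        "BERT-large": 0.561,
        "RoBERTa-large": 0.68,
    }
    for rank, name, score in rank_models(reported):
        print(f"# {rank}\t{name}\t{score}")
```

Running the script prints the five models in the same order as ranks #1 to #5 in the table. The real leaderboard may break ties differently.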

Aligning AI With Shared Human Values

5 Aug 2020 · Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt

We show how to assess a language model's knowledge of basic concepts of morality. We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality. Models predict widespread moral judgments about diverse text scenarios. This requires connecting physical and social world knowledge to value judgments, a capability that may enable us to steer chatbot outputs or eventually regularize open-ended reinforcement learning agents. With the ETHICS dataset, we find that current language models have a promising but incomplete ability to predict basic human ethical judgments. Our work shows that progress can be made on machine ethics today, and it provides a stepping stone toward AI that is aligned with human values.

Code

hendrycks/ethics (official): 210 stars

hendrycks/test: 929 stars

Tasks

Ethics

Reinforcement Learning (RL)

World Knowledge

Datasets

Introduced in the Paper:

ETHICS

Used in the Paper:

test

Results from the Paper

Ranked #1 on Average on hendrycks2020ethics

Methods

No methods listed for this paper.