Towards Machine Ethics with Language Models

We show how to assess a language model’s knowledge of basic concepts of morality. We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense moral judgments. Models predict widespread moral judgments about diverse written scenarios. This requires connecting physical and social world knowledge to value judgments, a capability that may later serve as a general regularizer of behavior in open-ended settings. We find that language models have low but nontrivial performance. The ETHICS dataset enables meaningful progress on value learning today, providing a stepping stone toward AI that is aligned with human values.
