Multilingual human VALUE (MVALUE) is a multilingual dataset covering 7 concepts of human values: morality, deontology, utilitarianism, fairness, truthfulness, toxicity, and harmfulness. Each concept subset includes positive and negative texts that represent the two opposing directions of the concept. We translated the collected human value datasets from English into 15 non-English languages using Google Translate. These languages belong to various language families, including Indo-European (Catalan, French, Portuguese, Spanish), Austronesian (Indonesian), Niger-Congo (Chichewa, Swahili), Dravidian (Tamil, Telugu), Uralic (Finnish, Hungarian), Sino-Tibetan (Chinese), Japonic (Japanese), Koreanic (Korean), and Austro-Asiatic (Vietnamese).

In detail, we collected three subsets of the ETHICS dataset (Hendrycks et al., 2021) for morality, deontology, and utilitarianism. For fairness, truthfulness, toxicity, and harmfulness, we chose the StereoSet (Nadeem et al., 2021), TruthfulQA (Lin et al., 2022), REALTOXICITYPROMPTS (Gehman et al., 2020), and AdvBench (Zou et al., 2023b) datasets, respectively. Some of these datasets are naturally composed of positive and negative text pairs; we reformatted the others into the same paired format. Refer to Appendix A of our paper for detailed definitions, data splits, and examples of each human value in the MVALUE dataset.
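The concept-to-source mapping and the positive/negative pairing described above can be sketched as a small data structure. This is only an illustrative sketch: the names `SOURCE_DATASETS`, `ValuePair`, and `group_by_concept`, the field names, and the use of language codes are assumptions made for this example, not the released format of MVALUE. The strings in the usage note are placeholders, not dataset content.

```python
from dataclasses import dataclass

# Concept -> source dataset, as listed in the description.
# The dictionary itself is a hypothetical convenience, not part of MVALUE.
SOURCE_DATASETS = {
    "morality": "ETHICS",
    "deontology": "ETHICS",
    "utilitarianism": "ETHICS",
    "fairness": "StereoSet",
    "truthfulness": "TruthfulQA",
    "toxicity": "REALTOXICITYPROMPTS",
    "harmfulness": "AdvBench",
}


@dataclass(frozen=True)
class ValuePair:
    """One positive/negative text pair for a concept (hypothetical schema)."""

    concept: str   # one of the 7 concepts above
    language: str  # assumed to be a language code such as "en" or "fr"
    positive: str  # text representing the positive direction of the concept
    negative: str  # text representing the opposing direction

    def __post_init__(self):
        # Reject concepts that are not among the 7 listed in the description.
        if self.concept not in SOURCE_DATASETS:
            raise ValueError(f"unknown concept: {self.concept!r}")


def group_by_concept(pairs):
    """Group pairs into one list per concept, keeping empty concepts too."""
    grouped = {concept: [] for concept in SOURCE_DATASETS}
    for pair in pairs:
        grouped[pair.concept].append(pair)
    return grouped
```

For instance, `group_by_concept([ValuePair("fairness", "fr", "<positive text>", "<negative text>")])` returns a dictionary with all 7 concept keys, where only the `"fairness"` list is non-empty.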
