WEATHub is a dataset covering 24 languages. It contains words organized into groups of (target1, target2, attribute1, attribute2) to measure the association target1:target2 :: attribute1:attribute2. For example, target1 can be insects and target2 can be flowers, and the test measures whether insects or flowers are associated more strongly with pleasant or unpleasant attributes. In our paper, these word associations are quantified with the WEAT metric, which computes an effect size (Cohen's d) and a p-value that indicates the statistical significance of the result. We run these tests on word embeddings from language models to understand biased associations in language models across different languages.
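As a rough illustration of the effect size and p-value described above, here is a minimal sketch of a WEAT-style computation in plain Python. It is not the paper's implementation. The function names (`cosine`, `association`, `weat`), the use of the sample standard deviation, the random-permutation approximation of the p-value, and the toy 2-D vectors are all assumptions made for this example; real use would substitute embeddings from a language model.

```python
import math
import random
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """Mean similarity of w to attribute set A minus its mean similarity to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat(X, Y, A, B, n_perm=2000, seed=0):
    """Return (effect_size, p_value) for target sets X, Y and attribute sets A, B.

    Assumptions of this sketch: sample standard deviation in the effect size,
    and a one-sided p-value estimated from random re-partitions of X and Y.
    """
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    s_all = s_x + s_y

    # Cohen's d style effect size over the per-word association scores.
    effect = (mean(s_x) - mean(s_y)) / stdev(s_all)

    # Permutation test: how often does a random split of the pooled scores
    # give a larger test statistic than the observed split?
    observed = sum(s_x) - sum(s_y)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = s_all[:]
        rng.shuffle(shuffled)
        stat = sum(shuffled[:len(s_x)]) - sum(shuffled[len(s_x):])
        if stat > observed:
            hits += 1
    return effect, hits / n_perm


# Toy, made-up 2-D "embeddings" (hypothetical, for illustration only).
X = [[1.0, 0.1], [0.9, 0.2]]   # target1 words
Y = [[0.1, 1.0], [0.2, 0.9]]   # target2 words
A = [[1.0, 0.0], [0.8, 0.1]]   # attribute1 words
B = [[0.0, 1.0], [0.1, 0.8]]   # attribute2 words

effect, p_value = weat(X, Y, A, B)
print(f"effect size: {effect:.3f}, p-value: {p_value:.4f}")
```

A positive effect size here means the target1 words sit closer to attribute1 than the target2 words do; swapping the two target sets flips its sign.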
1 PAPER • NO BENCHMARKS YET
We constructed our dataset from five fields available on the website that were found convenient for studying student expectations and experience. These are out-of-five star ratings on easiness, understandability, recitation, accessibility and helpfulness. An average rating was calculated from these five fields. The overall sentiment of a review was determined from the average rating: any score of 3.5 or higher (>=) was labeled as a positive review, and any score lower than 2.5 (<) was labeled as a negative review. The five main aspects students were asked to rate are given below.
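The labeling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `label_review`, the dictionary layout of a review, and the field keys are hypothetical, and returning `None` for averages between 2.5 and 3.5 is a choice made here because the description does not say how such reviews are handled.

```python
from statistics import mean

# The five rated fields named in the description (keys are assumed spellings).
FIELDS = ("easiness", "understandability", "recitation",
          "accessibility", "helpfulness")


def label_review(ratings):
    """Return (average_rating, label) for one review.

    `ratings` maps each field in FIELDS to a star rating out of five.
    The label is "positive" for an average >= 3.5, "negative" for an
    average < 2.5, and None otherwise (an assumption of this sketch).
    """
    avg = mean(ratings[field] for field in FIELDS)
    if avg >= 3.5:
        return avg, "positive"
    if avg < 2.5:
        return avg, "negative"
    return avg, None


# Made-up example review, for illustration only.
example = {"easiness": 4, "understandability": 4, "recitation": 3,
           "accessibility": 4, "helpfulness": 3}
print(label_review(example))
```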
0 PAPER • NO BENCHMARKS YET