We evaluate a simple approach to improving the zero-shot multilingual transfer of mBERT on social media corpora by adding a pretraining task called translation pair prediction (TPP), which predicts whether a pair of cross-lingual texts constitutes a valid translation.
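The TPP objective reduces to binary classification over text pairs: aligned translations are positives, and mismatched pairings serve as negatives. A minimal sketch of the data-construction step is below; `make_tpp_examples` is a hypothetical helper written for illustration, not the paper's implementation, and the negative-sampling strategy (random mismatched targets) is an assumption.

```python
import random

def make_tpp_examples(parallel_pairs, neg_per_pos=1, seed=0):
    """Build binary classification examples for translation pair prediction.

    parallel_pairs: list of (src_text, tgt_text) aligned translations.
    Positives keep the aligned pair (label 1); negatives pair a source
    with a randomly chosen non-matching target (label 0).
    """
    rng = random.Random(seed)
    targets = [tgt for _, tgt in parallel_pairs]
    examples = []
    for i, (src, tgt) in enumerate(parallel_pairs):
        examples.append((src, tgt, 1))  # true translation pair
        for _ in range(neg_per_pos):
            j = rng.randrange(len(targets))
            while j == i:  # avoid sampling the aligned target
                j = rng.randrange(len(targets))
            examples.append((src, targets[j], 0))  # mismatched pair
    return examples
```

Each `(src, tgt, label)` triple would then be fed to the encoder with a standard sentence-pair classification head, in the same way next-sentence prediction is trained.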
Although language depends heavily on the geographical, temporal, and other social contexts of the speaker, these elements have not been incorporated into modern transformer-based language models.
However, we demonstrate that formalized fairness metrics and quantitative analysis on their own are insufficient for capturing the risk of representational harm in automatic cropping.
Hate speech has become a major content moderation issue for online social media platforms.
In this paper we introduce a framework for annotating social media text corpora with various categories.
We also investigate the utility of task label marginalization, joint label classification, and joint training on multilingual datasets as possible improvements to our models.