Māori Loanwords: A Corpus of New Zealand English Tweets
Māori loanwords are widely used in New Zealand English for various social functions by New Zealanders within and outside of the Māori community. Motivated by the lack of linguistic resources for studying how Māori loanwords are used in social media, we present a new corpus of New Zealand English tweets. We collected tweets containing selected Māori words that are likely to be known by New Zealanders who do not speak Māori. Since over 30% of these words turned out to be irrelevant, we manually annotated a sample of our tweets into relevant and irrelevant categories. This data was used to train machine learning models to automatically filter out irrelevant tweets.
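
The filtering step described above can be illustrated with a minimal sketch. The abstract does not say which models or features were used, so the multinomial Naive Bayes classifier below is an assumption chosen for brevity, not the paper's method. The function names (`tokenize`, `train_naive_bayes`, `predict`) and the four toy tweets are hypothetical and invented for illustration; they are not drawn from the corpus.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would likely
    # also handle hashtags, mentions, URLs, and punctuation.
    return text.lower().split()


def train_naive_bayes(examples):
    """Count documents and tokens per label from (text, label) pairs."""
    doc_counts = Counter()   # label -> number of tweets
    word_counts = {}         # label -> Counter of token frequencies
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy examples (NOT from the corpus): the same word can appear
# in a loanword sense or in an unrelated sense such as gaming.
TOY_TWEETS = [
    ("the mana of our kaumātua at the marae", "relevant"),
    ("whānau gathering at the marae this weekend", "relevant"),
    ("out of mana again in this boss fight", "irrelevant"),
    ("need more mana potions for my mage", "irrelevant"),
]

model = train_naive_bayes(TOY_TWEETS)
print(predict(model, "low mana in the boss fight"))
print(predict(model, "at the marae this weekend"))
```

In practice, the manually annotated sample would take the place of `TOY_TWEETS`, and predictions would be evaluated on held-out annotated tweets before the classifier is applied to the rest of the collection.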