Māori Loanwords: A Corpus of New Zealand English Tweets

Māori loanwords are widely used in New Zealand English for various social functions by New Zealanders within and outside of the Māori community. Motivated by the lack of linguistic resources for studying how Māori loanwords are used in social media, we present a new corpus of New Zealand English tweets. We collected tweets containing selected Māori words that are likely to be known by New Zealanders who do not speak Māori. Since over 30% of these words turned out to be irrelevant, we manually annotated a sample of our tweets into relevant and irrelevant categories. This data was used to train machine learning models to automatically filter out irrelevant tweets.
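
The filtering step described above can be illustrated with a small sketch. The abstract does not say which models were trained, so the multinomial Naive Bayes classifier below is only a stand-in chosen for brevity. The class name `NaiveBayesFilter`, the `tokenize` helper, and the four example tweets are hypothetical and are not taken from the corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters (including macronised vowels) and apostrophes.
    return re.findall(r"[a-zāēīōū']+", text.lower())


class NaiveBayesFilter:
    """Multinomial Naive Bayes with add-alpha smoothing (illustrative stand-in)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        # Count documents per label and token occurrences per label.
        self.label_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.label_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def _log_score(self, tokens, label):
        # log P(label) + sum of log P(token | label); unseen tokens are skipped.
        total_docs = sum(self.label_counts.values())
        score = math.log(self.label_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:
                score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.label_counts, key=lambda lab: self._log_score(tokens, lab))


# Made-up annotated sample, for illustration only.
train = [
    ("heading to the moana for a swim with the whanau", "relevant"),
    ("kia ora, great day at the marae", "relevant"),
    ("watching moana the disney movie tonight", "irrelevant"),
    ("need more mana potions in this game", "irrelevant"),
]
clf = NaiveBayesFilter().fit([t for t, _ in train], [y for _, y in train])

# Keep only the tweets that the model labels as relevant.
candidates = ["new disney movie tonight", "great day at the marae"]
kept = [t for t in candidates if clf.predict(t) == "relevant"]
```

In practice the manually annotated sample would replace the toy `train` list, and a held-out portion of it would be used to check how well the filter separates the two categories before applying it to the full collection.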
