Creating a Lexicon of Bavarian Dialect by Means of Facebook Language Data and Crowdsourcing
Data acquisition in dialectology is typically a tedious task, as dialect samples of spoken language have to be collected via questionnaires or interviews. In this article, we suggest using the "web as a corpus" approach for dialectology. We present a case study that demonstrates how authentic language data for the Bavarian dialect (ISO 639-3: bar) can be collected automatically from the social network Facebook. We also show that Facebook can be used effectively as a crowdsourcing platform, where users are willing to translate dialect words collaboratively in order to create a common lexicon of their Bavarian dialect. Key insights from the case study are summarized as "lessons learned", together with suggestions for future enhancements of the lexicon creation approach.
LREC 2016