no code implementations • NAACL 2022 • Gati Martin, Medard Edmund Mswahili, Young-Seob Jeong
The rapid development of social networks, electronic commerce, mobile Internet, and other technologies has influenced the growth of Web data. Social media and Internet forums are valuable sources of citizens’ opinions, which can be analyzed for community development and user behavior analysis. Unfortunately, the scarcity of resources (i.e., datasets or language models) becomes a barrier to the development of natural language processing applications in low-resource languages. Thanks to the recent growth of Swahili online forums and news platforms, we introduce two Swahili datasets in this paper: a pre-training dataset of approximately 105MB with 16M words and an annotated dataset of 13K instances for the emotion classification task. The emotion classification dataset is manually annotated by two native Swahili speakers. We pre-trained a new monolingual language model for Swahili, namely SwahBERT, using our collected pre-training data, and tested it with four downstream tasks including emotion classification. We found that SwahBERT outperforms multilingual BERT, a well-known existing language model, in almost all downstream tasks.
no code implementations • 19 Apr 2021 • Gati L. Martin, Medard E. Mswahili, Young-Seob Jeong
The data was created by extracting and annotating 8.2k reviews and comments from different social media platforms and from the ISEAR emotion dataset.
no code implementations • COLING 2016 • Hyoung-Gyu Lee, Jun-Seok Kim, Joong-Hwi Shin, Jaesong Lee, Ying-Xiu Quan, Young-Seob Jeong
In this paper, we introduce papago, a translator for mobile devices that is equipped with new features that provide convenience for users.
no code implementations • LREC 2016 • Young-Seob Jeong, Won-Tae Joo, Hyun-Woo Do, Chae-Gyun Lim, Key-Sun Choi, Ho-Jin Choi
Before developing the system, it is first necessary to define or design the structure of temporal information.