no code implementations • 3 Sep 2021 • Yingwen Fu, Nankai Lin, Zhihe Yang, Shengyi Jiang
In this work, we propose a dataset construction framework, which is based on labeled datasets of homologous languages and iterative optimization, to build a Malay NER dataset (MYNER) comprising 28, 991 sentences (over 384 thousand tokens).