no code implementations • 28 Apr 2024 • Nishant Luitel, Nirajan Bekoju, Anand Kumar Sah, Subarna Shakya
Spell Correction (SC) in low-resource languages is a significant challenge because only a limited corpus of data is available and there are no annotated spelling correction datasets.
no code implementations • 28 Apr 2024 • Nishant Luitel, Nirajan Bekoju, Anand Kumar Sah, Subarna Shakya
To reduce this gap, we used six different tokenization schemes to pretrain relatively small language models in Nepali and used the learned representations to fine-tune on several downstream tasks.