no code implementations • 8 Nov 2024 • Kushal Tatariya, Artur Kulmizev, Wessel Poelman, Esther Ploeger, Marcel Bollmann, Johannes Bjerva, Jiaming Luo, Heather Lent, Miryam de Lhoneux
Wikipedia's perceived high quality and broad language coverage have established it as a fundamental resource in multilingual NLP.
1 code implementation • 15 Oct 2024 • Kushal Tatariya, Vladimir Araujo, Thomas Bauwens, Miryam de Lhoneux
Pixel-based language models have emerged as a compelling alternative to subword-based language modelling, particularly because they can represent virtually any script.
no code implementations • 5 Feb 2024 • Kushal Tatariya, Heather Lent, Johannes Bjerva, Miryam de Lhoneux
Emotion classification is a challenging task in NLP due to the inherent idiosyncratic and subjective nature of linguistic expression, especially with code-mixed data.
1 code implementation • 30 Oct 2023 • Heather Lent, Kushal Tatariya, Raj Dabre, Yiyi Chen, Marcell Fekete, Esther Ploeger, Li Zhou, Ruth-Ann Armstrong, Abee Eijansantos, Catriona Malau, Hans Erik Heje, Ernests Lavrinovics, Diptesh Kanojia, Paul Belony, Marcel Bollmann, Loïc Grobol, Miryam de Lhoneux, Daniel Hershcovich, Michel DeGraff, Anders Søgaard, Johannes Bjerva
Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and a number of highly resourced languages imply a significant potential for transfer learning, this potential is hampered by the lack of annotated data.