no code implementations • EMNLP 2021 • Koustava Goswami, Sourav Dutta, Haytham Assem, Theodorus Fransen, John P. McCrae
We demonstrate the efficacy of an unsupervised as well as a weakly supervised variant of our framework on STS, BUCC and Tatoeba benchmark tasks.
no code implementations • LREC 2022 • Priya Rani, John P. McCrae, Theodorus Fransen
This dataset is the first Magahi-Hindi-English code-mixed dataset for the similar-language identification task.
no code implementations • 7 Mar 2024 • Priya Rani, Gaurav Negi, Theodorus Fransen, John P. McCrae
The present paper introduces new sentiment data, MaCMS, for Magahi-Hindi-English (MHE) code-mixed language, where Magahi is a less-resourced minority language.
1 code implementation • 9 Nov 2023 • Koustava Goswami, Priya Rani, Theodorus Fransen, John P. McCrae
We train an encoder to acquire morphological knowledge of a language and transfer that knowledge to unsupervised and weakly supervised cognate detection for closely related languages, with and without a pivot language.
no code implementations • MTSummit 2021 • Atul Kr. Ojha, Chao-Hong Liu, Katharina Kann, John Ortega, Sheetal Shatam, Theodorus Fransen
Maximum system performance, measured with BLEU, was 36.0 for English–Irish, 34.6 for Irish–English, 24.2 for English–Marathi, and 31.3 for Marathi–English.
no code implementations • COLING 2020 • Koustava Goswami, Rajdeep Sarkar, Bharathi Raja Chakravarthi, Theodorus Fransen, John P. McCrae
Automatic Language Identification (LI) or Dialect Identification (DI) of short texts of closely related languages or dialects is one of the primary steps in many natural language processing pipelines.
no code implementations • SEMEVAL 2020 • Koustava Goswami, Priya Rani, Bharathi Raja Chakravarthi, Theodorus Fransen, John P. McCrae
Code mixing is a common phenomenon in multilingual societies where people switch from one language to another for various reasons.
no code implementations • LREC 2020 • Priya Rani, Shardul Suryawanshi, Koustava Goswami, Bharathi Raja Chakravarthi, Theodorus Fransen, John Philip McCrae
Hate speech detection in social media communication has become one of the primary concerns to avoid conflicts and curb undesired activities.
1 code implementation • LREC 2020 • Sina Ahmadi, John Philip McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, Thomas Troelsgård, Sussi Olsen, Simon Krek, Veronika Lipp, Tamás Váradi, László Simon, András Gyorffy, Carole Tiberius, Tanneke Schoonheim, Yifat Ben Moshe, Maya Rudich, Raya Abu Ahmad, Dorielle Lonke, Kira Kovalenko, Margit Langemets, Jelena Kallas, Oksana Dereza, Theodorus Fransen, David Cillessen, David Lindemann, Mikel Alonso, Ana Salgado, José Luis Sancho, Rafael-J. Ureña-Ruiz, Jordi Porta Zamorano, Kiril Simov, Petya Osenova, Zara Kancheva, Ivaylo Radev, Ranka Stanković, Andrej Perdih, Dejan Gabrovsek
Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography.