no code implementations • 30 Aug 2023 • Tanjim Mahmud, Michal Ptaszynski, Juuso Eronen, Fumito Masui
Having identified those research gaps, we provide some suggestions for improving the general research conduct in cyberbullying detection, with a primary focus on low-resource languages.
no code implementations • 29 Aug 2023 • Tanjim Mahmud, Michal Ptaszynski, Fumito Masui
The negative effects of online bullying and harassment are increasing with Internet popularity, especially in social media.
no code implementations • 1 Jun 2023 • Juuso Eronen, Michal Ptaszynski, Karol Nowakowski, Zheng Lin Chia, Fumito Masui
This paper investigates the impact of data volume and the use of similar languages on transfer learning in a machine translation task.
no code implementations • 31 Jan 2023 • Juuso Eronen, Michal Ptaszynski, Fumito Masui
This allows us to select a more suitable transfer language which can be used to better leverage knowledge from high-resource languages in order to improve the performance of language applications lacking data.
no code implementations • 18 Jan 2023 • Karol Nowakowski, Michal Ptaszynski, Kyoko Murasaki, Jagna Nieuważny
Furthermore, we find that if a model pretrained on a related speech variety or an unrelated language with similar phonological characteristics is available, multilingual fine-tuning using additional data from that language can have a positive impact on speech recognition performance when there is very little labeled data in the target language.
no code implementations • 19 Jul 2022 • Jagna Nieuwazny, Karol Nowakowski, Michal Ptaszynski, Fumito Masui
Firstly, since the methods used for performing authorship analysis imply that an author can be recognized by the content he or she creates, we were interested in finding out whether it would be possible for an author identification system to correctly attribute works to authors if in the course of years they have undergone a major psychological transition.
no code implementations • 4 Jun 2022 • Juuso Eronen, Michal Ptaszynski, Fumito Masui, Gniewosz Leliwa, Michal Wroczynski, Mateusz Piech, Aleksander Smywinski-Pohl
In this research, we study how the performance of machine learning (ML) classifiers changes when various linguistic preprocessing methods are applied to a dataset, with a specific focus on linguistically-backed embeddings in Convolutional Neural Networks (CNN).
no code implementations • 4 Jun 2022 • Juuso Eronen, Michal Ptaszynski, Fumito Masui
In most cases, word embeddings are learned only from raw tokens or, in some cases, lemmas.
no code implementations • 2 Jun 2022 • Juuso Eronen, Michal Ptaszynski, Fumito Masui, Masaki Arata, Gniewosz Leliwa, Michal Wroczynski
We study the selection of transfer languages for automatic abusive language detection.
no code implementations • 4 Mar 2022 • Michal Ptaszynski, Pawel Dybala, Tatsuaki Matsuba, Fumito Masui, Rafal Rzepka, Kenji Araki, Yoshio Momouchi
Firstly, we analysed the entries with a multifaceted affect analysis system in order to find distinctive features for cyber-bullying and apply them to a machine learning classifier.
no code implementations • 2 Nov 2021 • Juuso Eronen, Michal Ptaszynski, Fumito Masui, Aleksander Smywiński-Pohl, Gniewosz Leliwa, Michal Wroczynski
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods in order to estimate dataset complexity, which in turn is used to comparatively estimate the potential performance of machine learning (ML) classifiers prior to any training.
no code implementations • COLING 2020 • Kenji Ryu, Michal Ptaszynski
E-mail is a communication tool widely used by people of all ages on the Internet today, often in business and formal situations, especially in Japan.