Search Results for author: Michal Ptaszynski

Found 16 papers, 0 papers with code

Cyberbullying Detection for Low-resource Languages and Dialects: Review of the State of the Art

no code implementations • 30 Aug 2023 • Tanjim Mahmud, Michal Ptaszynski, Juuso Eronen, Fumito Masui

Based on recognizing those research gaps, we provide some suggestions for improving the general research conduct in cyberbullying detection, with a primary focus on low-resource languages.

Abusive Language

Vulgar Remarks Detection in Chittagonian Dialect of Bangla

no code implementations • 29 Aug 2023 • Tanjim Mahmud, Michal Ptaszynski, Fumito Masui

The negative effects of online bullying and harassment are increasing with Internet popularity, especially in social media.

regression

Improving Polish to English Neural Machine Translation with Transfer Learning: Effects of Data Volume and Language Similarity

no code implementations • 1 Jun 2023 • Juuso Eronen, Michal Ptaszynski, Karol Nowakowski, Zheng Lin Chia, Fumito Masui

This paper investigates the impact of data volume and the use of similar languages on transfer learning in a machine translation task.

Machine Translation Transfer Learning +1

Zero-shot cross-lingual transfer language selection using linguistic similarity

no code implementations • 31 Jan 2023 • Juuso Eronen, Michal Ptaszynski, Fumito Masui

This allows us to select a more suitable transfer language which can be used to better leverage knowledge from high-resource languages in order to improve the performance of language applications lacking data.

Dependency Parsing named-entity-recognition +4

Adapting Multilingual Speech Representation Model for a New, Underresourced Language through Multilingual Fine-tuning and Continued Pretraining

no code implementations • 18 Jan 2023 • Karol Nowakowski, Michal Ptaszynski, Kyoko Murasaki, Jagna Nieuważny

Furthermore, we find that if a model pretrained on a related speech variety or an unrelated language with similar phonological characteristics is available, multilingual fine-tuning using additional data from that language can have positive impact on speech recognition performance when there is very little labeled data in the target language.

Speech Recognition

Can You Fool AI by Doing a 180? – A Case Study on Authorship Analysis of Texts by Arata Osada

no code implementations • 19 Jul 2022 • Jagna Nieuwazny, Karol Nowakowski, Michal Ptaszynski, Fumito Masui

Firstly, since the methods used for performing authorship analysis imply that an author can be recognized by the content he or she creates, we were interested in finding out whether it would be possible for an author identification system to correctly attribute works to authors if in the course of years they have undergone a major psychological transition.

Attribute Authorship Attribution +1

Initial Study into Application of Feature Density and Linguistically-backed Embedding to Improve Machine Learning-based Cyberbullying Detection

no code implementations • 4 Jun 2022 • Juuso Eronen, Michal Ptaszynski, Fumito Masui, Gniewosz Leliwa, Michal Wroczynski, Mateusz Piech, Aleksander Smywinski-Pohl

In this research, we study the change in the performance of machine learning (ML) classifiers when various linguistic preprocessing methods of a dataset were used, with the specific focus on linguistically-backed embeddings in Convolutional Neural Networks (CNN).

Exploring the Potential of Feature Density in Estimating Machine Learning Classifier Performance with Application to Cyberbullying Detection

no code implementations • 4 Jun 2022 • Juuso Eronen, Michal Ptaszynski, Fumito Masui, Gniewosz Leliwa, Michal Wroczynski

In this research.

Comparing Performance of Different Linguistically-Backed Word Embeddings for Cyberbullying Detection

no code implementations • 4 Jun 2022 • Juuso Eronen, Michal Ptaszynski, Fumito Masui

In most cases, word embeddings are learned only from raw tokens or in some cases, lemmas.

Word Embeddings

Transfer Language Selection for Zero-Shot Cross-Lingual Abusive Language Detection

no code implementations • 2 Jun 2022 • Juuso Eronen, Michal Ptaszynski, Fumito Masui, Masaki Arata, Gniewosz Leliwa, Michal Wroczynski

We study the selection of transfer languages for automatic abusive language detection.

Abusive Language Cross-Lingual Transfer +1

In the Service of Online Order: Tackling Cyber-Bullying with Machine Learning and Affect Analysis

no code implementations • 4 Mar 2022 • Michal Ptaszynski, Pawel Dybala, Tatsuaki Matsuba, Fumito Masui, Rafal Rzepka, Kenji Araki, Yoshio Momouchi

Firstly, we analysed the entries with a multifaceted affect analysis system in order to find distinctive features for cyber-bullying and apply them to a machine learning classifier.

BIG-bench Machine Learning

Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density

no code implementations • 2 Nov 2021 • Juuso Eronen, Michal Ptaszynski, Fumito Masui, Aleksander Smywiński-Pohl, Gniewosz Leliwa, Michal Wroczynski

We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods in order to estimate dataset complexity, which in turn is used to comparatively estimate the potential performance of machine learning (ML) classifiers prior to any training.

Sentiment Analysis