1 code implementation • ACL 2022 • Varvara Logacheva, Daryna Dementieva, Sergey Ustyantsev, Daniil Moskovskiy, David Dale, Irina Krotova, Nikita Semenov, Alexander Panchenko
To the best of our knowledge, these are the first parallel datasets for this task. We describe our pipeline in detail to make it fast to set up for a new language or domain, thus contributing to faster and easier development of new parallel resources. We train several detoxification models on the collected data and compare them with several baselines and state-of-the-art unsupervised approaches.
1 code implementation • ACL 2022 • Daniil Moskovskiy, Daryna Dementieva, Alexander Panchenko
This work investigates multilingual and cross-lingual detoxification and the behavior of large multilingual models in this setting.
no code implementations • HumEval (ACL) 2022 • Varvara Logacheva, Daryna Dementieva, Irina Krotova, Alena Fenogenova, Irina Nikishina, Tatiana Shavrina, Alexander Panchenko
It is often difficult to reliably evaluate models which generate text.
1 code implementation • SemEval (NAACL) 2022 • Mikhail Kuimov, Daryna Dementieva, Alexander Panchenko
This paper describes our contribution to SemEval 2022 Task 8: Multilingual News Article Similarity.
no code implementations • 29 May 2025 • Daryna Dementieva, Nikolay Babakov, Alexander Fraser
While Ukrainian NLP has seen progress in many text processing tasks, emotion classification remains an underexplored area with no publicly available benchmark to date.
no code implementations • 28 May 2025 • Antonia Karamolegkou, Angana Borah, Eunjung Cho, Sagnik Ray Choudhury, Martina Galletti, Rajarshi Ghosh, Pranav Gupta, Oana Ignat, Priyanka Kargupta, Neema Kotonya, Hemank Lamba, Sun-joo Lee, Arushi Mangla, Ishani Mondal, Deniz Nazarova, Poli Nemkova, Dina Pisarevskaya, Naquee Rizwan, Nazanin Sabri, Dominik Stammbach, Anna Steinberg, David Tomás, Steven R Wilson, Bowen Yi, Jessica H Zhu, Arkaitz Zubiaga, Anders Søgaard, Alexander Fraser, Zhijing Jin, Rada Mihalcea, Joel R. Tetreault, Daryna Dementieva
Recent advancements in large language models (LLMs) have unlocked unprecedented possibilities across a range of applications.
no code implementations • 20 May 2025 • Faeze Ghorbanpour, Daryna Dementieva, Alexander Fraser
Notably, our approach is highly data-efficient, retrieving as few as 200 instances in some cases while maintaining superior performance.
no code implementations • 9 May 2025 • Faeze Ghorbanpour, Daryna Dementieva, Alexander Fraser
Despite growing interest in automated hate speech detection, most existing approaches overlook the linguistic diversity of online content.
1 code implementation • 17 Feb 2025 • Shamsuddeen Hassan Muhammad, Nedjma Ousidhoum, Idris Abdulmumin, Jan Philip Wahle, Terry Ruas, Meriem Beloucif, Christine de Kock, Nirmal Surange, Daniela Teodorescu, Ibrahim Said Ahmad, David Ifeoluwa Adelani, Alham Fikri Aji, Felermino D. M. A. Ali, Ilseyar Alimova, Vladimir Araujo, Nikolay Babakov, Naomi Baes, Ana-Maria Bucur, Andiswa Bukula, Guanqun Cao, Rodrigo Tufino Cardenas, Rendi Chevi, Chiamaka Ijeoma Chukwuneke, Alexandra Ciobotaru, Daryna Dementieva, Murja Sani Gadanya, Robert Geislinger, Bela Gipp, Oumaima Hourrane, Oana Ignat, Falalu Ibrahim Lawan, Rooweither Mabuya, Rahmad Mahendra, Vukosi Marivate, Andrew Piper, Alexander Panchenko, Charles Henrique Porto Ferreira, Vitaly Protasov, Samuel Rutunda, Manish Shrivastava, Aura Cristina Udrea, Lilian Diana Awuor Wanzare, Sophie Wu, Florian Valentin Wunderlich, Hanif Muhammad Zhafran, Tianhui Zhang, Yi Zhou, Saif M. Mohammad
In this paper, we present BRIGHTER, a collection of multi-labeled emotion-annotated datasets in 28 different languages.
1 code implementation • 16 Dec 2024 • Daryna Dementieva, Nikolay Babakov, Amit Ronen, Abinew Ali Ayele, Naquee Rizwan, Florian Schneider, Xintong Wang, Seid Muhie Yimam, Daniil Moskovskiy, Elisei Stakovskii, Eran Kaufman, Ashraf Elnagar, Animesh Mukherjee, Alexander Panchenko
Even with various regulations in place across countries and social media platforms (Government of India, 2021; European Parliament and Council of the European Union, 2022), digital abusive speech remains a significant issue.
no code implementations • 20 Aug 2024 • Cem Üyük, Danica Rovó, Shaghayegh Kolli, Rabia Varol, Georg Groh, Daryna Dementieva
In an era dominated by information overload, further facilitated by Large Language Models (LLMs), the prevalence of misinformation poses a significant threat to public discourse and societal well-being.
no code implementations • 27 Jun 2024 • Seid Muhie Yimam, Daryna Dementieva, Tim Fischer, Daniil Moskovskiy, Naquee Rizwan, Punyajoy Saha, Sarthak Roy, Martin Semmann, Alexander Panchenko, Chris Biemann, Animesh Mukherjee
Despite regulations imposed by nations and social media platforms, such as recent EU regulations targeting digital violence, abusive content persists as a significant challenge.
no code implementations • 27 Apr 2024 • Daryna Dementieva, Valeriia Khylenko, Nikolay Babakov, Georg Groh
Toxicity detection remains a relevant task, especially in the context of safe and fair LM development.
no code implementations • 2 Apr 2024 • Daryna Dementieva, Nikolay Babakov, Alexander Panchenko
Text detoxification is a textual style transfer (TST) task where a text is paraphrased from a toxic surface form, e.g., featuring rude words, to a neutral register.
no code implementations • 2 Apr 2024 • Daryna Dementieva, Valeriia Khylenko, Georg Groh
Despite the large number of labeled datasets in the NLP text classification field, a persistent imbalance in data availability across languages remains evident.
no code implementations • 23 Nov 2023 • Daryna Dementieva, Daniil Moskovskiy, David Dale, Alexander Panchenko
Text detoxification is the task of transferring the style of text from toxic to neutral.
no code implementations • 15 May 2023 • Adam Rydelek, Daryna Dementieva, Georg Groh
The Explainable Detection of Online Sexism task presents the problem of explainable sexism detection through fine-grained categorisation of sexist cases with three subtasks.
1 code implementation • 15 May 2023 • Daniel Schroter, Daryna Dementieva, Georg Groh
This paper presents the best-performing approach, alias "Adam Smith", for the SemEval-2023 Task 4: "Identification of Human Values behind Arguments".
no code implementations • 6 Mar 2023 • Edoardo Mosca, Daryna Dementieva, Tohid Ebrahim Ajdari, Maximilian Kummeth, Kirill Gringauz, Yutong Zhou, Georg Groh
Interpretability and human oversight are fundamental pillars of deploying complex NLP models into real-world applications.
1 code implementation • 25 Nov 2022 • Daryna Dementieva, Mikhail Kuimov, Alexander Panchenko
In this work, we propose Multiverse, a new feature based on multilingual evidence that can be used for fake news detection and to improve existing approaches.
1 code implementation • 5 Jun 2022 • Daniil Moskovskiy, Daryna Dementieva, Alexander Panchenko
However, models are not able to perform cross-lingual detoxification, and direct fine-tuning on the exact language is unavoidable.
2 code implementations • 19 Apr 2022 • Daryna Dementieva, Nikolay Babakov, Alexander Panchenko
Formality is one of the important characteristics of text documents.
1 code implementation • EMNLP 2021 • David Dale, Anton Voronov, Daryna Dementieva, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko
We compare our models with a number of methods for style transfer.
1 code implementation • ACL 2021 • Daryna Dementieva, Alexander Panchenko
Misleading information spreads on the Internet at an incredible speed, which can lead to irreparable consequences in some cases.
3 code implementations • 19 May 2021 • Daryna Dementieva, Daniil Moskovskiy, Varvara Logacheva, David Dale, Olga Kozlova, Nikita Semenov, Alexander Panchenko
We introduce the first study of automatic detoxification of Russian texts to combat offensive language.
no code implementations • SEMEVAL 2020 • Daryna Dementieva, Igor Markov, Alexander Panchenko
This paper presents a solution for the Span Identification (SI) task in the "Detection of Propaganda Techniques in News Articles" competition at SemEval-2020.