Search Results for author: Nikita Semenov

Found 7 papers, 5 papers with code

Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company’s Reputation

no code implementations EACL (BSNLP) 2021 Nikolay Babakov, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko

We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labelling a dataset for appropriateness.

ParaDetox: Detoxification with Parallel Data

1 code implementation ACL 2022 Varvara Logacheva, Daryna Dementieva, Sergey Ustyantsev, Daniil Moskovskiy, David Dale, Irina Krotova, Nikita Semenov, Alexander Panchenko

To the best of our knowledge, these are the first parallel datasets for this task. We describe our pipeline in detail to make it fast to set up for a new language or domain, thus contributing to faster and easier development of new parallel resources. We train several detoxification models on the collected data and compare them with several baselines and state-of-the-art unsupervised approaches.

Sentence

RuPAWS: A Russian Adversarial Dataset for Paraphrase Identification

1 code implementation LREC 2022 Nikita Martynov, Irina Krotova, Varvara Logacheva, Alexander Panchenko, Olga Kozlova, Nikita Semenov

We compare it to ParaPhraser, the largest available dataset for Russian, and show that the best available paraphrase identifiers for the Russian language fail on the RuPAWS dataset.

Paraphrase Identification

Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company's Reputation

1 code implementation 9 Mar 2021 Nikolay Babakov, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko

We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labeling a dataset for appropriateness.
