no code implementations • PaM 2020 • Saba Anwar, Artem Shelmanov, Alexander Panchenko, Chris Biemann
We investigate a simple yet effective method, lexical substitution with word representation models, to automatically expand a small set of frame-annotated sentences with new words for their respective roles and LUs.
1 code implementation • LREC 2022 • Nikita Martynov, Irina Krotova, Varvara Logacheva, Alexander Panchenko, Olga Kozlova, Nikita Semenov
We compare it to the largest available dataset for Russian ParaPhraser and show that the best available paraphrase identifiers for the Russian language fail on the RuPAWS dataset.
no code implementations • EACL (BSNLP) 2021 • Nikolay Babakov, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko
We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labelling a dataset for appropriateness.
1 code implementation • SemEval (NAACL) 2022 • Mikhail Kuimov, Daryna Dementieva, Alexander Panchenko
This paper describes our contribution to SemEval 2022 Task 8: Multilingual News Article Similarity.
no code implementations • EACL (GWC) 2021 • Irina Nikishina, Natalia Loukachevitch, Varvara Logacheva, Alexander Panchenko
The vast majority of the existing approaches for taxonomy enrichment apply word embeddings as they have proven to accumulate contexts (in a broad sense) extracted from texts which are sufficient for attaching orphan words to the taxonomy.
no code implementations • HumEval (ACL) 2022 • Varvara Logacheva, Daryna Dementieva, Irina Krotova, Alena Fenogenova, Irina Nikishina, Tatiana Shavrina, Alexander Panchenko
It is often difficult to reliably evaluate models which generate text.
1 code implementation • ACL 2022 • Varvara Logacheva, Daryna Dementieva, Sergey Ustyantsev, Daniil Moskovskiy, David Dale, Irina Krotova, Nikita Semenov, Alexander Panchenko
To the best of our knowledge, these are the first parallel datasets for this task. We describe our pipeline in detail to make it fast to set up for a new language or domain, thus contributing to faster and easier development of new parallel resources. We train several detoxification models on the collected data and compare them with several baselines and state-of-the-art unsupervised approaches.
1 code implementation • ACL 2022 • Daniil Moskovskiy, Daryna Dementieva, Alexander Panchenko
This work investigates multilingual and cross-lingual detoxification and the behavior of large multilingual models in this setting.
1 code implementation • ACL 2022 • Nikolay Babakov, David Dale, Varvara Logacheva, Alexander Panchenko
In both tasks, the system is supposed to generate a text which should be semantically similar to the input text.
1 code implementation • ACL 2022 • Artem Vazhentsev, Gleb Kuzmin, Artem Shelmanov, Akim Tsvigun, Evgenii Tsymbalov, Kirill Fedyanin, Maxim Panov, Alexander Panchenko, Gleb Gusev, Mikhail Burtsev, Manvel Avetisian, Leonid Zhukov
Uncertainty estimation (UE) of model predictions is a crucial step for a variety of tasks such as active learning, misclassification detection, adversarial attack detection, out-of-distribution detection, etc.
no code implementations • MMMPIE (COLING) 2022 • Anton Razzhigaev, Anton Voronov, Andrey Kaznacheev, Andrey Kuznetsov, Denis Dimitrov, Alexander Panchenko
Pixel-level autoregression with Transformer models (Image GPT or iGPT) is one of the recent approaches to image generation that has not received massive attention and elaboration due to quadratic complexity of attention as it imposes huge memory requirements and thus restricts the resolution of the generated images.
no code implementations • COLING (TextGraphs) 2022 • Irina Nikishina, Alsu Vakhitova, Elena Tutubalina, Alexander Panchenko
We propose a method that combines graph-, and text-based contextualized representations from transformer networks to predict new entries to the taxonomy.
1 code implementation • 17 Aug 2023 • Nikolay Babakov, David Dale, Ilya Gusev, Irina Krotova, Alexander Panchenko
Text style transfer techniques are gaining popularity in natural language processing, allowing paraphrasing text in the required form: from toxic to neutral, from formal to informal, from old to the modern English language, etc.
no code implementations • 5 Jun 2023 • Viktoriia Chekalina, Georgii Novikov, Julia Gusak, Ivan Oseledets, Alexander Panchenko
On the downstream tasks, including language understanding and text summarization, the model performs similarly to the original GPT-2 model.
no code implementations • 1 May 2023 • Viktoriia Chekalina, Alexander Panchenko
In this paper, we present a submission to the Touché lab's Task 2 on Argument Retrieval for Comparative Questions.
1 code implementation • 9 Jan 2023 • Akim Tsvigun, Ivan Lysenko, Danila Sedashov, Ivan Lazichny, Eldar Damirov, Vladimir Karlov, Artemy Belousov, Leonid Sanochkin, Maxim Panov, Alexander Panchenko, Mikhail Burtsev, Artem Shelmanov
Active Learning (AL) is a technique developed to reduce the amount of annotation required to achieve a certain level of machine learning model performance.
no code implementations • 29 Dec 2022 • Nikolay Babakov, Maria Lysyuk, Alexander Shvets, Lilya Kazakova, Alexander Panchenko
This paper presents a solution to the GenChal 2022 shared task dedicated to feedback comment generation for writing learning.
1 code implementation • 25 Nov 2022 • Daryna Dementieva, Mikhail Kuimov, Alexander Panchenko
In this work, we propose Multiverse -- a new feature based on multilingual evidence that can be used for fake news detection and improve existing approaches.
2 code implementations • 20 Jun 2022 • Nikolay Babakov, David Dale, Varvara Logacheva, Irina Krotova, Alexander Panchenko
Text style transfer techniques are gaining popularity in Natural Language Processing, finding various applications such as text detoxification, sentiment, or formality transfer.
no code implementations • 18 Jun 2022 • Evgeny Kotelnikov, Natalia Loukachevitch, Irina Nikishina, Alexander Panchenko
Argumentation analysis is a field of computational linguistics that studies methods for extracting arguments from texts and the relationships between them, as well as building argumentation structure of texts.
1 code implementation • COLING 2020 • Nikolay Arefyev, Boris Sheludko, Alexander Podolskiy, Alexander Panchenko
Lexical substitution, i.e., generation of plausible words that can replace a particular target word in a given context, is an extremely powerful technology that can be used as a backbone of various NLP applications, including word sense induction and disambiguation, lexical relation extraction, data augmentation, etc.
1 code implementation • 5 Jun 2022 • Daniil Moskovskiy, Daryna Dementieva, Alexander Panchenko
However, models are not able to perform cross-lingual detoxification, and direct fine-tuning on the exact language is inevitable.
no code implementations • ACL 2022 • Viktoriia Chekalina, Anton Razzhigaev, Albert Sayapin, Evgeny Frolov, Alexander Panchenko
Knowledge Graphs (KGs) are symbolically structured storages of facts.
2 code implementations • 19 Apr 2022 • Daryna Dementieva, Nikolay Babakov, Alexander Panchenko
Formality is one of the important characteristics of text documents.
no code implementations • 4 Mar 2022 • Nikolay Babakov, Varvara Logacheva, Alexander Panchenko
Toxicity on the Internet, such as hate speech, offenses towards particular users or groups of people, or the use of obscene words, is an acknowledged problem.
no code implementations • 21 Jan 2022 • Irina Nikishina, Mikhail Tikhomirov, Varvara Logacheva, Yuriy Nazarov, Alexander Panchenko, Natalia Loukachevitch
With the rapid growth of lexical resources for specific domains, the problem of automatic extension of the existing knowledge bases with new words is becoming more and more widespread.
1 code implementation • EMNLP 2021 • David Dale, Anton Voronov, Daryna Dementieva, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko
We compare our models with a number of methods for style transfer.
1 code implementation • ACL 2021 • Daryna Dementieva, Alexander Panchenko
Misleading information spreads on the Internet at an incredible speed, which can lead to irreparable consequences in some cases.
no code implementations • SEMEVAL 2021 • David Dale, Igor Markov, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko
We show that fine-tuning a RoBERTa model for this problem is a strong baseline.
no code implementations • SEMEVAL 2021 • Anton Razzhigaev, Nikolay Arefyev, Alexander Panchenko
In our experiments, we used a neural system based on the XLM-R, a pre-trained transformer-based masked language model, as a baseline.
3 code implementations • 19 May 2021 • Daryna Dementieva, Daniil Moskovskiy, Varvara Logacheva, David Dale, Olga Kozlova, Nikita Semenov, Alexander Panchenko
We introduce the first study of automatic detoxification of Russian texts to combat offensive language.
1 code implementation • EACL 2021 • Artem Shelmanov, Evgenii Tsymbalov, Dmitri Puzyrev, Kirill Fedyanin, Alexander Panchenko, Maxim Panov
In this work, we consider the problem of uncertainty estimation for Transformer-based models.
1 code implementation • 9 Mar 2021 • Nikolay Babakov, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko
We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labeling a dataset for appropriateness.
no code implementations • EACL 2021 • Artem Shelmanov, Dmitri Puzyrev, Lyubov Kupriyanova, Denis Belyakov, Daniil Larionov, Nikita Khromov, Olga Kozlova, Ekaterina Artemova, Dmitry V. Dylov, Alexander Panchenko
Annotating training data for sequence tagging of texts is usually very time-consuming.
no code implementations • SEMEVAL 2020 • Daryna Dementieva, Igor Markov, Alexander Panchenko
This paper presents a solution for the Span Identification (SI) task in the "Detection of Propaganda Techniques in News Articles" competition at SemEval-2020.
1 code implementation • COLING 2020 • Irina Nikishina, Alexander Panchenko, Varvara Logacheva, Natalia Loukachevitch
Ontologies, taxonomies, and thesauri are used in many NLP tasks.
no code implementations • 31 May 2020 • Ozge Sevgili, Artem Shelmanov, Mikhail Arkhipov, Alexander Panchenko, Chris Biemann
This survey presents a comprehensive description of recent neural entity linking (EL) systems developed since 2015 as a result of the "deep learning revolution" in natural language processing.
no code implementations • 29 May 2020 • Nikolay Arefyev, Boris Sheludko, Alexander Podolskiy, Alexander Panchenko
Lexical substitution in context is an extremely powerful technology that can be used as a backbone of various NLP applications, such as word sense induction, lexical relation extraction, data augmentation, etc.
no code implementations • 22 May 2020 • Irina Nikishina, Varvara Logacheva, Alexander Panchenko, Natalia Loukachevitch
This paper describes the results of the first shared task on taxonomy enrichment for the Russian language.
no code implementations • LREC 2020 • Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko
We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages.
1 code implementation • ACL 2019 • Andrey Kutuzov, Mohammad Dorgham, Oleksiy Oliynyk, Chris Biemann, Alexander Panchenko
The computation of distance measures between nodes in graphs is inefficient and does not scale to large graphs.
no code implementations • 7 Jun 2019 • Abhik Jana, Dmitry Puzyrev, Alexander Panchenko, Pawan Goyal, Chris Biemann, Animesh Mukherjee
In particular, we use hypernymy information of the multiword and its constituents encoded in the form of the recently introduced Poincaré embeddings in addition to the distributional information to detect compositionality for noun phrases.
1 code implementation • ACL 2019 • Rami Aly, Shantanu Acharya, Alexander Ossa, Arne Köhn, Chris Biemann, Alexander Panchenko
We introduce the use of Poincaré embeddings to improve existing state-of-the-art approaches to domain-specific taxonomy induction from text as a signal for both relocating wrong hyponym terms within a (pre-induced) taxonomy as well as for attaching disconnected terms in a taxonomy.
1 code implementation • SEMEVAL 2019 • Saba Anwar, Dmitry Ustalov, Nikolay Arefyev, Simone Paolo Ponzetto, Chris Biemann, Alexander Panchenko
We present our system for semantic frame induction that showed the best performance in Subtask B.1 and finished as the runner-up in Subtask A of the SemEval 2019 Task 2 on unsupervised semantic frame induction (QasemiZadeh et al., 2019).
no code implementations • 15 Jan 2019 • Matthias Schildwächter, Alexander Bondarenko, Julian Zenker, Matthias Hagen, Chris Biemann, Alexander Panchenko
We present CAM (comparative argumentative machine), a novel open-domain IR system to argumentatively compare objects with respect to information extracted from the Common Crawl.
3 code implementations • WS 2019 • Alexander Panchenko, Alexander Bondarenko, Mirco Franzek, Matthias Hagen, Chris Biemann
We tackle the tasks of automatically identifying comparative sentences and categorizing the intended preference (e.g., "Python has better NLP libraries than MATLAB" => (Python, better, MATLAB)).
1 code implementation • 17 Sep 2018 • Dmitry Ustalov, Alexander Panchenko, Chris Biemann, Simone Paolo Ponzetto
In this paper, we show how unsupervised sense representations can be used to improve hypernymy extraction.
no code implementations • 23 Aug 2018 • Alexander Panchenko
A sentiment index measures the average emotional level in a corpus.
2 code implementations • CL 2019 • Dmitry Ustalov, Alexander Panchenko, Chris Biemann, Simone Paolo Ponzetto
We present a detailed theoretical and computational analysis of the Watset meta-algorithm for fuzzy graph clustering, which has been found to be widely applicable in a variety of domains.
no code implementations • SEMEVAL 2019 • Andrey Kutuzov, Mohammad Dorgham, Oleksiy Oliynyk, Chris Biemann, Alexander Panchenko
We present path2vec, a new approach for learning graph embeddings that relies on structural measures of pairwise node similarities.
no code implementations • 23 May 2018 • Nikolay Arefyev, Pavel Ermolaev, Alexander Panchenko
The paper describes our participation in the first shared task on word sense induction and disambiguation for the Russian language RUSSE'2018 (Panchenko et al., 2018).
1 code implementation • ACL 2018 • Dmitry Ustalov, Alexander Panchenko, Andrei Kutuzov, Chris Biemann, Simone Paolo Ponzetto
We use dependency triples automatically extracted from a Web-scale corpus to perform unsupervised semantic frame induction.
1 code implementation • LREC 2018 • Dmitry Ustalov, Denis Teslenko, Alexander Panchenko, Mikhail Chernoskutov, Chris Biemann, Simone Paolo Ponzetto
The sparse mode uses the traditional vector space model to estimate the most similar word sense corresponding to its context.
no code implementations • 13 Apr 2018 • Nikita Muravyev, Alexander Panchenko, Sergei Obiedkov
In this paper, we present a study of neologisms and loan words frequently occurring in Facebook user posts.
no code implementations • LREC 2018 • Stefano Faralli, Alexander Panchenko, Chris Biemann, Simone Paolo Ponzetto
We introduce a new lexical resource that enriches the Framester knowledge graph, which links FrameNet, WordNet, VerbNet and other resources, with semantic features from text corpora.
no code implementations • 15 Mar 2018 • Alexander Panchenko, Natalia Loukachevitch, Dmitry Ustalov, Denis Paperno, Christian Meyer, Natalia Konstantinova
The paper gives an overview of the Russian Semantic Similarity Evaluation (RUSSE) shared task held in conjunction with the Dialogue 2015 conference.
no code implementations • 15 Mar 2018 • Alexander Panchenko, Anastasiya Lopukhina, Dmitry Ustalov, Konstantin Lopukhin, Nikolay Arefyev, Alexey Leontyev, Natalia Loukachevitch
The paper describes the results of the first shared task on word sense induction (WSI) for the Russian language.
no code implementations • 23 Dec 2017 • Chris Biemann, Stefano Faralli, Alexander Panchenko, Simone Paolo Ponzetto
While both kinds of semantic resources are available with high lexical coverage, our aligned resource combines the domain specificity and availability of contextual information from distributional models with the conciseness and high quality of manually crafted lexical networks.
1 code implementation • LREC 2018 • Alexander Panchenko, Dmitry Ustalov, Stefano Faralli, Simone P. Ponzetto, Chris Biemann
In this paper, we show how distributionally-induced semantic classes can be helpful for extracting hypernyms.
no code implementations • LREC 2018 • Alexander Panchenko, Eugen Ruppert, Stefano Faralli, Simone Paolo Ponzetto, Chris Biemann
We present DepCC, the largest-to-date linguistically analyzed corpus in English including 365 million documents, composed of 252 billion tokens and 7.5 billion named entity occurrences in 14.3 billion sentences from a web-scale crawl of the Common Crawl project.
no code implementations • 31 Aug 2017 • Alexander Panchenko, Dmitry Ustalov, Nikolay Arefyev, Denis Paperno, Natalia Konstantinova, Natalia Loukachevitch, Chris Biemann
On the one hand, humans easily make judgments about semantic relatedness.
no code implementations • 30 Aug 2017 • Dmitry Ustalov, Mikhail Chernoskutov, Chris Biemann, Alexander Panchenko
Graph-based synset induction methods, such as MaxMax and Watset, induce synsets by performing a global clustering of a synonymy graph.
1 code implementation • WS 2016 • Maria Pelevina, Nikolay Arefyev, Chris Biemann, Alexander Panchenko
We present a simple yet effective approach for learning word sense embeddings.
1 code implementation • EMNLP 2017 • Alexander Panchenko, Fide Marten, Eugen Ruppert, Stefano Faralli, Dmitry Ustalov, Simone Paolo Ponzetto, Chris Biemann
In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images.
1 code implementation • EACL 2017 • Dmitry Ustalov, Nikolay Arefyev, Chris Biemann, Alexander Panchenko
We present a new approach to extraction of hypernyms based on projection learning and word embeddings.
1 code implementation • ACL 2017 • Dmitry Ustalov, Alexander Panchenko, Chris Biemann
This paper presents a new graph-based approach that induces synsets using synonymy dictionaries and word embeddings.