Search Results for author: Alexander Panchenko

Found 78 papers, 36 papers with code

MERA: A Comprehensive LLM Evaluation in Russian

1 code implementation 9 Jan 2024 Alena Fenogenova, Artem Chervyakov, Nikita Martynov, Anastasia Kozlova, Maria Tikhonova, Albina Akhmetgareeva, Anton Emelyanov, Denis Shevelev, Pavel Lebedev, Leonid Sinev, Ulyana Isaeva, Katerina Kolomeytseva, Daniil Moskovskiy, Elizaveta Goncharova, Nikita Savushkin, Polina Mikhailova, Denis Dimitrov, Alexander Panchenko, Sergei Markov

We introduce an open Multimodal Evaluation of Russian-language Architectures (MERA), a new instruction benchmark for evaluating foundation models oriented towards the Russian language.

Uncertainty Estimation of Transformer Predictions for Misclassification Detection

1 code implementation ACL 2022 Artem Vazhentsev, Gleb Kuzmin, Artem Shelmanov, Akim Tsvigun, Evgenii Tsymbalov, Kirill Fedyanin, Maxim Panov, Alexander Panchenko, Gleb Gusev, Mikhail Burtsev, Manvel Avetisian, Leonid Zhukov

Uncertainty estimation (UE) of model predictions is a crucial step for a variety of tasks such as active learning, misclassification detection, adversarial attack detection, out-of-distribution detection, etc.

Active Learning, Adversarial Attack Detection +7

Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company's Reputation

1 code implementation 9 Mar 2021 Nikolay Babakov, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko

We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labeling a dataset for appropriateness.

Watset: Local-Global Graph Clustering with Applications in Sense and Frame Induction

2 code implementations CL 2019 Dmitry Ustalov, Alexander Panchenko, Chris Biemann, Simone Paolo Ponzetto

We present a detailed theoretical and computational analysis of the Watset meta-algorithm for fuzzy graph clustering, which has been found to be widely applicable in a variety of domains.

Clustering, Graph Clustering

Every child should have parents: a taxonomy refinement algorithm based on hyperbolic term embeddings

1 code implementation ACL 2019 Rami Aly, Shantanu Acharya, Alexander Ossa, Arne Köhn, Chris Biemann, Alexander Panchenko

We introduce the use of Poincaré embeddings to improve existing state-of-the-art approaches to domain-specific taxonomy induction from text, using them as a signal both for relocating wrong hyponym terms within a (pre-induced) taxonomy and for attaching disconnected terms to a taxonomy.

ParaDetox: Detoxification with Parallel Data

1 code implementation ACL 2022 Varvara Logacheva, Daryna Dementieva, Sergey Ustyantsev, Daniil Moskovskiy, David Dale, Irina Krotova, Nikita Semenov, Alexander Panchenko

To the best of our knowledge, these are the first parallel datasets for this task. We describe our pipeline in detail to make it fast to set up for a new language or domain, thus contributing to faster and easier development of new parallel resources. We train several detoxification models on the collected data and compare them with several baselines and state-of-the-art unsupervised approaches.

Sentence

Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation

1 code implementation EMNLP 2017 Alexander Panchenko, Fide Marten, Eugen Ruppert, Stefano Faralli, Dmitry Ustalov, Simone Paolo Ponzetto, Chris Biemann

In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images.

Word Sense Disambiguation

Watset: Automatic Induction of Synsets from a Graph of Synonyms

1 code implementation ACL 2017 Dmitry Ustalov, Alexander Panchenko, Chris Biemann

This paper presents a new graph-based approach that induces synsets using synonymy dictionaries and word embeddings.

Clustering, Word Embeddings +1

Cross-lingual Evidence Improves Monolingual Fake News Detection

1 code implementation ACL 2021 Daryna Dementieva, Alexander Panchenko

Misleading information spreads on the Internet at an incredible speed, which can lead to irreparable consequences in some cases.

Fake News Detection, News Classification

Multiverse: Multilingual Evidence for Fake News Detection

1 code implementation 25 Nov 2022 Daryna Dementieva, Mikhail Kuimov, Alexander Panchenko

In this work, we propose Multiverse -- a new feature based on multilingual evidence that can be used for fake news detection and improve existing approaches.

Fake News Detection, News Classification

Unsupervised Semantic Frame Induction using Triclustering

1 code implementation ACL 2018 Dmitry Ustalov, Alexander Panchenko, Andrei Kutuzov, Chris Biemann, Simone Paolo Ponzetto

We use dependency triples automatically extracted from a Web-scale corpus to perform unsupervised semantic frame induction.

Clustering

Categorizing Comparative Sentences

3 code implementations WS 2019 Alexander Panchenko, Alexander Bondarenko, Mirco Franzek, Matthias Hagen, Chris Biemann

We tackle the tasks of automatically identifying comparative sentences and categorizing the intended preference (e.g., "Python has better NLP libraries than MATLAB" => (Python, better, MATLAB)).

Argument Mining, Sentence +1

Unsupervised Sense-Aware Hypernymy Extraction

1 code implementation 17 Sep 2018 Dmitry Ustalov, Alexander Panchenko, Chris Biemann, Simone Paolo Ponzetto

In this paper, we show how unsupervised sense representations can be used to improve hypernymy extraction.

Exploring Cross-lingual Text Detoxification with Large Multilingual Language Models

1 code implementation ACL 2022 Daniil Moskovskiy, Daryna Dementieva, Alexander Panchenko

This work investigates multilingual and cross-lingual detoxification and the behavior of large multilingual models in this setting.

Style Transfer

Exploring Cross-lingual Textual Style Transfer with Large Multilingual Language Models

1 code implementation 5 Jun 2022 Daniil Moskovskiy, Daryna Dementieva, Alexander Panchenko

However, models are not able to perform cross-lingual detoxification, so direct fine-tuning on the target language is unavoidable.

Style Transfer

TaxoLLaMA: WordNet-based Model for Solving Multiple Lexical Semantic Tasks

1 code implementation 14 Mar 2024 Viktor Moskvoretskii, Ekaterina Neminova, Alina Lobanova, Alexander Panchenko, Irina Nikishina

It achieves 11 SotA results and 4 top-2 results across 16 tasks covering Taxonomy Enrichment, Hypernym Discovery, Taxonomy Construction, and Lexical Entailment.

Domain Adaptation, Few-Shot Learning +3

Studying the role of named entities for content preservation in text style transfer

2 code implementations 20 Jun 2022 Nikolay Babakov, David Dale, Varvara Logacheva, Irina Krotova, Alexander Panchenko

Text style transfer techniques are gaining popularity in Natural Language Processing, finding various applications such as text detoxification, sentiment, or formality transfer.

Style Transfer, Text Style Transfer

Don't lose the message while paraphrasing: A study on content preserving style transfer

1 code implementation 17 Aug 2023 Nikolay Babakov, David Dale, Ilya Gusev, Irina Krotova, Alexander Panchenko

Text style transfer techniques are gaining popularity in natural language processing, allowing text to be paraphrased into the required form: from toxic to neutral, from formal to informal, from old to modern English, etc.

Style Transfer, Text Style Transfer

How much does a word weigh? Weighting word embeddings for word sense induction

no code implementations 23 May 2018 Nikolay Arefyev, Pavel Ermolaev, Alexander Panchenko

The paper describes our participation in the first shared task on word sense induction and disambiguation for the Russian language RUSSE'2018 (Panchenko et al., 2018).

Clustering, Machine Translation +3

Neologisms on Facebook

no code implementations 13 Apr 2018 Nikita Muravyev, Alexander Panchenko, Sergei Obiedkov

In this paper, we present a study of neologisms and loan words frequently occurring in Facebook user posts.

Marketing

Enriching Frame Representations with Distributionally Induced Senses

no code implementations LREC 2018 Stefano Faralli, Alexander Panchenko, Chris Biemann, Simone Paolo Ponzetto

We introduce a new lexical resource that enriches the Framester knowledge graph, which links FrameNet, WordNet, VerbNet and other resources, with semantic features from text corpora.

RUSSE: The First Workshop on Russian Semantic Similarity

no code implementations 15 Mar 2018 Alexander Panchenko, Natalia Loukachevitch, Dmitry Ustalov, Denis Paperno, Christian Meyer, Natalia Konstantinova

The paper gives an overview of the Russian Semantic Similarity Evaluation (RUSSE) shared task held in conjunction with the Dialogue 2015 conference.

Semantic Similarity, Semantic Textual Similarity

Building a Web-Scale Dependency-Parsed Corpus from CommonCrawl

no code implementations LREC 2018 Alexander Panchenko, Eugen Ruppert, Stefano Faralli, Simone Paolo Ponzetto, Chris Biemann

We present DepCC, the largest-to-date linguistically analyzed corpus in English, including 365 million documents composed of 252 billion tokens and 7.5 billion named entity occurrences in 14.3 billion sentences from a web-scale crawl of the Common Crawl project.

Open Information Extraction, Question Answering +1

A Framework for Enriching Lexical Semantic Resources with Distributional Semantics

no code implementations 23 Dec 2017 Chris Biemann, Stefano Faralli, Alexander Panchenko, Simone Paolo Ponzetto

While both kinds of semantic resources are available with high lexical coverage, our aligned resource combines the domain specificity and availability of contextual information from distributional models with the conciseness and high quality of manually crafted lexical networks.

Specificity, Word Sense Disambiguation

Fighting with the Sparsity of Synonymy Dictionaries

no code implementations 30 Aug 2017 Dmitry Ustalov, Mikhail Chernoskutov, Chris Biemann, Alexander Panchenko

Graph-based synset induction methods, such as MaxMax and Watset, induce synsets by performing a global clustering of a synonymy graph.

Clustering

Sentiment Index of the Russian Speaking Facebook

no code implementations 23 Aug 2018 Alexander Panchenko

A sentiment index measures the average emotional level in a corpus.

Answering Comparative Questions: Better than Ten-Blue-Links?

no code implementations 15 Jan 2019 Matthias Schildwächter, Alexander Bondarenko, Julian Zenker, Matthias Hagen, Chris Biemann, Alexander Panchenko

We present CAM (comparative argumentative machine), a novel open-domain IR system to argumentatively compare objects with respect to information extracted from the Common Crawl.

HHMM at SemEval-2019 Task 2: Unsupervised Frame Induction using Contextualized Word Embeddings

1 code implementation SEMEVAL 2019 Saba Anwar, Dmitry Ustalov, Nikolay Arefyev, Simone Paolo Ponzetto, Chris Biemann, Alexander Panchenko

We present our system for semantic frame induction that showed the best performance in Subtask B.1 and finished as the runner-up in Subtask A of the SemEval 2019 Task 2 on unsupervised semantic frame induction (QasemiZadeh et al., 2019).

Clustering, Task 2 +1

On the Compositionality Prediction of Noun Phrases using Poincaré Embeddings

no code implementations 7 Jun 2019 Abhik Jana, Dmitry Puzyrev, Alexander Panchenko, Pawan Goyal, Chris Biemann, Animesh Mukherjee

In particular, we use hypernymy information of the multiword and its constituents, encoded in the form of the recently introduced Poincaré embeddings, in addition to the distributional information to detect compositionality for noun phrases.

Word Sense Disambiguation for 158 Languages using Word Embeddings Only

no code implementations LREC 2020 Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko

We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages.

Word Embeddings, Word Sense Disambiguation

RUSSE'2020: Findings of the First Taxonomy Enrichment Task for the Russian language

no code implementations 22 May 2020 Irina Nikishina, Varvara Logacheva, Alexander Panchenko, Natalia Loukachevitch

This paper describes the results of the first shared task on taxonomy enrichment for the Russian language.

A Comparative Study of Lexical Substitution Approaches based on Neural Language Models

no code implementations 29 May 2020 Nikolay Arefyev, Boris Sheludko, Alexander Podolskiy, Alexander Panchenko

Lexical substitution in context is an extremely powerful technology that can be used as a backbone of various NLP applications, such as word sense induction, lexical relation extraction, data augmentation, etc.

Data Augmentation, Relation Extraction +1

Neural Entity Linking: A Survey of Models Based on Deep Learning

no code implementations 31 May 2020 Ozge Sevgili, Artem Shelmanov, Mikhail Arkhipov, Alexander Panchenko, Chris Biemann

This survey presents a comprehensive description of recent neural entity linking (EL) systems developed since 2015 as a result of the "deep learning revolution" in natural language processing.

Entity Embeddings, Entity Linking

SkoltechNLP at SemEval-2020 Task 11: Exploring Unsupervised Text Augmentation for Propaganda Detection

no code implementations SEMEVAL 2020 Daryna Dementieva, Igor Markov, Alexander Panchenko

This paper presents a solution for the Span Identification (SI) task in the "Detection of Propaganda Techniques in News Articles" competition at SemEval-2020.

Propaganda detection, Text Augmentation

Evaluation of Taxonomy Enrichment on Diachronic WordNet Versions

no code implementations EACL (GWC) 2021 Irina Nikishina, Natalia Loukachevitch, Varvara Logacheva, Alexander Panchenko

The vast majority of the existing approaches for taxonomy enrichment apply word embeddings as they have proven to accumulate contexts (in a broad sense) extracted from texts which are sufficient for attaching orphan words to the taxonomy.

Word Embeddings

Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company’s Reputation

no code implementations EACL (BSNLP) 2021 Nikolay Babakov, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko

We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labelling a dataset for appropriateness.

Generating Lexical Representations of Frames using Lexical Substitution

no code implementations PaM 2020 Saba Anwar, Artem Shelmanov, Alexander Panchenko, Chris Biemann

We investigate a simple yet effective method, lexical substitution with word representation models, to automatically expand a small set of frame-annotated sentences with new words for their respective roles and LUs.

Taxonomy Enrichment with Text and Graph Vector Representations

no code implementations 21 Jan 2022 Irina Nikishina, Mikhail Tikhomirov, Varvara Logacheva, Yuriy Nazarov, Alexander Panchenko, Natalia Loukachevitch

With the rapid growth of lexical resources for specific domains, the problem of automatic extension of the existing knowledge bases with new words is becoming more and more widespread.

Knowledge Graphs, Word Embeddings

Beyond Plain Toxic: Detection of Inappropriate Statements on Flammable Topics for the Russian Language

no code implementations 4 Mar 2022 Nikolay Babakov, Varvara Logacheva, Alexander Panchenko

Toxicity on the Internet, such as hate speech, offenses towards particular users or groups of people, or the use of obscene words, is an acknowledged problem.

Chatbot, Cultural Vocal Bursts Intensity Prediction

RuArg-2022: Argument Mining Evaluation

no code implementations 18 Jun 2022 Evgeny Kotelnikov, Natalia Loukachevitch, Irina Nikishina, Alexander Panchenko

Argumentation analysis is a field of computational linguistics that studies methods for extracting arguments from texts and the relationships between them, as well as building the argumentation structure of texts.

Argument Mining, Natural Language Inference +1

Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution

1 code implementation COLING 2020 Nikolay Arefyev, Boris Sheludko, Alexander Podolskiy, Alexander Panchenko

Lexical substitution, i.e., generation of plausible words that can replace a particular target word in a given context, is an extremely powerful technology that can be used as a backbone of various NLP applications, including word sense induction and disambiguation, lexical relation extraction, data augmentation, etc.

Data Augmentation, Relation Extraction +1

RuPAWS: A Russian Adversarial Dataset for Paraphrase Identification

1 code implementation LREC 2022 Nikita Martynov, Irina Krotova, Varvara Logacheva, Alexander Panchenko, Olga Kozlova, Nikita Semenov

We compare it to ParaPhraser, the largest available dataset for Russian, and show that the best available paraphrase identifiers for the Russian language fail on the RuPAWS dataset.

Paraphrase Identification

Pixel-Level BPE for Auto-Regressive Image Generation

no code implementations MMMPIE (COLING) 2022 Anton Razzhigaev, Anton Voronov, Andrey Kaznacheev, Andrey Kuznetsov, Denis Dimitrov, Alexander Panchenko

Pixel-level autoregression with Transformer models (Image GPT or iGPT) is one of the recent approaches to image generation that has not received much attention or development, because the quadratic complexity of attention imposes huge memory requirements and thus restricts the resolution of the generated images.

Image Generation

Error syntax aware augmentation of feedback comment generation dataset

no code implementations 29 Dec 2022 Nikolay Babakov, Maria Lysyuk, Alexander Shvets, Lilya Kazakova, Alexander Panchenko

This paper presents a solution to the GenChal 2022 shared task dedicated to feedback comment generation for writing learning.

Comment Generation

Efficient GPT Model Pre-training using Tensor Train Matrix Representation

no code implementations 5 Jun 2023 Viktoriia Chekalina, Georgii Novikov, Julia Gusak, Ivan Oseledets, Alexander Panchenko

On the downstream tasks, including language understanding and text summarization, the model performs similarly to the original GPT-2 model.

Language Modelling, Text Summarization

Large Language Models Meet Knowledge Graphs to Answer Factoid Questions

no code implementations 3 Oct 2023 Mikhail Salnikov, Hai Le, Prateek Rajput, Irina Nikishina, Pavel Braslavski, Valentin Malykh, Alexander Panchenko

Recently, it has been shown that the incorporation of structured knowledge into Large Language Models significantly improves the results for a variety of NLP tasks.

Knowledge Graphs, Re-Ranking

LM-Polygraph: Uncertainty Estimation for Language Models

no code implementations 13 Nov 2023 Ekaterina Fadeeva, Roman Vashurin, Akim Tsvigun, Artem Vazhentsev, Sergey Petrakov, Kirill Fedyanin, Daniil Vasilev, Elizaveta Goncharova, Alexander Panchenko, Maxim Panov, Timothy Baldwin, Artem Shelmanov

Recent advancements in the capabilities of large language models (LLMs) have paved the way for a myriad of groundbreaking applications in various fields.

Text Generation

Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification

no code implementations 7 Mar 2024 Ekaterina Fadeeva, Aleksandr Rubashevskii, Artem Shelmanov, Sergey Petrakov, Haonan Li, Hamdy Mubarak, Evgenii Tsymbalov, Gleb Kuzmin, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov

Uncertainty scores leverage information encapsulated in the output of a neural network or its layers to detect unreliable predictions, and we show that they can be used to fact-check the atomic claims in the LLM output.

Fact Checking, Hallucination +1

MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages

no code implementations 2 Apr 2024 Daryna Dementieva, Nikolay Babakov, Alexander Panchenko

Text detoxification is a textual style transfer (TST) task where a text is paraphrased from a toxic surface form, e.g., featuring rude words, to the neutral register.

Style Transfer
