Search Results for author: Alexander Panchenko

Found 55 papers, 23 papers with code

Uncertainty Estimation of Transformer Predictions for Misclassification Detection

1 code implementation ACL 2022 Artem Vazhentsev, Gleb Kuzmin, Artem Shelmanov, Akim Tsvigun, Evgenii Tsymbalov, Kirill Fedyanin, Maxim Panov, Alexander Panchenko, Gleb Gusev, Mikhail Burtsev, Manvel Avetisian, Leonid Zhukov

Uncertainty estimation (UE) of model predictions is a crucial step for a variety of tasks such as active learning, misclassification detection, adversarial attack detection, out-of-distribution detection, etc.

Active Learning Adversarial Attack Detection +5

ParaDetox: Detoxification with Parallel Data

1 code implementation ACL 2022 Varvara Logacheva, Daryna Dementieva, Sergey Ustyantsev, Daniil Moskovskiy, David Dale, Irina Krotova, Nikita Semenov, Alexander Panchenko

To the best of our knowledge, these are the first parallel datasets for this task. We describe our pipeline in detail to make it fast to set up for a new language or domain, thus contributing to faster and easier development of new parallel resources. We train several detoxification models on the collected data and compare them with several baselines and state-of-the-art unsupervised approaches.

Generating Lexical Representations of Frames using Lexical Substitution

no code implementations PaM 2020 Saba Anwar, Artem Shelmanov, Alexander Panchenko, Chris Biemann

We investigate a simple yet effective method, lexical substitution with word representation models, to automatically expand a small set of frame-annotated sentences with new words for their respective roles and LUs.

Frame

Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company’s Reputation

no code implementations EACL (BSNLP) 2021 Nikolay Babakov, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko

We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labelling a dataset for appropriateness.

Exploring Cross-lingual Text Detoxification with Large Multilingual Language Models.

no code implementations ACL 2022 Daniil Moskovskiy, Daryna Dementieva, Alexander Panchenko

This work investigates multilingual and cross-lingual detoxification and the behavior of large multilingual models in this setting.

Style Transfer

Evaluation of Taxonomy Enrichment on Diachronic WordNet Versions

no code implementations EACL (GWC) 2021 Irina Nikishina, Natalia Loukachevitch, Varvara Logacheva, Alexander Panchenko

The vast majority of the existing approaches for taxonomy enrichment apply word embeddings as they have proven to accumulate contexts (in a broad sense) extracted from texts which are sufficient for attaching orphan words to the taxonomy.

Word Embeddings

Beyond Plain Toxic: Detection of Inappropriate Statements on Flammable Topics for the Russian Language

no code implementations 4 Mar 2022 Nikolay Babakov, Varvara Logacheva, Alexander Panchenko

Toxicity on the Internet, such as hate speech, offenses towards particular users or groups of people, or the use of obscene words, is an acknowledged problem.

Chatbot

Taxonomy Enrichment with Text and Graph Vector Representations

no code implementations 21 Jan 2022 Irina Nikishina, Mikhail Tikhomirov, Varvara Logacheva, Yuriy Nazarov, Alexander Panchenko, Natalia Loukachevitch

With the rapid growth of lexical resources for specific domains, the problem of automatic extension of the existing knowledge bases with new words is becoming more and more widespread.

Knowledge Graphs Word Embeddings

Cross-lingual Evidence Improves Monolingual Fake News Detection

no code implementations ACL 2021 Daryna Dementieva, Alexander Panchenko

Misleading information spreads on the Internet at an incredible speed, which can lead to irreparable consequences in some cases.

Fake News Detection News Classification

Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company's Reputation

1 code implementation 9 Mar 2021 Nikolay Babakov, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko

We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labeling a dataset for appropriateness.

Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution

1 code implementation COLING 2020 Nikolay Arefyev, Boris Sheludko, Alexander Podolskiy, Alexander Panchenko

Lexical substitution, i.e., generation of plausible words that can replace a particular target word in a given context, is an extremely powerful technology that can be used as a backbone of various NLP applications, including word sense induction and disambiguation, lexical relation extraction, data augmentation, etc.

Data Augmentation Relation Extraction +1

SkoltechNLP at SemEval-2020 Task 11: Exploring Unsupervised Text Augmentation for Propaganda Detection

no code implementations SEMEVAL 2020 Daryna Dementieva, Igor Markov, Alexander Panchenko

This paper presents a solution for the Span Identification (SI) task in the "Detection of Propaganda Techniques in News Articles" competition at SemEval-2020.

Propaganda detection Text Augmentation

Neural Entity Linking: A Survey of Models Based on Deep Learning

no code implementations 31 May 2020 Ozge Sevgili, Artem Shelmanov, Mikhail Arkhipov, Alexander Panchenko, Chris Biemann

This survey presents a comprehensive description of recent neural entity linking (EL) systems developed since 2015 as a result of the "deep learning revolution" in natural language processing.

Entity Embeddings Entity Linking

A Comparative Study of Lexical Substitution Approaches based on Neural Language Models

no code implementations 29 May 2020 Nikolay Arefyev, Boris Sheludko, Alexander Podolskiy, Alexander Panchenko

Lexical substitution in context is an extremely powerful technology that can be used as a backbone of various NLP applications, such as word sense induction, lexical relation extraction, data augmentation, etc.

Data Augmentation Relation Extraction +1

RUSSE'2020: Findings of the First Taxonomy Enrichment Task for the Russian language

no code implementations 22 May 2020 Irina Nikishina, Varvara Logacheva, Alexander Panchenko, Natalia Loukachevitch

This paper describes the results of the first shared task on taxonomy enrichment for the Russian language.

Word Sense Disambiguation for 158 Languages using Word Embeddings Only

no code implementations LREC 2020 Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko

We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages.

Word Embeddings Word Sense Disambiguation

On the Compositionality Prediction of Noun Phrases using Poincaré Embeddings

no code implementations 7 Jun 2019 Abhik Jana, Dmitry Puzyrev, Alexander Panchenko, Pawan Goyal, Chris Biemann, Animesh Mukherjee

In particular, we use hypernymy information of the multiword and its constituents encoded in the form of the recently introduced Poincaré embeddings in addition to the distributional information to detect compositionality for noun phrases.

Every child should have parents: a taxonomy refinement algorithm based on hyperbolic term embeddings

1 code implementation ACL 2019 Rami Aly, Shantanu Acharya, Alexander Ossa, Arne Köhn, Chris Biemann, Alexander Panchenko

We introduce the use of Poincaré embeddings to improve existing state-of-the-art approaches to domain-specific taxonomy induction from text as a signal for both relocating wrong hyponym terms within a (pre-induced) taxonomy as well as for attaching disconnected terms in a taxonomy.

HHMM at SemEval-2019 Task 2: Unsupervised Frame Induction using Contextualized Word Embeddings

1 code implementation SEMEVAL 2019 Saba Anwar, Dmitry Ustalov, Nikolay Arefyev, Simone Paolo Ponzetto, Chris Biemann, Alexander Panchenko

We present our system for semantic frame induction that showed the best performance in Subtask B.1 and finished as the runner-up in Subtask A of the SemEval 2019 Task 2 on unsupervised semantic frame induction (QasemiZadeh et al., 2019).

Frame Word Embeddings

Answering Comparative Questions: Better than Ten-Blue-Links?

no code implementations 15 Jan 2019 Matthias Schildwächter, Alexander Bondarenko, Julian Zenker, Matthias Hagen, Chris Biemann, Alexander Panchenko

We present CAM (comparative argumentative machine), a novel open-domain IR system to argumentatively compare objects with respect to information extracted from the Common Crawl.

Unsupervised Sense-Aware Hypernymy Extraction

1 code implementation 17 Sep 2018 Dmitry Ustalov, Alexander Panchenko, Chris Biemann, Simone Paolo Ponzetto

In this paper, we show how unsupervised sense representations can be used to improve hypernymy extraction.

Categorizing Comparative Sentences

3 code implementations WS 2019 Alexander Panchenko, Alexander Bondarenko, Mirco Franzek, Matthias Hagen, Chris Biemann

We tackle the tasks of automatically identifying comparative sentences and categorizing the intended preference (e.g., "Python has better NLP libraries than MATLAB" => (Python, better, MATLAB)).

Argument Mining Sentence Embeddings

Sentiment Index of the Russian Speaking Facebook

no code implementations 23 Aug 2018 Alexander Panchenko

A sentiment index measures the average emotional level in a corpus.

Watset: Local-Global Graph Clustering with Applications in Sense and Frame Induction

2 code implementations CL 2019 Dmitry Ustalov, Alexander Panchenko, Chris Biemann, Simone Paolo Ponzetto

We present a detailed theoretical and computational analysis of the Watset meta-algorithm for fuzzy graph clustering, which has been found to be widely applicable in a variety of domains.

Frame Graph Clustering

How much does a word weigh? Weighting word embeddings for word sense induction

no code implementations 23 May 2018 Nikolay Arefyev, Pavel Ermolaev, Alexander Panchenko

The paper describes our participation in the first shared task on word sense induction and disambiguation for the Russian language RUSSE'2018 (Panchenko et al., 2018).

Machine Translation Translation +2

Unsupervised Semantic Frame Induction using Triclustering

1 code implementation ACL 2018 Dmitry Ustalov, Alexander Panchenko, Andrei Kutuzov, Chris Biemann, Simone Paolo Ponzetto

We use dependency triples automatically extracted from a Web-scale corpus to perform unsupervised semantic frame induction.

Frame

Neologisms on Facebook

no code implementations 13 Apr 2018 Nikita Muravyev, Alexander Panchenko, Sergei Obiedkov

In this paper, we present a study of neologisms and loan words frequently occurring in Facebook user posts.

RUSSE: The First Workshop on Russian Semantic Similarity

no code implementations 15 Mar 2018 Alexander Panchenko, Natalia Loukachevitch, Dmitry Ustalov, Denis Paperno, Christian Meyer, Natalia Konstantinova

The paper gives an overview of the Russian Semantic Similarity Evaluation (RUSSE) shared task held in conjunction with the Dialogue 2015 conference.

Semantic Similarity Semantic Textual Similarity

Enriching Frame Representations with Distributionally Induced Senses

no code implementations LREC 2018 Stefano Faralli, Alexander Panchenko, Chris Biemann, Simone Paolo Ponzetto

We introduce a new lexical resource that enriches the Framester knowledge graph, which links FrameNet, WordNet, VerbNet and other resources, with semantic features from text corpora.

Frame

A Framework for Enriching Lexical Semantic Resources with Distributional Semantics

no code implementations 23 Dec 2017 Chris Biemann, Stefano Faralli, Alexander Panchenko, Simone Paolo Ponzetto

While both kinds of semantic resources are available with high lexical coverage, our aligned resource combines the domain specificity and availability of contextual information from distributional models with the conciseness and high quality of manually crafted lexical networks.

Word Sense Disambiguation

Building a Web-Scale Dependency-Parsed Corpus from CommonCrawl

no code implementations LREC 2018 Alexander Panchenko, Eugen Ruppert, Stefano Faralli, Simone Paolo Ponzetto, Chris Biemann

We present DepCC, the largest-to-date linguistically analyzed corpus in English including 365 million documents, composed of 252 billion tokens and 7.5 billion named entity occurrences in 14.3 billion sentences from a web-scale crawl of the Common Crawl project.

Open Information Extraction Question Answering +1

Fighting with the Sparsity of Synonymy Dictionaries

no code implementations 30 Aug 2017 Dmitry Ustalov, Mikhail Chernoskutov, Chris Biemann, Alexander Panchenko

Graph-based synset induction methods, such as MaxMax and Watset, induce synsets by performing a global clustering of a synonymy graph.

Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation

1 code implementation EMNLP 2017 Alexander Panchenko, Fide Marten, Eugen Ruppert, Stefano Faralli, Dmitry Ustalov, Simone Paolo Ponzetto, Chris Biemann

In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images.

Word Sense Disambiguation

Watset: Automatic Induction of Synsets from a Graph of Synonyms

1 code implementation ACL 2017 Dmitry Ustalov, Alexander Panchenko, Chris Biemann

This paper presents a new graph-based approach that induces synsets using synonymy dictionaries and word embeddings.

Word Embeddings Word Sense Induction
