no code implementations • CLIB 2022 • Timofey Atnashev, Veronika Ganeeva, Roman Kazakov, Daria Matyash, Michael Sonkin, Ekaterina Voloshina, Oleg Serikov, Ekaterina Artemova
The labelling is carried out on the crowdsourcing platform Yandex.Toloka in two stages.
no code implementations • COLING (WNUT) 2022 • Evgeny Orlov, Ekaterina Artemova
Code-switching (CS) is a phenomenon of mixing words and phrases from multiple languages within a single sentence or conversation.
no code implementations • INLG (ACL) 2021 • Pavel Burnyshev, Valentin Malykh, Andrey Bout, Ekaterina Artemova, Irina Piontkovskaya
We explore two approaches to the generation of task-oriented utterances: in the zero-shot approach, the model is trained to generate utterances from seen intents and is further used to generate utterances for intents unseen during training.
1 code implementation • 4 Dec 2024 • Konstantin Chernyshev, Vitaliy Polshkov, Ekaterina Artemova, Alex Myasnikov, Vlad Stepanov, Alexei Miasnikov, Sergei Tilga
Given the open-ended nature of U-MATH problems, we employ an LLM to judge the correctness of generated solutions.
no code implementations • 7 Nov 2024 • Ekaterina Artemova, Akim Tsvigun, Dominik Schlechtweg, Natalia Fedorova, Sergei Tilga, Konstantin Chernyshev, Boris Obmoroshev
Training and deploying machine learning models relies on a large amount of human-annotated data.
no code implementations • 6 Nov 2024 • Ekaterina Artemova, Jason Lucas, Saranya Venkatraman, Jooyoung Lee, Sergei Tilga, Adaku Uchendu, Vladislav Mikhailov
The rapid proliferation of large language models (LLMs) has increased the volume of machine-generated texts (MGTs) and blurred text authorship in various domains.
1 code implementation • 8 Aug 2024 • Mervat Abassy, Kareem Elozeiri, Alexander Aziz, Minh Ngoc Ta, Raj Vardhan Tomar, Bimarsha Adhikari, Saad El Dine Ahmed, Yuxia Wang, Osama Mohammed Afzal, Zhuohan Xie, Jonibek Mansurov, Ekaterina Artemova, Vladislav Mikhailov, Rui Xing, Jiahui Geng, Hasan Iqbal, Zain Muhammad Mujahid, Tarek Mahmoud, Akim Tsvigun, Alham Fikri Aji, Artem Shelmanov, Nizar Habash, Iryna Gurevych, Preslav Nakov
Category (iii) aims to detect attempts to obfuscate the fact that a text was machine-generated, while category (iv) looks for cases where the LLM was used to polish a human-written text, which is typically acceptable in academic writing, but not in education.
no code implementations • 24 Jul 2024 • Nikita Andreev, Alexander Shirnin, Vladislav Mikhailov, Ekaterina Artemova
This paper presents Papilusion, an AI-generated scientific text detector developed within the DAGPap24 shared task on detecting automatically generated scientific papers.
1 code implementation • 27 Jun 2024 • Ekaterina Taktasheva, Maxim Bazhukov, Kirill Koncha, Alena Fenogenova, Ekaterina Artemova, Vladislav Mikhailov
Minimal pairs are a well-established approach to evaluating the grammatical knowledge of language models.
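The minimal-pair setup can be sketched in a few lines: a model "passes" a pair when it scores the grammatical sentence above its ungrammatical twin. The scorer below is a deliberately naive unigram stand-in (toy vocabulary and log-probabilities invented for illustration, not a real LM); note that, being context-free, it fails the second agreement pair, which is exactly why contextual language models are evaluated this way.

```python
# Toy minimal-pair evaluation: accuracy = fraction of (grammatical,
# ungrammatical) pairs where the scorer ranks the grammatical sentence higher.
TOY_UNIGRAM_LOGPROBS = {
    "the": -1.0, "cat": -3.0, "cats": -3.5, "sleeps": -4.0, "sleep": -4.2,
}

def sentence_score(sentence: str) -> float:
    """Sum of per-word log-probabilities (unknown words get a floor value)."""
    return sum(TOY_UNIGRAM_LOGPROBS.get(w, -10.0) for w in sentence.lower().split())

def minimal_pair_accuracy(pairs) -> float:
    """Fraction of pairs scored in the right order."""
    correct = sum(sentence_score(good) > sentence_score(bad) for good, bad in pairs)
    return correct / len(pairs)

pairs = [
    ("the cat sleeps", "the cat sleep"),    # agreement violation
    ("the cats sleep", "the cats sleeps"),  # a unigram scorer cannot get this one
]
print(minimal_pair_accuracy(pairs))  # → 0.5
```

A real benchmark replaces `sentence_score` with the (pseudo-)log-likelihood a pretrained LM assigns to the full sentence.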
1 code implementation • 28 Mar 2024 • Alexander Shirnin, Nikita Andreev, Vladislav Mikhailov, Ekaterina Artemova
This paper describes AIpom, a system designed to detect a boundary between human-written and machine-generated text (SemEval-2024 Task 8, Subtask C: Human-Machine Mixed Text Detection).
1 code implementation • 26 Mar 2024 • Veronika Grigoreva, Anastasiia Ivanova, Ilseyar Alimova, Ekaterina Artemova
To illustrate the dataset's purpose, we conduct a diagnostic evaluation of state-of-the-art or near-state-of-the-art LLMs and discuss the LLMs' predisposition to social biases.
1 code implementation • 19 Mar 2024 • Siyao Peng, Zihang Sun, Huangyan Shan, Marie Kolm, Verena Blaschke, Ekaterina Artemova, Barbara Plank
Named Entity Recognition (NER) is a fundamental task to extract key information from texts, but annotated resources are scarce for dialects.
1 code implementation • 3 Feb 2024 • Ekaterina Artemova, Verena Blaschke, Barbara Plank
Inspired by prior work on English varieties, we craft and manually evaluate perturbation rules that transform German sentences into colloquial forms and use them to synthesize test sets in four ToD datasets.
1 code implementation • 9 Jan 2024 • Marat Saidov, Aleksandra Bakalova, Ekaterina Taktasheva, Vladislav Mikhailov, Ekaterina Artemova
The evaluation of Natural Language Generation (NLG) models has gained increased attention, urging the development of metrics that evaluate various aspects of generated text.
1 code implementation • 4 Sep 2023 • Leon Weber-Genzel, Robert Litschko, Ekaterina Artemova, Barbara Plank
Our results show that the choice of the right AED method and model size is indeed crucial; we derive practical recommendations for how to use AED methods to clean instruction-tuning data.
1 code implementation • 9 May 2023 • Robert Litschko, Ekaterina Artemova, Barbara Plank
Transferring information retrieval (IR) models from a high-resource language (typically English) to other languages in a zero-shot fashion has become a widely adopted approach.
1 code implementation • 19 Apr 2023 • Ekaterina Artemova, Barbara Plank
Bilingual word lexicons are crucial tools for multilingual natural language understanding and machine translation tasks, as they facilitate the mapping of words in one language to their synonyms in another language.
Bilingual Lexicon Induction • Natural Language Understanding +4
2 code implementations • 4 Apr 2023 • Irina Proskurina, Irina Piontkovskaya, Ekaterina Artemova
Our results contribute to understanding the behavior of monolingual LMs in the acceptability classification task, provide insights into the functional roles of attention heads, and highlight the advantages of TDA-based approaches for analyzing LMs.
Ranked #1 on Linguistic Acceptability on RuCoLA
1 code implementation • 23 Oct 2022 • Ekaterina Taktasheva, Tatiana Shavrina, Alena Fenogenova, Denis Shevelev, Nadezhda Katricheva, Maria Tikhonova, Albina Akhmetgareeva, Oleg Zinkevich, Anastasiia Bashmakova, Svetlana Iordanskaia, Alena Spiridonova, Valentina Kurenshchikova, Ekaterina Artemova, Vladislav Mikhailov
Recent advances in zero-shot and few-shot learning have shown promise for a scope of research and practical purposes.
Ranked #1 on Ethics on Ethics (per ethics)
1 code implementation • 23 Oct 2022 • Vladislav Mikhailov, Tatiana Shamardina, Max Ryabinin, Alena Pestova, Ivan Smurov, Ekaterina Artemova
Linguistic acceptability (LA) attracts the attention of the research community due to its many uses, such as testing the grammatical knowledge of language models and filtering implausible texts with acceptability classifiers.
Ranked #2 on Linguistic Acceptability on ItaCoLA
1 code implementation • 11 Oct 2022 • Mark Rofin, Vladislav Mikhailov, Mikhail Florinskiy, Andrey Kravchenko, Elena Tutubalina, Tatiana Shavrina, Daniel Karabekyan, Ekaterina Artemova
The development of state-of-the-art systems in different applied areas of machine learning (ML) is driven by benchmarks, which have shaped the paradigm of evaluating generalisation capabilities from multiple perspectives.
no code implementations • 22 Jun 2022 • Dmitry Lamanov, Pavel Burnyshev, Ekaterina Artemova, Valentin Malykh, Andrey Bout, Irina Piontkovskaya
We outperform the previous state-of-the-art F1 measure by up to 16% for unseen intents, using intent labels and user utterances and without accessing external sources (such as knowledge bases).
1 code implementation • 3 Jun 2022 • Tatiana Shamardina, Vladislav Mikhailov, Daniil Chernianskii, Alena Fenogenova, Marat Saidov, Anastasiya Valeeva, Tatiana Shavrina, Ivan Smurov, Elena Tutubalina, Ekaterina Artemova
The first task is framed as a binary classification problem.
1 code implementation • 23 May 2022 • Ekaterina Artemova, Maxim Zmeev, Natalia Loukachevitch, Igor Rozhkov, Tatiana Batura, Vladimir Ivanov, Elena Tutubalina
In the test set, all entity types occur with equal frequency.
1 code implementation • 19 May 2022 • Daniil Cherniavskii, Eduard Tulchinskii, Vladislav Mikhailov, Irina Proskurina, Laida Kushnareva, Ekaterina Artemova, Serguei Barannikov, Irina Piontkovskaya, Dmitri Piontkovski, Evgeny Burnaev
The role of the attention mechanism in encoding linguistic knowledge has received special interest in NLP.
Ranked #1 on Linguistic Acceptability on ItaCoLA
no code implementations • 15 Feb 2022 • Alena Fenogenova, Maria Tikhonova, Vladislav Mikhailov, Tatiana Shavrina, Anton Emelyanov, Denis Shevelev, Alexandr Kukushkin, Valentin Malykh, Ekaterina Artemova
In the last year, new neural architectures and multilingual pre-trained models have been released for Russian, which led to performance evaluation problems across a range of language understanding tasks.
2 code implementations • 29 Sep 2021 • Alexey Birshert, Ekaterina Artemova
This is in line with the common understanding of how multilingual models transfer knowledge between languages.
1 code implementation • EMNLP (MRL) 2021 • Ekaterina Taktasheva, Vladislav Mikhailov, Ekaterina Artemova
Recent research has adopted a new experimental field centered around the concept of text perturbations which has revealed that shuffled word order has little to no impact on the downstream performance of Transformer-based language models across many NLP tasks.
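The word-order perturbation studied here can be illustrated with a minimal sketch (the function name and example sentence are ours, for illustration only): the perturbed sentence keeps exactly the same bag of words, so any downstream gap between the two versions must come from order sensitivity.

```python
import random

def shuffle_words(sentence: str, seed: int = 0) -> str:
    """Word-order perturbation: return the same tokens in a shuffled order."""
    words = sentence.split()
    rng = random.Random(seed)  # seeded for reproducible perturbations
    rng.shuffle(words)
    return " ".join(words)

original = "the quick brown fox jumps over the lazy dog"
perturbed = shuffle_words(original)
# The bag of words is unchanged; only the order differs.
print(perturbed)
```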
2 code implementations • EMNLP 2021 • Laida Kushnareva, Daniil Cherniavskii, Vladislav Mikhailov, Ekaterina Artemova, Serguei Barannikov, Alexander Bernstein, Irina Piontkovskaya, Dmitri Piontkovski, Evgeny Burnaev
The impressive capabilities of recent generative models to create texts that are challenging to distinguish from the human-written ones can be misused for generating fake news, product reviews, and even abusive content.
1 code implementation • RANLP 2021 • Natalia Loukachevitch, Ekaterina Artemova, Tatiana Batura, Pavel Braslavski, Ilia Denisov, Vladimir Ivanov, Suresh Manandhar, Alexander Pugachev, Elena Tutubalina
In this paper, we present NEREL, a Russian dataset for named entity recognition and relation extraction.
no code implementations • 16 Aug 2021 • Pavel Burnyshev, Valentin Malykh, Andrey Bout, Ekaterina Artemova, Irina Piontkovskaya
In the zero-shot approach, the model is trained to generate utterances from seen intents and is further used to generate utterances for intents unseen during training.
no code implementations • 23 Jul 2021 • Ivan Fursov, Alexey Zaytsev, Pavel Burnyshev, Ekaterina Dmitrieva, Nikita Klyuchnikov, Andrey Kravchenko, Ekaterina Artemova, Evgeny Burnaev
Moreover, due to the usage of the fine-tuned language model, the generated adversarial examples are hard to detect, thus current models are not robust.
3 code implementations • 29 Apr 2021 • Valentin Malykh, Alexander Kukushkin, Ekaterina Artemova, Vladislav Mikhailov, Maria Tikhonova, Tatiana Shavrina
The new generation of pre-trained NLP models pushes the SOTA to new limits, but at the cost of computational resources, to the point that their use in real production environments is often prohibitively expensive.
no code implementations • NAACL (TeachingNLP) 2021 • Ekaterina Artemova, Murat Apishev, Veronika Sarkisyan, Sergey Aksenov, Denis Kirjanov, Oleg Serikov
This paper presents a new Massive Open Online Course on Natural Language Processing, targeted at non-English speaking students.
1 code implementation • NAACL (SIGTYP) 2021 • Vladislav Mikhailov, Oleg Serikov, Ekaterina Artemova
The outstanding performance of transformer-based language models on a great variety of NLP and NLU tasks has stimulated interest in exploring their inner workings.
2 code implementations • EACL (BSNLP) 2021 • Vladislav Mikhailov, Ekaterina Taktasheva, Elina Sigdel, Ekaterina Artemova
The success of pre-trained transformer language models has brought a great deal of interest in how these models work and what they learn about language.
no code implementations • EACL 2021 • Artem Shelmanov, Dmitri Puzyrev, Lyubov Kupriyanova, Denis Belyakov, Daniil Larionov, Nikita Khromov, Olga Kozlova, Ekaterina Artemova, Dmitry V. Dylov, Alexander Panchenko
Annotating training data for sequence tagging of texts is usually very time-consuming.
1 code implementation • 11 Jan 2021 • Alexander Podolskiy, Dmitry Lipin, Andrey Bout, Ekaterina Artemova, Irina Piontkovskaya
In turn, the Mahalanobis distance captures this disparity easily.
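Mahalanobis-distance scoring for out-of-distribution detection can be sketched as follows. This is a 2D pure-Python toy (invented data points; a real detector operates on high-dimensional encoder features and typically uses per-class means with a shared covariance): in-distribution inputs land close to the training mean under the covariance-aware metric, while OOD inputs score far away.

```python
# Toy Mahalanobis-distance OOD scoring in two dimensions.
def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in (0, 1)]

def covariance(points, mu):
    """Empirical 2x2 covariance, returned as (cxx, cxy, cyy)."""
    n = len(points)
    cxx = sum((p[0] - mu[0]) ** 2 for p in points) / n
    cyy = sum((p[1] - mu[1]) ** 2 for p in points) / n
    cxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in points) / n
    return cxx, cxy, cyy

def mahalanobis(p, mu, cov):
    """Distance of p from mu under the inverse covariance metric."""
    cxx, cxy, cyy = cov
    det = cxx * cyy - cxy * cxy          # invert the 2x2 covariance analytically
    ixx, ixy, iyy = cyy / det, -cxy / det, cxx / det
    dx, dy = p[0] - mu[0], p[1] - mu[1]
    return (dx * (ixx * dx + ixy * dy) + dy * (ixy * dx + iyy * dy)) ** 0.5

in_domain = [(0.9, 1.1), (1.0, 0.9), (1.1, 1.0), (1.0, 1.1), (0.95, 0.95)]
mu = mean(in_domain)
cov = covariance(in_domain, mu)

# An in-distribution point scores low; an outlier scores high.
print(mahalanobis((1.0, 1.0), mu, cov))
print(mahalanobis((5.0, -3.0), mu, cov))
```

Thresholding this score then separates in-domain inputs from OOD ones.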
no code implementations • COLING 2020 • Valentin Malykh, Konstantin Chernis, Ekaterina Artemova, Irina Piontkovskaya
The existing dialogue summarization corpora are significantly extractive.
no code implementations • 29 Oct 2020 • Vitaly Ivanin, Ekaterina Artemova, Tatiana Batura, Vladimir Ivanov, Veronika Sarkisyan, Elena Tutubalina, Ivan Smurov
We showcase an application of information extraction methods, such as named entity recognition (NER) and relation extraction (RE), to a novel corpus consisting of documents issued by a state agency.
2 code implementations • EMNLP 2020 • Tatiana Shavrina, Alena Fenogenova, Anton Emelyanov, Denis Shevelev, Ekaterina Artemova, Valentin Malykh, Vladislav Mikhailov, Maria Tikhonova, Andrey Chertok, Andrey Evlampiev
In this paper, we introduce an advanced Russian general language understanding evaluation benchmark -- RussianGLUE.
Ranked #1 on Word Sense Disambiguation on RUSSE
no code implementations • 7 Oct 2020 • Julia Rodina, Yuliya Trofimova, Andrey Kutuzov, Ekaterina Artemova
We study the effectiveness of contextualized embeddings for the task of diachronic semantic change detection for Russian language data.
no code implementations • 6 Oct 2020 • Taisia Glushkova, Alexey Machnev, Alena Fenogenova, Tatiana Shavrina, Ekaterina Artemova, Dmitry I. Ignatov
The task is to take both the question and a paragraph as input and come up with a yes/no answer, i.e., to produce a binary output.
1 code implementation • 1 Jul 2020 • Ekaterina Artemova, Tatiana Batura, Anna Golenkovskaya, Vitaly Ivanin, Vladimir Ivanov, Veronika Sarkisyan, Ivan Smurov, Elena Tutubalina
In this paper we present a corpus of Russian strategic planning documents, RuREBus.
1 code implementation • 12 Jun 2020 • Nikita Klyuchnikov, Ilya Trofimov, Ekaterina Artemova, Mikhail Salnikov, Maxim Fedorov, Evgeny Burnaev
In this work, we step outside the computer vision domain by leveraging the language modeling task, which is the core of natural language processing (NLP).
1 code implementation • 23 Mar 2020 • Ekaterina Artemova, Amir Bakarov, Aleksey Artemov, Evgeny Burnaev, Maxim Sharaev
In this paper, our focus is the connection and influence of language technologies on the research in neurolinguistics.
no code implementations • LREC 2020 • Irina Krotova, Sergey Aksenov, Ekaterina Artemova
Applications such as machine translation, speech recognition, and information retrieval require efficient handling of noun compounds as they are one of the possible sources for out-of-vocabulary (OOV) words.
no code implementations • LREC 2020 • Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko
We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages.
no code implementations • 8 Nov 2019 • Taisiya Glushkova, Ekaterina Artemova
We explore the abilities of character recurrent neural network (char-RNN) for hashtag segmentation.
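For a sense of the task itself, here is a simple dynamic-programming baseline over a toy vocabulary (our illustration, not the paper's char-RNN): split a hashtag body into a sequence of known words, or report failure.

```python
# Toy hashtag segmentation via dynamic programming over a known vocabulary.
TOY_VOCAB = {"no", "excuses", "machine", "learning", "i", "love", "nlp"}

def segment_hashtag(tag: str):
    """Split a lowercase hashtag body into vocabulary words; None if impossible.

    best[i] holds one valid segmentation of tag[:i], built left to right.
    """
    n = len(tag)
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and tag[j:i] in TOY_VOCAB:
                best[i] = best[j] + [tag[j:i]]
                break
    return best[n]

print(segment_hashtag("noexcuses"))  # → ['no', 'excuses']
print(segment_hashtag("ilovenlp"))   # → ['i', 'love', 'nlp']
```

A char-RNN instead predicts split points character by character, so it can segment hashtags containing words never seen in any fixed vocabulary.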
1 code implementation • 29 Oct 2019 • Dmitry Popov, Alexander Pugachev, Polina Svyatokum, Elizaveta Svitanko, Ekaterina Artemova
We investigate the performance of sentence embeddings models on several tasks for the Russian language.
1 code implementation • WS 2019 • Dmitry Puzyrev, Artem Shelmanov, Alexander Panchenko, Ekaterina Artemova
This paper presents the first gold-standard resource for Russian annotated with compositionality information of noun compounds.
2 code implementations • WS 2019 • Anton A. Emelyanov, Ekaterina Artemova
In this paper we tackle the multilingual named entity recognition task.