no code implementations • INLG (ACL) 2021 • Pavel Burnyshev, Valentin Malykh, Andrey Bout, Ekaterina Artemova, Irina Piontkovskaya
We explore two approaches to the generation of task-oriented utterances: in the zero-shot approach, the model is trained to generate utterances from seen intents and is further used to generate utterances for intents unseen during training.
no code implementations • NAACL 2022 • Nikita Sorokin, Dmitry Abulkhanov, Irina Piontkovskaya, Valentin Malykh
Cross-lingual question answering is a thriving field in the modern world, helping people search for information on the web more efficiently.
no code implementations • RANLP 2021 • Pavel Burnyshev, Andrey Bout, Valentin Malykh, Irina Piontkovskaya
Natural language understanding is an important task in modern dialogue systems.
no code implementations • RANLP 2021 • Artur Ilichev, Nikita Sorokin, Irina Piontkovskaya, Valentin Malykh
Language models are nowadays at the center of progress in natural language processing.
no code implementations • 26 Jan 2025 • Alexey Rukhovich, Alexander Podolskiy, Irina Piontkovskaya
In multi-domain learning, a single model is trained on diverse data domains to leverage shared knowledge and improve generalization.
1 code implementation • 10 Oct 2024 • Kristian Kuznetsov, Eduard Tulchinskii, Laida Kushnareva, German Magai, Serguei Barannikov, Sergey Nikolenko, Irina Piontkovskaya
The growing amount and quality of AI-generated texts make detecting such content increasingly difficult.
no code implementations • 3 Oct 2024 • Eduard Tulchinskii, Laida Kushnareva, Kristian Kuznetsov, Anastasia Voznyuk, Andrei Andriiainen, Irina Piontkovskaya, Evgeny Burnaev, Serguei Barannikov
A standard way to evaluate the abilities of an LLM involves presenting a multiple-choice question and selecting the option with the highest logit as the model's predicted answer.
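For illustration, here is a minimal sketch of this first-token scoring scheme, using gpt2 as a stand-in model; the prompt format and option-letter tokens are assumptions, not the paper's exact setup.

```python
# Minimal sketch of first-token multiple-choice evaluation:
# feed the question, then compare the logits the model assigns
# to each option-letter token. gpt2 is a stand-in model here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = ("Question: Which planet is the largest?\n"
          "A. Earth\nB. Jupiter\nC. Mars\nD. Venus\nAnswer:")
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token logits

# Compare the logits of the option letters (with a leading space,
# matching GPT-2's tokenization of " A", " B", ...).
options = ["A", "B", "C", "D"]
option_ids = [tokenizer.encode(" " + o)[0] for o in options]
pred = options[torch.argmax(logits[option_ids]).item()]
print("Predicted option:", pred)
```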
no code implementations • 21 Jun 2024 • Tatiana Gaintseva, Laida Kushnareva, German Magai, Irina Piontkovskaya, Sergey Nikolenko, Martin Benning, Serguei Barannikov, Gregory Slabaugh
In this work, we focus on the robustness of AI-generated image (AIGI) detectors.
no code implementations • 20 Nov 2023 • Andrey Bout, Alexander Podolskiy, Sergey Nikolenko, Irina Piontkovskaya
Progress in neural grammatical error correction (GEC) is hindered by the lack of annotated training data.
1 code implementation • 14 Nov 2023 • Konstantin Yakovlev, Alexander Podolskiy, Andrey Bout, Sergey Nikolenko, Irina Piontkovskaya
Grammatical error correction (GEC) is an important NLP task that is currently most often solved with autoregressive sequence-to-sequence models.
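As an illustration of the standard autoregressive sequence-to-sequence setup, here is a minimal inference sketch with Hugging Face transformers; the checkpoint (grammarly/coedit-large) and prompt format are assumptions standing in for a GEC model, not this paper's system.

```python
# Minimal sketch of autoregressive seq2seq GEC inference.
# Checkpoint and prompt format are assumptions (a CoEdIT-style
# text-editing model), not the model trained in this paper.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "grammarly/coedit-large"  # assumed public editing checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

source = "Fix grammatical errors in this sentence: She go to school every days."
inputs = tokenizer(source, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32, num_beams=5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```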
1 code implementation • 14 Nov 2023 • Laida Kushnareva, Tatiana Gaintseva, German Magai, Serguei Barannikov, Dmitry Abulkhanov, Kristian Kuznetsov, Eduard Tulchinskii, Irina Piontkovskaya, Sergey Nikolenko
Due to the rapid development of large language models, people increasingly encounter texts that begin as human-written but continue as machine-generated.
Ranked #2 on Boundary Detection on RoFT-chatgpt
no code implementations • 14 Nov 2023 • Konstantin Yakovlev, Gregory Polyakov, Ilseyar Alimova, Alexander Podolskiy, Andrey Bout, Sergey Nikolenko, Irina Piontkovskaya
A recent trend in multimodal retrieval involves postprocessing test-set results via the dual-softmax loss (DSL).
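A minimal sketch of DSL-style postprocessing, assuming a precomputed query-candidate similarity matrix; the temperature value is an assumption. Multiplying the row-wise and column-wise softmaxes rewards pairs that rank each other highly in both retrieval directions.

```python
# Minimal sketch of dual-softmax (DSL) postprocessing on a
# query-candidate similarity matrix; temperature is an assumption.
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dual_softmax(sim, temp=100.0):
    s = temp * sim
    # elementwise product of softmax over candidates and over queries
    return softmax(s, axis=1) * softmax(s, axis=0)

rng = np.random.default_rng(0)
sim = rng.standard_normal((5, 5))       # toy similarity scores
reranked = dual_softmax(sim)
print(reranked.argmax(axis=1))          # top candidate per query
```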
1 code implementation • NeurIPS 2023 • Eduard Tulchinskii, Kristian Kuznetsov, Laida Kushnareva, Daniil Cherniavskii, Serguei Barannikov, Irina Piontkovskaya, Sergey Nikolenko, Evgeny Burnaev
The rapidly increasing quality of AI-generated content makes it difficult to distinguish between human-written and AI-generated texts, which may lead to undesirable consequences for society.
no code implementations • 19 May 2023 • Anton Tikhonov, Nikita Sorokin, Dmitry Abulkhanov, Irina Piontkovskaya, Sergey Nikolenko, Valentin Malykh
We consider the well-known and important tasks of clone detection and information retrieval for source code.
2 code implementations • 4 Apr 2023 • Irina Proskurina, Irina Piontkovskaya, Ekaterina Artemova
Our results contribute to understanding the behavior of monolingual LMs in the acceptability classification task, provide insights into the functional roles of attention heads, and highlight the advantages of TDA-based approaches for analyzing LMs.
Ranked #1 on Linguistic Acceptability on RuCoLA
no code implementations • 20 Mar 2023 • Xiaozhe Ren, Pingyi Zhou, Xinfan Meng, Xinjing Huang, Yadao Wang, Weichao Wang, Pengfei Li, Xiaoda Zhang, Alexander Podolskiy, Grigory Arshinov, Andrey Bout, Irina Piontkovskaya, Jiansheng Wei, Xin Jiang, Teng Su, Qun Liu, Jun Yao
In this work, we develop a system that trains a trillion-parameter language model on a cluster of Ascend 910 AI processors using the MindSpore framework, and present the resulting language model with 1.085T parameters, named PanGu-Σ.
no code implementations • 30 Nov 2022 • Eduard Tulchinskii, Kristian Kuznetsov, Laida Kushnareva, Daniil Cherniavskii, Serguei Barannikov, Irina Piontkovskaya, Sergey Nikolenko, Evgeny Burnaev
We apply topological data analysis (TDA) to speech classification problems and to the introspection of a pretrained speech model, HuBERT.
1 code implementation • 5 Jul 2022 • Laida Kushnareva, Dmitri Piontkovski, Irina Piontkovskaya
We apply methods of topological analysis to the attention graphs computed from the attention heads of the BERT model (arXiv:1810.04805v2).
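A minimal sketch of extracting one such attention graph and computing simple topological features (edge count, number of connected components); the edge threshold and feature choice are illustrative, not the paper's exact pipeline.

```python
# Minimal sketch: build a graph from one BERT attention head and
# compute simple topological features. The threshold of 0.1 and the
# chosen features are illustrative assumptions.
import networkx as nx
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("Attention graphs carry topological signal.",
                   return_tensors="pt")
with torch.no_grad():
    attentions = model(**inputs).attentions  # tuple: one tensor per layer

attn = attentions[0][0, 0]                   # layer 0, head 0: (seq, seq)
g = nx.Graph()
n = attn.shape[0]
g.add_nodes_from(range(n))
for i in range(n):
    for j in range(i + 1, n):
        # undirected edge if either direction attends strongly
        if attn[i, j] > 0.1 or attn[j, i] > 0.1:
            g.add_edge(i, j)

print("edges:", g.number_of_edges())
print("connected components (Betti-0):", nx.number_connected_components(g))
```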
no code implementations • 22 Jun 2022 • Dmitry Lamanov, Pavel Burnyshev, Ekaterina Artemova, Valentin Malykh, Andrey Bout, Irina Piontkovskaya
We outperform the previous state-of-the-art F1 score by up to 16% for unseen intents, using intent labels and user utterances and without accessing external sources (such as knowledge bases).
1 code implementation • 19 May 2022 • Daniil Cherniavskii, Eduard Tulchinskii, Vladislav Mikhailov, Irina Proskurina, Laida Kushnareva, Ekaterina Artemova, Serguei Barannikov, Irina Piontkovskaya, Dmitri Piontkovski, Evgeny Burnaev
The role of the attention mechanism in encoding linguistic knowledge has received special interest in NLP.
Ranked #1 on Linguistic Acceptability on ItaCoLA
2 code implementations • EMNLP 2021 • Laida Kushnareva, Daniil Cherniavskii, Vladislav Mikhailov, Ekaterina Artemova, Serguei Barannikov, Alexander Bernstein, Irina Piontkovskaya, Dmitri Piontkovski, Evgeny Burnaev
The impressive capabilities of recent generative models to create texts that are challenging to distinguish from human-written ones can be misused for generating fake news, product reviews, and even abusive content.
no code implementations • 16 Aug 2021 • Pavel Burnyshev, Valentin Malykh, Andrey Bout, Ekaterina Artemova, Irina Piontkovskaya
In the zero-shot approach, the model is trained to generate utterances from seen intents and is further used to generate utterances for intents unseen during training.
1 code implementation • 11 Jan 2021 • Alexander Podolskiy, Dmitry Lipin, Andrey Bout, Ekaterina Artemova, Irina Piontkovskaya
In turn, the Mahalanobis distance captures this disparity easily.
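A minimal sketch of Mahalanobis-distance scoring for out-of-domain detection, with synthetic vectors standing in for transformer sentence embeddings; fit the mean and covariance on in-domain data, then score new points by their distance.

```python
# Minimal sketch of Mahalanobis-distance OOD scoring; synthetic
# features stand in for real transformer sentence embeddings.
import numpy as np

rng = np.random.default_rng(0)
in_domain = rng.normal(0.0, 1.0, size=(500, 8))  # in-domain embeddings
ood_point = rng.normal(5.0, 1.0, size=8)         # shifted OOD embedding

mu = in_domain.mean(axis=0)
cov = np.cov(in_domain, rowvar=False)
cov_inv = np.linalg.pinv(cov)                    # robust inverse

def mahalanobis(x):
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

print("in-domain score:", mahalanobis(in_domain[0]))  # small
print("OOD score:", mahalanobis(ood_point))           # large
```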
no code implementations • COLING 2020 • Valentin Malykh, Konstantin Chernis, Ekaterina Artemova, Irina Piontkovskaya
The existing dialogue summarization corpora are largely extractive.
no code implementations • ICLR 2018 • Vadim Popov, Mikhail Kudinov, Irina Piontkovskaya, Petr Vytovtov, Alex Nevidomsky
In language modeling, users' language (e.g., in private messaging) could change in a year and be completely different from what we observe in publicly available data.
no code implementations • 20 Dec 2017 • Vadim Popov, Mikhail Kudinov, Irina Piontkovskaya, Petr Vytovtov, Alex Nevidomsky
One of the big challenges in machine learning applications is that training data can be different from the real-world data faced by the algorithm.