Search Results for author: Mikhail Burtsev

Found 30 papers, 20 papers with code

Uncertainty Estimation of Transformer Predictions for Misclassification Detection

1 code implementation ACL 2022 Artem Vazhentsev, Gleb Kuzmin, Artem Shelmanov, Akim Tsvigun, Evgenii Tsymbalov, Kirill Fedyanin, Maxim Panov, Alexander Panchenko, Gleb Gusev, Mikhail Burtsev, Manvel Avetisian, Leonid Zhukov

Uncertainty estimation (UE) of model predictions is a crucial step for a variety of tasks such as active learning, misclassification detection, adversarial attack detection, out-of-distribution detection, etc.

Active Learning Adversarial Attack Detection +7
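
As a rough illustration of the setting above (not the paper's specific estimators), a common baseline for misclassification detection is to score each prediction by the entropy of its softmax distribution and flag the most uncertain ones; the threshold and example values below are illustrative only.

```python
import numpy as np

def softmax(logits):
    """Convert raw model logits to class probabilities."""
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

def predictive_entropy(logits):
    """Entropy of the predictive distribution: higher means more uncertain."""
    p = softmax(logits)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

# Toy usage: flag the most uncertain predictions as candidate misclassifications.
logits = np.array([[4.0, 0.1, 0.2],    # confident prediction
                   [0.9, 1.0, 1.1]])   # near-uniform, uncertain prediction
scores = predictive_entropy(logits)
suspects = scores > 0.8                # threshold chosen for illustration only
print(scores, suspects)
```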

Discourse-Driven Integrated Dialogue Development Environment for Open-Domain Dialogue Systems

no code implementations CODI 2021 Denis Kuznetsov, Dmitry Evseev, Lidia Ostyakova, Oleg Serikov, Daniel Kornev, Mikhail Burtsev

Development environments for spoken dialogue systems are popular today because they enable rapid creation of dialogue systems at a time when the usage of voice AI assistants is growing constantly.

Spoken Dialogue Systems

SRMT: Shared Memory for Multi-agent Lifelong Pathfinding

1 code implementation22 Jan 2025 Alsu Sagirova, Yuri Kuratov, Mikhail Burtsev

Multi-agent reinforcement learning (MARL) demonstrates significant progress in solving cooperative and competitive multi-agent problems in various environments.

Multi-agent Reinforcement Learning reinforcement-learning +1

Learning Elementary Cellular Automata with Transformers

1 code implementation2 Dec 2024 Mikhail Burtsev

Large Language Models demonstrate remarkable mathematical capabilities but at the same time struggle with abstract reasoning and planning.
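
For context, an elementary cellular automaton updates a binary row of cells from each cell's three-cell neighbourhood according to one of 256 Wolfram rules. The sketch below is the standard construction of such orbits, not the paper's training setup.

```python
import numpy as np

def eca_step(state, rule=110):
    """One step of an elementary cellular automaton with periodic boundaries."""
    left = np.roll(state, 1)
    right = np.roll(state, -1)
    # Encode each (left, center, right) neighbourhood as an integer 0..7
    idx = 4 * left + 2 * state + right
    # Bit i of the rule number gives the next state for neighbourhood i
    table = (rule >> np.arange(8)) & 1
    return table[idx]

state = np.random.randint(0, 2, size=32)
orbit = [state]
for _ in range(16):
    state = eca_step(state, rule=110)
    orbit.append(state)
```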

Associative Recurrent Memory Transformer

1 code implementation5 Jul 2024 Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Mikhail Burtsev

This paper addresses the challenge of creating a neural architecture for very long sequences that requires constant time for processing new information at each time step.

Retrieval
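
The "constant time per step" idea from the abstract above can be pictured (in spirit only; this is not the ARMT code) as segment-level recurrence: a fixed-size memory is read and rewritten as each new segment arrives, so the cost of absorbing new information does not grow with total sequence length.

```python
import torch
import torch.nn as nn

# Toy segment-level recurrence: a fixed-size memory is updated once per segment.
dim, mem_slots, seg_len = 64, 8, 32
layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

memory = torch.zeros(1, mem_slots, dim)            # persistent state across segments
long_sequence = torch.randn(1, 10 * seg_len, dim)  # stand-in for embedded input

for start in range(0, long_sequence.size(1), seg_len):
    segment = long_sequence[:, start:start + seg_len]
    hidden = encoder(torch.cat([memory, segment], dim=1))
    memory = hidden[:, :mem_slots]                 # carry updated memory forward
```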

Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task

no code implementations20 Jun 2024 Alsu Sagirova, Mikhail Burtsev

Even though Transformers are extensively used for Natural Language Processing tasks, especially for machine translation, they lack an explicit memory to store key concepts of processed texts.

Decoder Diversity +2

BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

4 code implementations14 Jun 2024 Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Ivan Rodkin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev

The BABILong benchmark is extendable to any length to support the evaluation of new upcoming models with increased capabilities, and we provide splits up to 10 million token lengths.

Question Answering
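
A minimal sketch of the "reasoning-in-a-haystack" construction the benchmark is built around: supporting facts are scattered inside arbitrarily long filler text, so context length can be scaled up without changing the underlying question. The filler and fact sentences here are placeholders, not the actual benchmark data.

```python
import random

def make_haystack_sample(facts, question, answer, filler_sentences, target_tokens=10_000):
    """Hide the supporting facts at random positions inside long distractor text."""
    context = []
    budget = target_tokens                      # rough whitespace-token budget
    while budget > 0:
        s = random.choice(filler_sentences)
        context.append(s)
        budget -= len(s.split())
    # Insert facts at random positions; the question goes at the end.
    for fact in facts:
        context.insert(random.randrange(len(context) + 1), fact)
    return {"input": " ".join(context) + "\n" + question, "target": answer}

sample = make_haystack_sample(
    facts=["Mary moved to the kitchen.", "Mary picked up the apple."],
    question="Where is the apple?",
    answer="kitchen",
    filler_sentences=["The weather was unremarkable that day."],
    target_tokens=200,
)
```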

In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss

2 code implementations16 Feb 2024 Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev

This paper addresses the challenge of processing long documents using generative transformer models.

RAG

Uncertainty Guided Global Memory Improves Multi-Hop Question Answering

1 code implementation29 Nov 2023 Alsu Sagirova, Mikhail Burtsev

Conversely, the second group relies on the attention mechanism of the long input encoding model to facilitate multi-hop reasoning.

Multi-hop Question Answering Question Answering

Better Together: Enhancing Generative Knowledge Graph Completion with Language Models and Neighborhood Information

1 code implementation2 Nov 2023 Alla Chepurova, Aydar Bulatov, Yuri Kuratov, Mikhail Burtsev

In this study, we propose to include node neighborhoods as additional information to improve KGC methods based on language models.

Imputation World Knowledge
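
One way to picture "node neighborhoods as additional information" is to serialize the query triple together with the head entity's neighboring triples into the text prompt of a sequence-to-sequence model. This is a generic verbalization sketch, not necessarily the paper's exact input format.

```python
def verbalize_query(head, relation, neighborhood, max_neighbors=5):
    """Build a text prompt for generative KG completion: predict the missing tail.

    neighborhood: list of (relation, entity) pairs already known for `head`.
    """
    context = "; ".join(f"{head} {r} {e}" for r, e in neighborhood[:max_neighbors])
    return f"context: {context} | query: {head} {relation} ?"

prompt = verbalize_query(
    head="Paris",
    relation="capital_of",
    neighborhood=[("located_in", "Europe"), ("has_landmark", "Eiffel Tower")],
)
# The prompt would then be fed to a seq2seq LM that generates the tail entity.
```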

Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes

1 code implementation27 Jul 2022 Artyom Sorokin, Nazar Buzun, Leonid Pugachev, Mikhail Burtsev

This requires storing prohibitively large amounts of intermediate data if a sequence consists of thousands or even millions of elements and, as a result, makes learning very long-term dependencies infeasible.

Knowledge Distillation of Russian Language Models with Reduction of Vocabulary

1 code implementation4 May 2022 Alina Kolesnikova, Yuri Kuratov, Vasily Konovalov, Mikhail Burtsev

We propose two simple yet effective alignment techniques to enable knowledge distillation to students with a reduced vocabulary.

Knowledge Distillation
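
The abstract does not spell out the alignment techniques, but the basic obstacle can be sketched: teacher and student tokenize text over different vocabularies, so their output distributions are not directly comparable. A naive starting point (illustrative only, not the paper's method) is to map the tokens the two vocabularies share.

```python
def shared_token_mapping(teacher_vocab, student_vocab):
    """Map each token present in both vocabularies to its (teacher_id, student_id) pair.

    teacher_vocab / student_vocab: dicts token -> id, e.g. tokenizer.get_vocab().
    """
    shared = sorted(set(teacher_vocab) & set(student_vocab))
    teacher_ids = [teacher_vocab[t] for t in shared]
    student_ids = [student_vocab[t] for t in shared]
    return shared, teacher_ids, student_ids
```

A distillation loss could then, for example, be restricted to logits at these shared indices; the actual alignment techniques proposed in the paper may differ.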

Building and Evaluating Open-Domain Dialogue Corpora with Clarifying Questions

1 code implementation EMNLP 2021 Mohammad Aliannejadi, Julia Kiseleva, Aleksandr Chuklin, Jeffrey Dalton, Mikhail Burtsev

Enabling open-domain dialogue systems to ask clarifying questions when appropriate is an important direction for improving the quality of the system response.

Multi-Stream Transformers

1 code implementation21 Jul 2021 Mikhail Burtsev, Anna Rumshisky

Transformer-based encoder-decoder models produce a fused token-wise representation after every encoder layer.

Decoder

Short Text Clustering with Transformers

no code implementations31 Jan 2021 Leonid Pugachev, Mikhail Burtsev

Recent techniques for the task of short text clustering often rely on word embeddings as a transfer learning component.

Clustering Sentence +3
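
A generic version of that transfer-learning pipeline, with embeddings from a pretrained transformer followed by clustering; the library and model names below are common choices, not necessarily the paper's.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

texts = ["refund not received", "where is my order", "app crashes on login",
         "cannot sign in", "package never arrived"]

# Encode short texts with a pretrained transformer, then cluster the embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(texts, normalize_embeddings=True)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
print(list(zip(texts, labels)))
```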

Memory Representation in Transformer

no code implementations1 Jan 2021 Mikhail Burtsev, Yurii Kuratov, Anton Peganov, Grigory V. Sapunov

Adding trainable memory to selectively store local as well as global representations of a sequence is a promising direction to improve the Transformer model.

Language Modeling Language Modelling +2
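
A minimal sketch of the trainable-memory idea: learnable memory tokens are prepended to the input, and their final hidden states serve as a global summary alongside the per-token representations. This is in the spirit of the memory-augmented Transformer described above, not its exact code.

```python
import torch
import torch.nn as nn

class MemoryTokenEncoder(nn.Module):
    """Prepends trainable [mem] tokens whose final states act as sequence memory."""

    def __init__(self, vocab_size=30522, dim=64, n_heads=4, mem_slots=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.memory = nn.Parameter(torch.randn(mem_slots, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids):
        batch = token_ids.size(0)
        mem = self.memory.unsqueeze(0).expand(batch, -1, -1)
        h = self.encoder(torch.cat([mem, self.embed(token_ids)], dim=1))
        # First mem_slots positions hold global memory, the rest are token states.
        return h[:, : self.memory.size(0)], h[:, self.memory.size(0):]

# Toy usage with random token ids.
global_mem, token_states = MemoryTokenEncoder()(torch.randint(0, 30522, (2, 16)))
```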

ConvAI3: Generating Clarifying Questions for Open-Domain Dialogue Systems (ClariQ)

3 code implementations23 Sep 2020 Mohammad Aliannejadi, Julia Kiseleva, Aleksandr Chuklin, Jeff Dalton, Mikhail Burtsev

The main aim of conversational systems is to return an appropriate answer in response to user requests.

Goal-Oriented Multi-Task BERT-Based Dialogue State Tracker

no code implementations5 Feb 2020 Pavel Gulyaev, Eugenia Elistratova, Vasily Konovalov, Yuri Kuratov, Leonid Pugachev, Mikhail Burtsev

The organizers introduced the Schema-Guided Dialogue (SGD) dataset with multi-domain conversations and released a zero-shot dialogue state tracking model.

Dialogue State Tracking Question Answering +1

Loss Landscape Sightseeing with Multi-Point Optimization

1 code implementation9 Oct 2019 Ivan Skorokhodov, Mikhail Burtsev

We present multi-point optimization: an optimization technique that allows several models to be trained simultaneously without the need to keep the parameters of each one individually.
