no code implementations • 14 Dec 2023 • Anton Shapkin, Denis Litvinov, Yaroslav Zharov, Egor Bogomolov, Timur Galimzyanov, Timofey Bryksin
Our approach achieves several targets: (1) lifting the length limitations of the context window, saving on the prompt size; (2) allowing huge expansion of the number of retrieval entities available for the context; (3) alleviating the problem of misspelling or failing to find relevant entity names.
1 code implementation • 15 Aug 2023 • Aleksandra Eliseeva, Yaroslav Sokolov, Egor Bogomolov, Yaroslav Golubev, Danny Dig, Timofey Bryksin
We use this dataset to evaluate the completion setting and the usefulness of the historical context for state-of-the-art CMG models and GPT-3.5-turbo.
no code implementations • 6 Mar 2023 • Dmitry Pasechnyuk, Anton Prazdnichnykh, Mikhail Evtikhiev, Timofey Bryksin
In this work, we test the performance of various optimizers on deep learning models for source code and find that the choice of an optimizer can have a significant impact on the model quality, with up to two-fold score differences between some of the relatively well-performing optimizers.
1 code implementation • 5 Aug 2022 • Mikhail Evtikhiev, Egor Bogomolov, Yaroslav Sokolov, Timofey Bryksin
Despite all that, minimal differences in the metric scores have been used in recent papers to claim superiority of some code generation models over the others.
no code implementations • 17 Jun 2022 • Maksim Zubkov, Egor Spirin, Egor Bogomolov, Timofey Bryksin
The first task is code clone detection, which we evaluate on the POJ-104 dataset containing implementations of 104 algorithms.
no code implementations • 17 Jun 2022 • Ilya Utkin, Egor Spirin, Egor Bogomolov, Timofey Bryksin
Even though the process of extracting ASTs from code can be done with different parsers, the impact of choosing a parser on the final model quality remains unstudied.
2 code implementations • 7 Jun 2022 • Egor Bogomolov, Sergey Zhuravlev, Egor Spirin, Timofey Bryksin
We evaluate three models of different complexity and compare their quality in three settings: trained on a large dataset of Java projects, further fine-tuned on the data from a particular project, and trained from scratch on this data.
no code implementations • 21 May 2022 • Vitaliy Bibaev, Alexey Kalina, Vadim Lomshakov, Yaroslav Golubev, Alexander Bezzubov, Nikita Povarov, Timofey Bryksin
We used the logs to collect a dataset of code completions from users, and employed it to train a ranking CatBoost model.
no code implementations • 5 Apr 2022 • Fuxiang Chen, Fatemeh Fard, David Lo, Timofey Bryksin
Furthermore, some programming languages are inherently different and code written in one language usually cannot be interchanged with the others, i.e., Ruby and Java code possess very different structure.
1 code implementation • 14 Jan 2022 • Denis Sushentsev, Aleksandr Khvorov, Roman Vasiliev, Yaroslav Golubev, Timofey Bryksin
In this work, we explore the applicability of existing solutions for the bug triage problem when stack traces are used as the main data source of bug reports.
no code implementations • 3 Jun 2021 • Mikhail Pravilov, Egor Bogomolov, Yaroslav Golubev, Timofey Bryksin
As for the commit message generation, our model demonstrated the same results as supervised models trained for this specific task, which indicates that it can encode code changes well and can be improved in the future by pre-training on a larger dataset of easily gathered code changes.
1 code implementation • 23 Mar 2021 • Egor Spirin, Egor Bogomolov, Vladimir Kovalenko, Timofey Bryksin
PSI trees contain code syntax trees as well as functions to work with them, and therefore can be used to enrich code representation using static analysis algorithms of modern IDEs.
1 code implementation • 9 Dec 2020 • Elena Lyulina, Anastasiia Birillo, Vladimir Kovalenko, Timofey Bryksin
To validate and showcase the toolkit, we present a dataset collected by our tools.
Software Engineering • D.2.2; K.3.2
2 code implementations • 6 Jul 2020 • Egor Bogomolov, Yaroslav Golubev, Artyom Lobanov, Vladimir Kovalenko, Timofey Bryksin
We use a dataset of 9 million GitHub projects as a reference search base.
Software Engineering
1 code implementation • 3 Apr 2020 • Timofey Bryksin, Victor Petukhov, Ilya Alexin, Stanislav Prikhodko, Alexey Shpilman, Vladimir Kovalenko, Nikita Povarov
In this work, we apply anomaly detection to source code and bytecode to facilitate the development of a programming language and its compiler.
2 code implementations • 10 Feb 2020 • Vladimir Kovalenko, Egor Bogomolov, Timofey Bryksin, Alberto Bacchelli
With the goal of facilitating team collaboration, we propose a new approach to building vector representations of individual developers by capturing their individual contribution style, or coding style.
Software Engineering • Social and Information Networks