Search Results for author: Alexey Gorbatovski

Found 4 papers, 1 paper with code

Learn Your Reference Model for Real Good Alignment

no code implementations • 15 Apr 2024 • Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov, Nikita Surnachev, Yaroslav Aksenov, Ian Maksimov, Nikita Balagansky, Daniil Gavrilov

For instance, in Reinforcement Learning from Human Feedback (RLHF), a fundamental technique for Language Model alignment, the Kullback-Leibler divergence between the trainable policy and the SFT policy is minimized in addition to maximizing the reward.
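The KL-regularized objective described above is commonly implemented as a per-sample penalty subtracted from the reward. A minimal sketch, assuming a single-sample KL estimate from policy and reference log-probabilities (the function name and `beta` coefficient are illustrative, not from the paper):

```python
def kl_penalized_reward(reward, logprob_policy, logprob_ref, beta=0.1):
    """KL-penalized reward as commonly used in RLHF (sketch).

    The per-sample KL estimate log pi(y|x) - log pi_ref(y|x) is
    scaled by beta and subtracted from the reward, discouraging the
    trainable policy from drifting away from the SFT (reference) policy.
    """
    kl = logprob_policy - logprob_ref
    return reward - beta * kl
```

With `beta = 0.1`, a sample the policy now assigns higher log-probability than the reference does (positive KL estimate) has its reward reduced accordingly.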

Language Modelling

Linear Transformers with Learnable Kernel Functions are Better In-Context Models

2 code implementations • 16 Feb 2024 • Yaroslav Aksenov, Nikita Balagansky, Sofia Maria Lo Cicero Vaina, Boris Shaposhnikov, Alexey Gorbatovski, Daniil Gavrilov

Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in the rapidly evolving field of natural language processing.
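Linear Transformers achieve subquadratic cost by replacing softmax attention with a kernel feature map over queries and keys, so attention can be computed with running sums instead of an n-by-n matrix. A minimal causal sketch with a fixed `1 + elu` feature map (the paper's contribution is a *learnable* kernel function; this stand-in map and all names here are illustrative):

```python
import math

def phi(x):
    # Positive feature map (1 + elu), a common fixed choice in linear attention.
    return [xi + 1.0 if xi >= 0 else math.exp(xi) for xi in x]

def linear_attention(queries, keys, values):
    """O(n) causal linear attention: maintain running sums
    S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j), then
    out_i = (phi(q_i)^T S) / (phi(q_i) . z)."""
    d, dv = len(keys[0]), len(values[0])
    S = [[0.0] * dv for _ in range(d)]   # running sum of phi(k) v^T
    z = [0.0] * d                        # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * v[b]
        fq = phi(q)
        denom = sum(fq[a] * z[a] for a in range(d))
        outputs.append([sum(fq[a] * S[a][b] for a in range(d)) / denom
                        for b in range(dv)])
    return outputs
```

Because only `S` and `z` are carried between steps, the per-token cost is constant in sequence length, which is what makes these models attractive as in-context learners at scale.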

In-Context Learning Language Modelling

Reinforcement learning for question answering in programming domain using public community scoring as a human feedback

no code implementations • 19 Jan 2024 • Alexey Gorbatovski, Sergey Kovalchuk

In this study, we investigate enhancing the performance of GPT Neo 125M in Community Question Answering (CQA) with a focus on programming, through the integration of Reinforcement Learning from Human Feedback (RLHF) and the utilization of scores from Stack Overflow.
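Using public community scores as human feedback requires mapping raw vote counts into a bounded reward signal for RL. The paper's exact reward shaping is not reproduced here; this is just one plausible min-max normalization, with all names illustrative:

```python
def score_to_reward(score, scores):
    """Map a public community score (e.g. Stack Overflow votes) to a
    reward in [-1, 1] via min-max scaling over the observed scores.
    A hypothetical normalization, not the paper's actual scheme."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return 0.0  # no spread in scores: neutral reward
    return 2.0 * (score - lo) / (hi - lo) - 1.0
```

Downvoted answers then land near -1 and the top-voted answer at +1, giving the RLHF stage a consistent scale across questions with very different vote totals.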

Community Question Answering

Bayesian Networks for Named Entity Prediction in Programming Community Question Answering

no code implementations • 26 Feb 2023 • Alexey Gorbatovski, Sergey Kovalchuk

We also discuss the influence of penalty terms on the structure of Bayesian networks and how they can be used to analyze the relationships between entities.
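Penalty terms enter Bayesian network structure learning through the scoring function: a stronger complexity penalty favors sparser graphs with fewer edges between entities. A minimal sketch of one standard penalized score, the Bayesian Information Criterion (used here as a generic example; the paper's specific penalty formulation is not reproduced):

```python
import math

def bic_score(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion for scoring a candidate network
    structure: fit (log-likelihood) minus a complexity penalty that
    grows with the number of free parameters, i.e. with graph density."""
    return log_likelihood - 0.5 * n_params * math.log(n_samples)
```

Comparing two candidate structures, a denser graph must improve the log-likelihood by more than its extra parameters cost under the penalty, which is how the penalty shapes the learned entity relationships.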

Community Question Answering
