Search Results for author: Nikita Balagansky

Found 8 papers, 2 papers with code

PALBERT: Teaching ALBERT to Ponder

no code implementations • RepL4NLP (ACL) 2022 • Daniil Gavrilov, Nikita Balagansky

Currently, pre-trained models can be considered the default choice for a wide range of NLP tasks.

Linear Transformers with Learnable Kernel Functions are Better In-Context Models

2 code implementations • 16 Feb 2024 • Yaroslav Aksenov, Nikita Balagansky, Sofia Maria Lo Cicero Vaina, Boris Shaposhnikov, Alexey Gorbatovski, Daniil Gavrilov

Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in the rapidly evolving field of natural language processing.

In-Context Learning • Language Modelling

Ahead-of-Time P-Tuning

no code implementations • 18 May 2023 • Daniil Gavrilov, Nikita Balagansky

In this paper, we propose Ahead-of-Time (AoT) P-Tuning, a novel parameter-efficient fine-tuning method for pre-trained Language Models (LMs) that adds input-dependent bias before each Transformer layer.

Benchmarking
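
The entry above describes AoT P-Tuning only at a high level. As a rough illustration of the general idea, an input-dependent bias added to the hidden states entering a Transformer layer, here is a minimal PyTorch sketch; the module name and its parameterization are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class InputDependentBias(nn.Module):
    """Hypothetical sketch: a per-token bias, looked up by token id,
    that is added to the hidden states entering a Transformer layer.
    Because it depends only on the input ids, it can be computed
    ahead of time, before running the layer stack."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.bias_table = nn.Embedding(vocab_size, hidden_size)
        nn.init.zeros_(self.bias_table.weight)  # start as a no-op

    def forward(self, hidden_states: torch.Tensor, input_ids: torch.LongTensor) -> torch.Tensor:
        # hidden_states: (batch, seq, hidden); input_ids: (batch, seq)
        return hidden_states + self.bias_table(input_ids)
```

One such table per layer adds only vocab_size × hidden_size parameters per layer and leaves the pre-trained weights untouched, which is the sense in which the method is parameter-efficient.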

Diffusion Language Models Generation Can Be Halted Early

no code implementations • 18 May 2023 • Sofia Maria Lo Cicero Vaina, Nikita Balagansky, Daniil Gavrilov

We evaluate our methods on Plaid, SSD, and CDCD DLMs and create a cohesive perspective on their generation workflows.

Language Modelling • Text Generation

Linear Interpolation In Parameter Space is Good Enough for Fine-Tuned Language Models

no code implementations • 22 Nov 2022 • Mark Rofin, Nikita Balagansky, Daniil Gavrilov

The simplest way to obtain continuous interpolation between two points in high dimensional space is to draw a line between them.

Attribute • Text Generation
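
The sentence in the entry above is meant literally: interpolating two fine-tuned checkpoints of the same architecture is a pointwise convex combination of their parameters, theta(alpha) = (1 - alpha) * theta_A + alpha * theta_B. A minimal PyTorch sketch, assuming both state dicts share identical keys and shapes:

```python
import torch

def interpolate_checkpoints(state_a: dict, state_b: dict, alpha: float) -> dict:
    """Pointwise linear interpolation between two fine-tuned models:
    theta(alpha) = (1 - alpha) * theta_a + alpha * theta_b."""
    assert state_a.keys() == state_b.keys(), "checkpoints must share the same architecture"
    return {name: torch.lerp(state_a[name].float(), state_b[name].float(), alpha)
            for name in state_a}
```

Sweeping alpha from 0 to 1 and evaluating each interpolated model is then enough to probe the behaviour of the loss surface between the two endpoints.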

Classifiers are Better Experts for Controllable Text Generation

no code implementations • 15 May 2022 • Askhat Sitdikov, Nikita Balagansky, Daniil Gavrilov, Alexander Markov

This paper proposes a simple method for controllable text generation based on weighting logits with a free-form classifier, namely CAIF sampling.

Text Generation
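
As a hedged illustration of the idea in the entry above (not the exact CAIF procedure), the sketch below reweights the language model's next-token logits with the log-probability of the desired attribute produced by an external classifier for each candidate continuation; the guidance strength alpha and the precomputed attr_logprobs tensor are assumptions made for the example.

```python
import torch

def classifier_weighted_step(lm_logits: torch.Tensor,
                             attr_logprobs: torch.Tensor,
                             alpha: float = 5.0) -> torch.Tensor:
    """Shift next-token logits by alpha * log p(attribute | prefix + token)
    and sample from the renormalized distribution.
    lm_logits, attr_logprobs: tensors of shape (vocab_size,)."""
    weighted = lm_logits + alpha * attr_logprobs
    probs = torch.softmax(weighted, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```

In practice the classifier would typically be run only on a top-k subset of candidate tokens, since scoring every vocabulary item at each decoding step is prohibitively expensive.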

PALBERT: Teaching ALBERT to Ponder

1 code implementation • 7 Apr 2022 • Nikita Balagansky, Daniil Gavrilov

The recently proposed PonderNet may be a promising solution for performing an early exit by treating the exit layer's index as a latent variable.
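
For context, PonderNet turns per-layer halting scores into a distribution over exit layers: the probability of exiting at layer i is lambda_i times the probability of not having halted at any earlier layer. A small sketch of that computation (the function name and shapes are illustrative, not the paper's code):

```python
import torch

def exit_layer_distribution(halting_probs: torch.Tensor) -> torch.Tensor:
    """halting_probs: (num_layers,) tensor of per-layer halting probabilities
    lambda_i in [0, 1]. Returns p_i = lambda_i * prod_{j < i} (1 - lambda_j).
    Setting the last lambda to 1 makes the distribution sum to one."""
    keep_going = torch.cumprod(1.0 - halting_probs, dim=0)
    prior = torch.cat([torch.ones(1, dtype=halting_probs.dtype), keep_going[:-1]])
    return halting_probs * prior
```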

Weight Squeezing: Reparameterization for Knowledge Transfer and Model Compression

no code implementations • 14 Oct 2020 • Artem Chumachenko, Daniil Gavrilov, Nikita Balagansky, Pavel Kalaidin

We also propose a variant of Weight Squeezing called Gated Weight Squeezing, which combines fine-tuning the BERT-Medium model with learning a mapping from BERT-Base weights.

General Classification • Model Compression • +3
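
The snippet above is terse. As a loose, hypothetical sketch of what a gated reparameterization of student weights from teacher weights could look like (the actual parameterization in the paper may differ), one option is to gate a learned linear mapping of a frozen teacher matrix against a freely trained student-sized matrix:

```python
import torch
import torch.nn as nn

class GatedSqueezedLinear(nn.Module):
    """Hypothetical sketch: the student weight is a gated mixture of
    (a) a learned projection of a frozen teacher weight matrix and
    (b) a freely trained weight of the student's own size."""

    def __init__(self, teacher_weight: torch.Tensor, out_dim: int, in_dim: int):
        super().__init__()
        t_out, t_in = teacher_weight.shape
        self.register_buffer("teacher_weight", teacher_weight)  # kept frozen
        self.proj_out = nn.Parameter(torch.randn(out_dim, t_out) * 0.02)
        self.proj_in = nn.Parameter(torch.randn(t_in, in_dim) * 0.02)
        self.free_weight = nn.Parameter(torch.zeros(out_dim, in_dim))
        self.gate = nn.Parameter(torch.zeros(out_dim, in_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mapped = self.proj_out @ self.teacher_weight @ self.proj_in  # (out_dim, in_dim)
        g = torch.sigmoid(self.gate)
        weight = g * mapped + (1.0 - g) * self.free_weight
        return x @ weight.t()
```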
