1 code implementation • 29 Feb 2024 • Suyuchen Wang, Ivan Kobyzev, Peng Lu, Mehdi Rezagholizadeh, Bang Liu
This paper addresses the challenge of train-short-test-long (TSTL) scenarios in Large Language Models (LLMs) equipped with Rotary Position Embedding (RoPE), where models pre-trained on shorter sequences face difficulty with out-of-distribution (OOD) token positions in longer sequences.
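The failure mode is easy to see in code: RoPE encodes a token's position as a rotation of paired query/key features, so positions beyond the pre-training length produce rotation angles the model has never observed. Below is a minimal NumPy sketch assuming the standard pairwise-rotation formulation with base 10000, not necessarily the exact configuration studied in this paper.

```python
# Minimal RoPE sketch (standard pairwise rotation, base 10000 -- an assumption,
# not necessarily this paper's setup).
import numpy as np

def rope(x, positions, base=10000.0):
    """Rotate each consecutive feature pair of x by an angle proportional to
    token position. x: (seq_len, dim) with dim even."""
    seq_len, dim = x.shape
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = np.outer(positions, inv_freq)             # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Positions beyond the pre-training length yield rotation angles the model never
# saw during training -- the OOD positions that the TSTL setting exposes.
q = rope(np.random.randn(4096, 64), np.arange(4096))
```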
no code implementations • 1 Dec 2023 • Christophe Tribes, Sacha Benarroch-Lelong, Peng Lu, Ivan Kobyzev
The performance on downstream tasks of models fine-tuned with LoRA heavily relies on a set of hyperparameters including the rank of the decomposition.
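For context, the rank referred to here is the width of the low-rank update that LoRA adds to a frozen weight. A minimal PyTorch sketch of a LoRA layer, with names and scaling following common practice rather than this paper:

```python
# Hedged LoRA sketch: frozen base weight plus a trainable low-rank update B @ A.
# `rank` is the hyperparameter the snippet refers to.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)            # frozen pre-trained weight
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)
```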
no code implementations • 11 Jul 2023 • Runcheng Liu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart
Prompt-tuning has become an increasingly popular parameter-efficient method for adapting large pretrained language models to downstream tasks.
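As a quick illustration of the mechanism, prompt-tuning trains only a small matrix of "soft prompt" embeddings that is prepended to the frozen model's input embeddings. A minimal sketch under that standard formulation (dimensions illustrative):

```python
# Hedged prompt-tuning sketch: only the soft-prompt matrix is trainable; the
# backbone model and its embeddings stay frozen.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, prompt_len=20, embed_dim=768):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds):                      # (batch, seq, embed_dim)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)   # prepend trainable prompt
```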
no code implementations • 8 May 2023 • Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais
Label Smoothing (LS) is a simple, versatile, and efficient regularization technique that can be applied to a wide range of supervised classification tasks.
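The technique itself is a one-line change to the training target: the one-hot label is mixed with the uniform distribution before computing cross-entropy. A minimal sketch of standard label smoothing (the paper's specific variant or analysis may differ):

```python
# Standard label-smoothing cross-entropy: mix the one-hot target with the
# uniform distribution using smoothing rate eps.
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, eps=0.1):
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(targets, n_classes).float()
    smoothed = (1.0 - eps) * one_hot + eps / n_classes
    return -(smoothed * log_probs).sum(dim=-1).mean()
```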
no code implementations • 24 Mar 2023 • Vahid Partovi Nia, Guojun Zhang, Ivan Kobyzev, Michael R. Metel, Xinlin Li, Ke Sun, Sobhan Hemati, Masoud Asgharian, Linglong Kong, Wulong Liu, Boxing Chen
Deep models have dominated the artificial intelligence (AI) industry since the ImageNet challenge in 2012.
no code implementations • 20 Dec 2022 • Ali Edalati, Marzieh Tahaei, Ivan Kobyzev, Vahid Partovi Nia, James J. Clark, Mehdi Rezagholizadeh
We apply the proposed methods for fine-tuning T5 on the GLUE benchmark to show that incorporating the Kronecker-based modules can outperform state-of-the-art PET methods.
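The "Kronecker-based modules" replace the low-rank product used in adapters and LoRA with a Kronecker product of two small factors. The sketch below is a hedged illustration of that general idea; the factor shapes and placement of the module are assumptions, not the paper's exact design.

```python
# Hedged sketch of a Kronecker-based parameter-efficient module: the weight
# update is torch.kron(A, B) built from two small trainable factors.
# Shapes are illustrative; the paper's factorization may differ.
import torch
import torch.nn as nn

class KroneckerAdapter(nn.Module):
    def __init__(self, in_features=768, out_features=768, a1=32, b1=24):
        super().__init__()
        a2, b2 = out_features // a1, in_features // b1
        self.A = nn.Parameter(torch.randn(a1, b1) * 0.01)   # small factor
        self.B = nn.Parameter(torch.zeros(a2, b2))          # small factor
        # torch.kron(A, B) has shape (a1*a2, b1*b2) == (out_features, in_features)

    def forward(self, x):
        return x @ torch.kron(self.A, self.B).T
```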
no code implementations • 12 Dec 2022 • Aref Jafari, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, Ali Ghodsi
Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve the generalization of a small model (the student) by transferring knowledge from a larger model (the teacher).
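For reference, the standard KD objective mixes a temperature-softened KL term between teacher and student distributions with the usual task loss. A minimal sketch of that vanilla formulation (this paper builds on it rather than defining it):

```python
# Vanilla KD loss: temperature-softened KL(student || teacher) plus the
# ordinary cross-entropy on the ground-truth labels.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```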
no code implementations • 12 Dec 2022 • Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, Ali Ghodsi, Philippe Langlais
Moreover, we observe that this simple optimization technique outperforms state-of-the-art KD methods for compact models.
2 code implementations • 14 Oct 2022 • Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, Ali Ghodsi
Our DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank by sorting the representation learned by the adapter module at different ranks during training.
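In code, the idea amounts to sampling a rank at each training step and using only the leading rows and columns of the LoRA factors, so one set of blocks is trained for a whole range of ranks. A hedged sketch of that mechanism (the paper's full method includes details omitted here):

```python
# Hedged DyLoRA-style sketch: sample a rank b per step and truncate the LoRA
# factors to their first b components, so every rank up to max_rank gets trained.
import random
import torch
import torch.nn as nn

class DyLoRALinear(nn.Module):
    def __init__(self, in_features, out_features, max_rank=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)             # frozen pre-trained weight
        self.A = nn.Parameter(torch.randn(max_rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, max_rank))
        self.max_rank = max_rank

    def forward(self, x, rank=None):
        b = rank or random.randint(1, self.max_rank)       # sampled during training
        return self.base(x) + x @ self.A[:b].T @ self.B[:, :b].T
```

At inference time a fixed rank can be passed, which lets one trained module serve several deployment budgets.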
1 code implementation • 30 Jun 2022 • Kira Selby, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart
We propose a general deep architecture for learning functions on multiple permutation-invariant sets.
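One common way to build such functions is the DeepSets recipe: encode each set element-wise with a shared network, pool with an order-independent operation, and combine the pooled codes. The sketch below illustrates that baseline pattern for two input sets; it is an assumption for illustration, not the architecture proposed in the paper.

```python
# Hedged DeepSets-style baseline for functions on multiple permutation-invariant
# sets: per-element encoder phi, sum pooling, then a combiner rho.
import torch
import torch.nn as nn

class MultiSetModel(nn.Module):
    def __init__(self, in_dim, hidden=128, out_dim=1):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, set_a, set_b):                       # (n_a, in_dim), (n_b, in_dim)
        pooled_a = self.phi(set_a).sum(dim=0)              # sum pooling is permutation-invariant
        pooled_b = self.phi(set_b).sum(dim=0)
        return self.rho(torch.cat([pooled_a, pooled_b], dim=-1))
```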
no code implementations • 25 May 2022 • Ivan Kobyzev, Aref Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, Ali Ghodsi
Knowledge Distillation (KD) is a prominent neural model compression technique that heavily relies on teacher network predictions to guide the training of a student model.
no code implementations • 16 Oct 2021 • Tianda Li, Yassir El Mesbahi, Ivan Kobyzev, Ahmad Rashid, Atif Mahmud, Nithin Anchuri, Habib Hajimolahoseini, Yang Liu, Mehdi Rezagholizadeh
Pre-trained Language Models (PLMs) have been successful for a wide range of natural language processing (NLP) tasks.
no code implementations • 16 Oct 2021 • Avishek Joey Bose, Marcus Brubaker, Ivan Kobyzev
Generative modeling seeks to uncover the underlying factors that give rise to observed data; these factors can often be modeled as natural symmetries that manifest themselves through invariances and equivariances under certain transformation laws.
no code implementations • 29 Sep 2021 • Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais
Knowledge Distillation (KD) is an algorithm that transfers the knowledge of a trained, typically larger, neural network into another model that is being trained.
no code implementations • EACL 2021 • Vikash Balasubramanian, Ivan Kobyzev, Hareesh Bahuleyan, Ilya Shapiro, Olga Vechtomova
Learning disentangled representations of real-world data is a challenging open problem.
no code implementations • 7 Mar 2020 • Nabiha Asghar, Ivan Kobyzev, Jesse Hoey, Pascal Poupart, Muhammad Bilal Sheikh
State-of-the-art neural dialogue systems excel at syntactic and semantic modelling of language, but often have a hard time establishing emotional alignment with the human interactant during a conversation.
2 code implementations • 25 Aug 2019 • Ivan Kobyzev, Simon J. D. Prince, Marcus A. Brubaker
Normalizing Flows are generative models which produce tractable distributions where both sampling and density evaluation can be efficient and exact.
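The tractability comes from the change-of-variables rule: if data are produced by pushing a simple base density through an invertible map, the exact density is the base density of the inverted sample times the Jacobian correction, and sampling is just a forward pass. A minimal sketch with a single affine flow (a toy illustration, not the survey's taxonomy):

```python
# Toy normalizing flow: x = scale * z + shift with z ~ N(0, 1).
# Exact density via change of variables; exact sampling via the forward map.
import numpy as np
from scipy.stats import norm

def affine_flow_logpdf(x, scale, shift):
    z = (x - shift) / scale                           # invert the flow
    return norm.logpdf(z) - np.log(np.abs(scale))     # base log-density + log|det Jacobian|

def affine_flow_sample(n, scale, shift, rng=np.random.default_rng()):
    return scale * rng.standard_normal(n) + shift     # sampling is a forward pass

x = affine_flow_sample(5, scale=2.0, shift=1.0)
print(affine_flow_logpdf(x, 2.0, 1.0))
```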
no code implementations • ICML 2020 • Priyank Jaini, Ivan Kobyzev, Yao-Liang Yu, Marcus Brubaker
We investigate the ability of popular flow-based methods to capture the tail properties of a target density by studying how the increasing triangular maps used in these methods act on a tractable source density.
no code implementations • 27 May 2019 • Seyed Mehran Kazemi, Rishab Goel, Kshitij Jain, Ivan Kobyzev, Akshay Sethi, Peter Forsyth, Pascal Poupart
Graphs arise naturally in many real-world applications including social networks, recommender systems, ontologies, biology, and computational finance.