Search Results for author: Ivan Kobyzev

Found 19 papers, 4 papers with code

Resonance RoPE: Improving Context Length Generalization of Large Language Models

1 code implementation • 29 Feb 2024 • Suyuchen Wang, Ivan Kobyzev, Peng Lu, Mehdi Rezagholizadeh, Bang Liu

This paper addresses the challenge of train-short-test-long (TSTL) scenarios in Large Language Models (LLMs) equipped with Rotary Position Embedding (RoPE), where models pre-trained on shorter sequences face difficulty with out-of-distribution (OOD) token positions in longer sequences.

Language Modelling • Position
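As background for the TSTL problem above, the sketch below shows plain RoPE in NumPy (not the paper's Resonance variant): each pair of embedding dimensions is rotated by a position-dependent angle, so positions beyond the training length produce rotation angles the model never observed.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Rotate pairs of dimensions of x by position-dependent angles
    pos * base**(-2i/d) -- the core of Rotary Position Embedding (RoPE).

    A minimal sketch of standard RoPE, not the Resonance variant."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair rotation frequencies
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

# Positions past the pre-training length yield angle combinations never seen
# during training -- the out-of-distribution token positions described above.
q = np.ones(8)
q_long = rope_rotate(q, pos=4096)
```

The rotation is norm-preserving, so only the relative angles between token positions carry positional information.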

Hyperparameter Optimization for Large Language Model Instruction-Tuning

no code implementations • 1 Dec 2023 • Christophe Tribes, Sacha Benarroch-Lelong, Peng Lu, Ivan Kobyzev

The downstream-task performance of models fine-tuned with LoRA relies heavily on a set of hyperparameters, including the rank of the decomposition.

Hyperparameter Optimization • Language Modelling • +1
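For context on the rank hyperparameter mentioned above, here is a generic NumPy sketch of a LoRA forward pass (not the paper's code): the frozen weight is augmented by a low-rank update whose rank `r` is one of the quantities the hyperparameter search targets.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """LoRA forward pass: frozen weight W plus a low-rank update (alpha/r) * B @ A.

    A: (r, d_in), B: (d_out, r); the rank r and scaling alpha are among the
    hyperparameters whose choice drives downstream performance."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 8))   # frozen pretrained weight
A = rng.normal(size=(2, 8))   # rank r = 2
B = np.zeros((6, 2))          # B initialized to zero: the update starts at zero
x = rng.normal(size=(1, 8))
```

With `B` zero-initialized, fine-tuning starts from the pretrained model exactly; only `A` and `B` are trained.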

Attribute Controlled Dialogue Prompting

no code implementations • 11 Jul 2023 • Runcheng Liu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart

Prompt-tuning has become an increasingly popular parameter-efficient method for adapting large pretrained language models to downstream tasks.

Attribute • Dialogue Generation
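The prompt-tuning idea referenced above can be sketched in one function (a generic illustration; the paper conditions the prompt on dialogue attributes rather than using a fixed one):

```python
import numpy as np

def prepend_prompt(input_embeds, prompt_embeds):
    """Prompt-tuning in miniature: trainable prompt vectors are prepended to
    the frozen model's input embeddings, and only prompt_embeds would receive
    gradients during training. Generic sketch, not the paper's method."""
    return np.concatenate([prompt_embeds, input_embeds], axis=0)

tokens = np.zeros((4, 3))    # frozen input embeddings (4 tokens, dim 3)
prompt = np.ones((2, 3))     # 2 trainable prompt vectors
extended = prepend_prompt(tokens, prompt)
```

The parameter count scales with the prompt length times the embedding dimension, independent of the frozen model's size.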

LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization

no code implementations • 8 May 2023 • Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais

Label Smoothing (LS) is a simple, versatile, and efficient regularization technique that can be applied to various supervised classification tasks.

Image Classification • Machine Translation
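The plain label smoothing that LABO generalizes can be written in a few lines (this is standard LS, not the learned bi-level variant):

```python
import numpy as np

def smooth_labels(y, num_classes, eps=0.1):
    """Label smoothing: mix the one-hot target with a uniform distribution.

    The smoothed target is (1 - eps) * one_hot + eps / K; LABO instead
    learns the regularization via bi-level optimization."""
    one_hot = np.eye(num_classes)[y]
    return (1.0 - eps) * one_hot + eps / num_classes

t = smooth_labels(np.array([2]), num_classes=4, eps=0.1)
# Each row still sums to 1; the true class keeps most of the probability mass.
```

Training against the smoothed target discourages overconfident logits while leaving the argmax unchanged.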

Mathematical Challenges in Deep Learning

no code implementations • 24 Mar 2023 • Vahid Partovi Nia, Guojun Zhang, Ivan Kobyzev, Michael R. Metel, Xinlin Li, Ke Sun, Sobhan Hemati, Masoud Asgharian, Linglong Kong, Wulong Liu, Boxing Chen

Deep models have dominated the artificial intelligence (AI) industry since the ImageNet challenge in 2012.

KronA: Parameter Efficient Tuning with Kronecker Adapter

no code implementations • 20 Dec 2022 • Ali Edalati, Marzieh Tahaei, Ivan Kobyzev, Vahid Partovi Nia, James J. Clark, Mehdi Rezagholizadeh

We apply the proposed methods for fine-tuning T5 on the GLUE benchmark to show that incorporating the Kronecker-based modules can outperform state-of-the-art PET methods.

Language Modelling
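The Kronecker-based module mentioned above can be sketched as follows (an illustration of the idea, not the released implementation): the adapter update is a Kronecker product of two small factors, which can reach higher rank than a same-size LoRA update.

```python
import numpy as np

def krona_update(x, W, A, B):
    """Kronecker adapter sketch: W + kron(A, B).

    kron(A, B) with A of shape (p, q) and B of shape (m, n) has shape
    (p*m, q*n), so a (6, 8) update costs only 8 + 6 = 14 adapter parameters
    here, versus 48 for a dense update."""
    return x @ (W + np.kron(A, B)).T

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 8))          # frozen weight
A = rng.normal(size=(2, 4))          # small Kronecker factors:
B = rng.normal(size=(3, 2))          # kron(A, B) -> (6, 8)
x = rng.normal(size=(1, 8))
```

Zeroing either factor recovers the frozen model exactly, mirroring LoRA's zero-init behavior.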

Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization

no code implementations • 12 Dec 2022 • Aref Jafari, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, Ali Ghodsi

Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve a small model's (the student's) generalization by transferring knowledge from a larger model (the teacher).

Knowledge Distillation • Natural Language Understanding
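The classical KD objective that Continuation KD builds on can be sketched as follows (the paper anneals toward this loss via continuation; this shows only the standard objective):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard KD loss: (1 - alpha) * CE(student, labels)
    + alpha * T^2 * KL(teacher_T || student_T)."""
    p_s = softmax(student_logits, T)
    p_t = softmax(teacher_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean()
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    return (1 - alpha) * ce + alpha * T * T * kl
```

When student and teacher agree, the KL term vanishes and only the supervised cross-entropy remains.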

DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation

2 code implementations • 14 Oct 2022 • Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, Ali Ghodsi

Our DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank by sorting the representation learned by the adapter module at different ranks during training.

Natural Language Understanding • Text Generation
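The rank-range idea above can be sketched by evaluating a LoRA adapter truncated to a sampled rank `b` (a sketch of the mechanism, not the released code; the exact scaling is an assumption here):

```python
import numpy as np

def dylora_forward(x, W, A, B, b, alpha=16):
    """Evaluate a LoRA adapter truncated to its first b rank-1 components.

    DyLoRA samples b at each training step, so after training every leading
    sub-block of (A, B) works as a stand-alone adapter and the deployment
    rank can be chosen without retraining or a per-rank search."""
    A_b, B_b = A[:b, :], B[:, :b]     # keep the first b rows of A, columns of B
    return x @ W.T + (alpha / b) * (x @ A_b.T) @ B_b.T
```

At inference time one forward pass per candidate `b` is enough to pick a rank, instead of one training run per rank.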

Learning Functions on Multiple Sets using Multi-Set Transformers

1 code implementation • 30 Jun 2022 • Kira Selby, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart

We propose a general deep architecture for learning functions on multiple permutation-invariant sets.
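To make the permutation-invariance requirement above concrete, here is a minimal DeepSets-style sketch of a function on two sets (the paper instead uses attention between sets; this only illustrates the invariance the architecture must respect):

```python
import numpy as np

def set_embed(X, W):
    """Permutation-invariant set encoding: per-element ReLU map, then mean-pool."""
    return np.maximum(X @ W, 0.0).mean(axis=0)

def multiset_score(X, Y, W):
    """A toy scalar function of two sets built from their pooled embeddings.
    Reordering the elements of either set cannot change the output."""
    return float(set_embed(X, W) @ set_embed(Y, W))
```

Mean pooling discards element order by construction, which is what makes the encoding a function of the set rather than the sequence.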

Do we need Label Regularization to Fine-tune Pre-trained Language Models?

no code implementations • 25 May 2022 • Ivan Kobyzev, Aref Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, Ali Ghodsi

Knowledge Distillation (KD) is a prominent neural model compression technique that heavily relies on teacher network predictions to guide the training of a student model.

Knowledge Distillation • Model Compression

Equivariant Finite Normalizing Flows

no code implementations • 16 Oct 2021 • Avishek Joey Bose, Marcus Brubaker, Ivan Kobyzev

Generative modeling seeks to uncover the underlying factors that give rise to observed data; these factors can often be modeled as natural symmetries that manifest through invariances and equivariances under certain transformation laws.

Pseudo Knowledge Distillation: Towards Learning Optimal Instance-specific Label Smoothing Regularization

no code implementations • 29 Sep 2021 • Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais

Knowledge Distillation (KD) is an algorithm that transfers the knowledge of a trained, typically larger, neural network into another model under training.

Image Classification • Knowledge Distillation • +1

Generating Emotionally Aligned Responses in Dialogues using Affect Control Theory

no code implementations • 7 Mar 2020 • Nabiha Asghar, Ivan Kobyzev, Jesse Hoey, Pascal Poupart, Muhammad Bilal Sheikh

State-of-the-art neural dialogue systems excel at syntactic and semantic modelling of language, but often have a hard time establishing emotional alignment with the human interactant during a conversation.

Dialogue Generation

Normalizing Flows: An Introduction and Review of Current Methods

2 code implementations • 25 Aug 2019 • Ivan Kobyzev, Simon J. D. Prince, Marcus A. Brubaker

Normalizing Flows are generative models which produce tractable distributions where both sampling and density evaluation can be efficient and exact.
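The exact density evaluation mentioned above follows from the change-of-variables formula; a one-layer affine flow with a standard normal base density makes this concrete (a minimal sketch, not a full flow):

```python
import numpy as np

def flow_log_density(y, log_scale, shift):
    """Exact log-density of y = exp(log_scale) * x + shift with x ~ N(0, I):

        log p(y) = log N(x; 0, I) - sum(log_scale),
        x = (y - shift) * exp(-log_scale)

    The Jacobian term sum(log_scale) is what makes the density tractable."""
    x = (y - shift) * np.exp(-log_scale)
    log_base = -0.5 * (x ** 2 + np.log(2 * np.pi)).sum()
    return log_base - log_scale.sum()
```

Stacking invertible layers multiplies Jacobian determinants, so log-densities simply add layer by layer.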

Tails of Lipschitz Triangular Flows

no code implementations • ICML 2020 • Priyank Jaini, Ivan Kobyzev, Yao-Liang Yu, Marcus Brubaker

We investigate the ability of popular flow-based methods to capture tail properties of a target density by studying the increasing triangular maps these methods use, acting on a tractable source density.
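An increasing triangular map on R^2 can be written down directly (a toy example of the map class studied above, with hypothetical coefficients):

```python
import numpy as np

def triangular_map(z, a=1.0, b=0.5):
    """An increasing triangular map on R^2:

        y1 = a * z1
        y2 = b * z2 + z1**2 / 2

    Each output depends only on inputs up to its own index and is strictly
    increasing in the last of them, which keeps the Jacobian triangular."""
    z1, z2 = z
    return np.array([a * z1, b * z2 + 0.5 * z1 ** 2])
```

Because the Jacobian is triangular, its determinant is the product of the diagonal entries, which is what makes these maps attractive building blocks for flows.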

Representation Learning for Dynamic Graphs: A Survey

no code implementations • 27 May 2019 • Seyed Mehran Kazemi, Rishab Goel, Kshitij Jain, Ivan Kobyzev, Akshay Sethi, Peter Forsyth, Pascal Poupart

Graphs arise naturally in many real-world applications including social networks, recommender systems, ontologies, biology, and computational finance.

Knowledge Graphs • Recommendation Systems • +1
