1 code implementation • 29 Feb 2024 • Suyuchen Wang, Ivan Kobyzev, Peng Lu, Mehdi Rezagholizadeh, Bang Liu
This paper addresses the challenge of train-short-test-long (TSTL) scenarios in Large Language Models (LLMs) equipped with Rotary Position Embedding (RoPE), where models pre-trained on shorter sequences face difficulty with out-of-distribution (OOD) token positions in longer sequences.
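The failure mode is easy to see in code: RoPE encodes a token's position as a rotation of paired query/key features, so positions beyond the pre-training length produce rotation angles the model has never observed. Below is a minimal NumPy sketch assuming the standard pairwise-rotation formulation with base 10000, not necessarily the exact configuration studied in this paper.

```python
# Minimal RoPE sketch (standard pairwise rotation, base 10000 -- an assumption,
# not necessarily this paper's setup).
import numpy as np

def rope(x, positions, base=10000.0):
    """Rotate each consecutive feature pair of x by an angle proportional to
    token position. x: (seq_len, dim) with dim even."""
    seq_len, dim = x.shape
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = np.outer(positions, inv_freq)             # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Positions beyond the pre-training length yield rotation angles the model never
# saw during training -- the OOD positions that the TSTL setting exposes.
q = rope(np.random.randn(4096, 64), np.arange(4096))
```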
no code implementations • 1 Dec 2023 • Christophe Tribes, Sacha Benarroch-Lelong, Peng Lu, Ivan Kobyzev
The performance on downstream tasks of models fine-tuned with LoRA heavily relies on a set of hyperparameters including the rank of the decomposition.
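For context, the rank referred to here is the width of the low-rank update that LoRA adds to a frozen weight. A minimal PyTorch sketch of a LoRA layer, with names and scaling following common practice rather than this paper:

```python
# Hedged LoRA sketch: frozen base weight plus a trainable low-rank update B @ A.
# `rank` is the hyperparameter the snippet refers to.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)            # frozen pre-trained weight
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)
```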
no code implementations • 11 Jul 2023 • Runcheng Liu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart
Prompt-tuning has become an increasingly popular parameter-efficient method for adapting large pretrained language models to downstream tasks.
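As a quick illustration of the mechanism, prompt-tuning trains only a small matrix of "soft prompt" embeddings that is prepended to the frozen model's input embeddings. A minimal sketch under that standard formulation (dimensions illustrative):

```python
# Hedged prompt-tuning sketch: only the soft-prompt matrix is trainable; the
# backbone model and its embeddings stay frozen.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, prompt_len=20, embed_dim=768):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds):                      # (batch, seq, embed_dim)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)   # prepend trainable prompt
```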
no code implementations • 8 May 2023 • Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais
Label Smoothing (LS) is a simple, versatile, and efficient regularization technique that can be applied to a wide range of supervised classification tasks.
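The technique itself is a one-line change to the training target: the one-hot label is mixed with the uniform distribution before computing cross-entropy. A minimal sketch of standard label smoothing (the paper's specific variant or analysis may differ):

```python
# Standard label-smoothing cross-entropy: mix the one-hot target with the
# uniform distribution using smoothing rate eps.
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, eps=0.1):
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(targets, n_classes).float()
    smoothed = (1.0 - eps) * one_hot + eps / n_classes
    return -(smoothed * log_probs).sum(dim=-1).mean()
```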
no code implementations • 24 Mar 2023 • Vahid Partovi Nia, Guojun Zhang, Ivan Kobyzev, Michael R. Metel, Xinlin Li, Ke Sun, Sobhan Hemati, Masoud Asgharian, Linglong Kong, Wulong Liu, Boxing Chen
Deep models have dominated the artificial intelligence (AI) industry since the ImageNet challenge in 2012.
no code implementations • 20 Dec 2022 • Ali Edalati, Marzieh Tahaei, Ivan Kobyzev, Vahid Partovi Nia, James J. Clark, Mehdi Rezagholizadeh
We apply the proposed methods for fine-tuning T5 on the GLUE benchmark to show that incorporating the Kronecker-based modules can outperform state-of-the-art PET methods.
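The "Kronecker-based modules" replace the low-rank product used in adapters and LoRA with a Kronecker product of two small factors. The sketch below is a hedged illustration of that general idea; the factor shapes and placement of the module are assumptions, not the paper's exact design.

```python
# Hedged sketch of a Kronecker-based parameter-efficient module: the weight
# update is torch.kron(A, B) built from two small trainable factors.
# Shapes are illustrative; the paper's factorization may differ.
import torch
import torch.nn as nn

class KroneckerAdapter(nn.Module):
    def __init__(self, in_features=768, out_features=768, a1=32, b1=24):
        super().__init__()
        a2, b2 = out_features // a1, in_features // b1
        self.A = nn.Parameter(torch.randn(a1, b1) * 0.01)   # small factor
        self.B = nn.Parameter(torch.zeros(a2, b2))          # small factor
        # torch.kron(A, B) has shape (a1*a2, b1*b2) == (out_features, in_features)

    def forward(self, x):
        return x @ torch.kron(self.A, self.B).T
```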
no code implementations • 12 Dec 2022 • Aref Jafari, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, Ali Ghodsi
Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve the generalization of a small model (the student) by transferring knowledge from a larger model (the teacher).
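For reference, the standard KD objective mixes a temperature-softened KL term between teacher and student distributions with the usual task loss. A minimal sketch of that vanilla formulation (this paper builds on it rather than defining it):

```python
# Vanilla KD loss: temperature-softened KL(student || teacher) plus the
# ordinary cross-entropy on the ground-truth labels.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```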
no code implementations • 12 Dec 2022 • Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, Ali Ghodsi, Philippe Langlais
Moreover, we observe that this simple optimization technique outperforms state-of-the-art KD methods for compact models.
2 code implementations • 14 Oct 2022 • Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, Ali Ghodsi
Our DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank by sorting the representation learned by the adapter module at different ranks during training.
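In code, the idea amounts to sampling a rank at each training step and using only the leading rows and columns of the LoRA factors, so one set of blocks is trained for a whole range of ranks. A hedged sketch of that mechanism (the paper's full method includes details omitted here):

```python
# Hedged DyLoRA-style sketch: sample a rank b per step and truncate the LoRA
# factors to their first b components, so every rank up to max_rank gets trained.
import random
import torch
import torch.nn as nn

class DyLoRALinear(nn.Module):
    def __init__(self, in_features, out_features, max_rank=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)             # frozen pre-trained weight
        self.A = nn.Parameter(torch.randn(max_rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, max_rank))
        self.max_rank = max_rank

    def forward(self, x, rank=None):
        b = rank or random.randint(1, self.max_rank)       # sampled during training
        return self.base(x) + x @ self.A[:b].T @ self.B[:, :b].T
```

At inference time a fixed rank can be passed, which lets one trained module serve several deployment budgets.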
1 code implementation • 30 Jun 2022 • Kira Selby, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart
We propose a general deep architecture for learning functions on multiple permutation-invariant sets.
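One common way to build such functions is the DeepSets recipe: encode each set element-wise with a shared network, pool with an order-independent operation, and combine the pooled codes. The sketch below illustrates that baseline pattern for two input sets; it is an assumption for illustration, not the architecture proposed in the paper.

```python
# Hedged DeepSets-style baseline for functions on multiple permutation-invariant
# sets: per-element encoder phi, sum pooling, then a combiner rho.
import torch
import torch.nn as nn

class MultiSetModel(nn.Module):
    def __init__(self, in_dim, hidden=128, out_dim=1):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, set_a, set_b):                       # (n_a, in_dim), (n_b, in_dim)
        pooled_a = self.phi(set_a).sum(dim=0)              # sum pooling is permutation-invariant
        pooled_b = self.phi(set_b).sum(dim=0)
        return self.rho(torch.cat([pooled_a, pooled_b], dim=-1))
```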
no code implementations • 25 May 2022 • Ivan Kobyzev, Aref Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, Ali Ghodsi
Knowledge Distillation (KD) is a prominent neural model compression technique that heavily relies on teacher network predictions to guide the training of a student model.
no code implementations • 16 Oct 2021 • Tianda Li, Yassir El Mesbahi, Ivan Kobyzev, Ahmad Rashid, Atif Mahmud, Nithin Anchuri, Habib Hajimolahoseini, Yang Liu, Mehdi Rezagholizadeh
Pre-trained Language Models (PLMs) have been successful for a wide range of natural language processing (NLP) tasks.
no code implementations • 16 Oct 2021 • Avishek Joey Bose, Marcus Brubaker, Ivan Kobyzev
Generative modeling seeks to uncover the underlying factors that give rise to observed data; these factors can often be modeled as natural symmetries that manifest themselves through invariances and equivariances under certain transformation laws.
no code implementations • 29 Sep 2021 • Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais
Knowledge Distillation (KD) is an algorithm that transfers the knowledge of a trained, typically larger, neural network into another model that is being trained.
no code implementations • EACL 2021 • Vikash Balasubramanian, Ivan Kobyzev, Hareesh Bahuleyan, Ilya Shapiro, Olga Vechtomova
Learning disentangled representations of real-world data is a challenging open problem.
no code implementations • 7 Mar 2020 • Nabiha Asghar, Ivan Kobyzev, Jesse Hoey, Pascal Poupart, Muhammad Bilal Sheikh
State-of-the-art neural dialogue systems excel at syntactic and semantic modelling of language, but often have a hard time establishing emotional alignment with the human interactant during a conversation.
2 code implementations • 25 Aug 2019 • Ivan Kobyzev, Simon J. D. Prince, Marcus A. Brubaker
Normalizing Flows are generative models which produce tractable distributions where both sampling and density evaluation can be efficient and exact.
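The tractability comes from the change-of-variables rule: if data are produced by pushing a simple base density through an invertible map, the exact density is the base density of the inverted sample times the Jacobian correction, and sampling is just a forward pass. A minimal sketch with a single affine flow (a toy illustration, not the survey's taxonomy):

```python
# Toy normalizing flow: x = scale * z + shift with z ~ N(0, 1).
# Exact density via change of variables; exact sampling via the forward map.
import numpy as np
from scipy.stats import norm

def affine_flow_logpdf(x, scale, shift):
    z = (x - shift) / scale                           # invert the flow
    return norm.logpdf(z) - np.log(np.abs(scale))     # base log-density + log|det Jacobian|

def affine_flow_sample(n, scale, shift, rng=np.random.default_rng()):
    return scale * rng.standard_normal(n) + shift     # sampling is a forward pass

x = affine_flow_sample(5, scale=2.0, shift=1.0)
print(affine_flow_logpdf(x, 2.0, 1.0))
```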
no code implementations • ICML 2020 • Priyank Jaini, Ivan Kobyzev, Yao-Liang Yu, Marcus Brubaker
We investigate the ability of popular flow-based methods to capture the tail properties of a target density by studying how the increasing triangular maps used in these methods act on a tractable source density.
no code implementations • 27 May 2019 • Seyed Mehran Kazemi, Rishab Goel, Kshitij Jain, Ivan Kobyzev, Akshay Sethi, Peter Forsyth, Pascal Poupart
Graphs arise naturally in many real-world applications including social networks, recommender systems, ontologies, biology, and computational finance.