1 code implementation • 28 Feb 2024 • Kaiyue Wen, Xingyu Dang, Kaifeng Lyu
This paper investigates the gap in representation powers of Recurrent Neural Networks (RNNs) and Transformers in the context of solving algorithmic problems.
no code implementations • NeurIPS 2023 • Kaiyue Wen, Yuchen Li, Bingbin Liu, Andrej Risteski
Interpretability methods aim to understand the algorithm implemented by a trained model (e.g., a Transformer) by examining various aspects of the model, such as the weight matrices or the attention patterns.
1 code implementation • 14 Mar 2023 • Haozhe Jiang, Kaiyue Wen, Yilei Chen
For some settings, we also provide theoretical analyses that explain the rationale behind our model designs.
1 code implementation • 14 Nov 2022 • Xiaozhi Wang, Kaiyue Wen, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, Juanzi Li
Furthermore, we demonstrate that skill neurons most likely emerge during pre-training rather than fine-tuning, by showing that the skill neurons found with prompt tuning are also crucial for other fine-tuning methods that freeze neuron weights, such as adapter-based tuning and BitFit.
no code implementations • 10 Nov 2022 • Kaiyue Wen, Tengyu Ma, Zhiyuan Li
SAM intends to penalize a notion of sharpness of the model but actually implements a computationally efficient surrogate; moreover, a third notion of sharpness has been used for proving generalization guarantees.
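For context, the "computationally efficient variant" that SAM actually implements replaces the worst-case perturbed loss with a single first-order ascent step. Below is a minimal NumPy sketch of that update rule on a toy quadratic loss; the loss, learning rate, and radius `rho` are illustrative choices, not values from the paper.

```python
import numpy as np

# Toy objective; any differentiable loss would do.
def loss(w):
    return 0.5 * np.sum(w ** 2)

def grad(w):
    return w

def sam_step(w, lr=0.1, rho=0.05):
    """One SAM update: approximate the worst-case perturbation within
    an L2 ball of radius rho by a single normalized gradient ascent step,
    then descend using the gradient at the perturbed point."""
    g = grad(w)
    e = rho * g / (np.linalg.norm(g) + 1e-12)  # approximate argmax perturbation
    g_sam = grad(w + e)                        # gradient at the perturbed weights
    return w - lr * g_sam                      # applied at the original weights

w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w)
```

The per-step cost is only one extra gradient evaluation, which is why this surrogate is used in practice instead of solving the inner maximization exactly.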
no code implementations • 1 Jun 2022 • Kaiyue Wen, Jiaye Teng, Jingzhao Zhang
Studies on benign overfitting provide insights into the success of overparameterized deep learning models.
1 code implementation • NAACL 2022 • Yusheng Su, Xiaozhi Wang, Yujia Qin, Chi-Min Chan, Yankai Lin, Huadong Wang, Kaiyue Wen, Zhiyuan Liu, Peng Li, Juanzi Li, Lei Hou, Maosong Sun, Jie zhou
To explore whether we can improve PT via prompt transfer, in this work we empirically investigate the transferability of soft prompts across different downstream tasks and PLMs.