Search Results for author: Kaiyue Wen

Found 7 papers, 4 papers with code

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval

1 code implementation • 28 Feb 2024 • Kaiyue Wen, Xingyu Dang, Kaifeng Lyu

This paper investigates the gap in representation powers of Recurrent Neural Networks (RNNs) and Transformers in the context of solving algorithmic problems.

Retrieval
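
A minimal sketch of a toy in-context retrieval (associative-recall style) task in the spirit of what the paper studies: key-value pairs appear in the context and the model must return the value bound to a queried key. The task format and sizes below are illustrative assumptions, not the paper's exact benchmark.

```python
# Illustrative toy in-context retrieval task: the input lists key-value
# pairs followed by a query key; the target is the value bound to that key.
import random

def make_retrieval_example(num_pairs=8, vocab_size=26, seed=None):
    rng = random.Random(seed)
    keys = rng.sample(range(vocab_size), num_pairs)                # distinct keys
    values = [rng.randrange(vocab_size) for _ in keys]
    query = rng.choice(keys)
    answer = values[keys.index(query)]
    context = [tok for pair in zip(keys, values) for tok in pair]  # k1 v1 k2 v2 ...
    return context + [query], answer

if __name__ == "__main__":
    tokens, target = make_retrieval_example(seed=0)
    print("input tokens:", tokens)
    print("target value:", target)
```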

Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars

no code implementations • NeurIPS 2023 • Kaiyue Wen, Yuchen Li, Bingbin Liu, Andrej Risteski

Interpretability methods aim to understand the algorithm implemented by a trained model (e.g., a Transformer) by examining various aspects of the model, such as the weight matrices or the attention patterns.

LEMMA
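
For context, a bounded Dyck grammar is the language of balanced brackets of several types with nesting depth capped at some bound. Below is a minimal sampler sketch; the bracket types, length cap, and opening probability are illustrative choices, not the paper's training distribution.

```python
# Sample a string from a bounded Dyck language: balanced brackets of k types
# with nesting depth at most max_depth.
import random

def sample_bounded_dyck(k=2, max_depth=4, max_len=20, p_open=0.5, seed=None):
    rng = random.Random(seed)
    pairs = [("([{<"[i], ")]}>"[i]) for i in range(k)]
    out, stack = [], []
    while len(out) < max_len:
        can_open = len(stack) < max_depth
        if stack and (not can_open or rng.random() > p_open):
            out.append(stack.pop())              # close the innermost open bracket
        elif can_open:
            open_b, close_b = rng.choice(pairs)
            out.append(open_b)
            stack.append(close_b)
        else:
            break
    out.extend(reversed(stack))                  # close whatever is still open
    return "".join(out)

if __name__ == "__main__":
    print(sample_bounded_dyck(seed=1))           # e.g. a short balanced bracket string
```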

Practically Solving LPN in High Noise Regimes Faster Using Neural Networks

1 code implementation • 14 Mar 2023 • Haozhe Jiang, Kaiyue Wen, Yilei Chen

For some settings, we are also able to provide theoretical analyses that explain the rationale behind the design of our models.

Vocal Bursts Intensity Prediction
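
For reference, a Learning Parity with Noise (LPN) instance consists of random vectors over F_2 labeled by noisy inner products with a hidden secret. A minimal generator sketch follows; the dimension and noise rate are illustrative, not the high-noise regimes the paper targets.

```python
# Generate LPN samples: labels are noisy inner products <a, s> + e (mod 2)
# of random binary vectors a with a secret s, where e ~ Bernoulli(tau).
import numpy as np

def lpn_samples(n=32, num_samples=1000, tau=0.25, seed=0):
    rng = np.random.default_rng(seed)
    secret = rng.integers(0, 2, size=n)               # secret s in F_2^n
    A = rng.integers(0, 2, size=(num_samples, n))     # random query vectors
    noise = rng.random(num_samples) < tau             # Bernoulli(tau) error bits
    labels = (A @ secret + noise) % 2                 # noisy parities
    return A, labels, secret

if __name__ == "__main__":
    A, y, s = lpn_samples()
    print("empirical error rate:", ((A @ s) % 2 != y).mean())  # roughly tau
```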

Finding Skill Neurons in Pre-trained Transformer-based Language Models

1 code implementation • 14 Nov 2022 • Xiaozhi Wang, Kaiyue Wen, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, Juanzi Li

Furthermore, we demonstrate that the skill neurons are most likely generated in pre-training rather than fine-tuning, by showing that the skill neurons found with prompt tuning are also crucial for other fine-tuning methods that freeze neuron weights, such as adapter-based tuning and BitFit.

Network Pruning
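
A simplified sketch of how one might score neurons for task predictivity by thresholding each neuron's activation at a per-neuron baseline, in the spirit of the paper's skill-neuron probing; the thresholding rule and array shapes are assumptions, not the paper's exact procedure.

```python
# Score each neuron by how well thresholding its activation at its mean
# predicts a binary task label (either polarity counts).
import numpy as np

def neuron_predictivity(activations, labels):
    """activations: (num_examples, num_neurons); labels: (num_examples,) in {0, 1}."""
    thresholds = activations.mean(axis=0)                      # per-neuron baseline
    predictions = activations > thresholds                     # (examples, neurons)
    accuracy = (predictions == labels[:, None]).mean(axis=0)   # per-neuron accuracy
    return np.maximum(accuracy, 1.0 - accuracy)                # allow either polarity

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(200, 64))
    labels = rng.integers(0, 2, size=200)
    acts[:, 7] += 2.0 * labels                                 # plant one "skill neuron"
    scores = neuron_predictivity(acts, labels)
    print("top neuron:", scores.argmax(), "score:", round(float(scores.max()), 3))
```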

How Does Sharpness-Aware Minimization Minimize Sharpness?

no code implementations • 10 Nov 2022 • Kaiyue Wen, Tengyu Ma, Zhiyuan Li

SAM is intended to penalize one notion of sharpness of the model but in practice implements a computationally efficient variant; moreover, a third notion of sharpness has been used for proving generalization guarantees.
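
For reference, the computationally efficient variant has a two-step structure: take an ascent step to a worst-case perturbation of the weights, then descend using the gradient evaluated at the perturbed point. A minimal NumPy sketch on a toy quadratic loss; the learning rate, perturbation radius rho, and the loss itself are illustrative, not the paper's analysis.

```python
# One SAM update on a toy quadratic loss L(w) = 0.5 * w^T A w,
# using analytic gradients.
import numpy as np

A = np.diag([10.0, 1.0])

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

def sam_step(w, lr=0.05, rho=0.1):
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent to a worst-case perturbation
    g_perturbed = grad(w + eps)                   # gradient at the perturbed weights
    return w - lr * g_perturbed                   # descend with the perturbed gradient

if __name__ == "__main__":
    w = np.array([1.0, 1.0])
    for _ in range(100):
        w = sam_step(w)
    print("loss after SAM steps:", round(float(loss(w)), 6))
```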

Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models

no code implementations • 1 Jun 2022 • Kaiyue Wen, Jiaye Teng, Jingzhao Zhang

Studies on benign overfitting provide insights into the success of overparameterized deep learning models.

On Transferability of Prompt Tuning for Natural Language Processing

1 code implementation • NAACL 2022 • Yusheng Su, Xiaozhi Wang, Yujia Qin, Chi-Min Chan, Yankai Lin, Huadong Wang, Kaiyue Wen, Zhiyuan Liu, Peng Li, Juanzi Li, Lei Hou, Maosong Sun, Jie Zhou

To explore whether we can improve PT via prompt transfer, we empirically investigate the transferability of soft prompts across different downstream tasks and PLMs in this work.

Natural Language Understanding • Transfer Learning
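
A hedged sketch of what prompt transfer can look like in practice: a soft prompt trained on a source task is reused to initialize the target task's prompt instead of starting from random initialization. The prompt shape and the cosine-similarity proxy below are illustrative assumptions, not the paper's transferability indicators.

```python
# Reuse a source-task soft prompt (a matrix of virtual-token embeddings)
# as the initialization for a target task's prompt tuning.
import numpy as np

PROMPT_LEN, HIDDEN_DIM = 20, 768

def init_target_prompt(source_prompt=None, seed=0):
    """Return an initial soft prompt: transferred if a source prompt is given."""
    if source_prompt is not None:
        return source_prompt.copy()                                  # prompt transfer
    rng = np.random.default_rng(seed)
    return rng.normal(scale=0.02, size=(PROMPT_LEN, HIDDEN_DIM))     # random init

def prompt_similarity(p1, p2):
    """Cosine similarity of flattened prompts, a simple transfer proxy."""
    v1, v2 = p1.ravel(), p2.ravel()
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

if __name__ == "__main__":
    source = init_target_prompt(seed=1)   # stands in for a prompt trained on a source task
    target = init_target_prompt(source)
    print("similarity:", round(prompt_similarity(source, target), 3))  # 1.0 for direct transfer
```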
