1 code implementation • 28 Feb 2024 • Kaiyue Wen, Xingyu Dang, Kaifeng Lyu
This paper investigates the gap in representation powers of Recurrent Neural Networks (RNNs) and Transformers in the context of solving algorithmic problems.
no code implementations • NeurIPS 2023 • Kaiyue Wen, Yuchen Li, Bingbin Liu, Andrej Risteski
Interpretability methods aim to understand the algorithm implemented by a trained model (e.g., a Transformer) by examining various aspects of the model, such as the weight matrices or the attention patterns.
1 code implementation • 14 Mar 2023 • Haozhe Jiang, Kaiyue Wen, Yilei Chen
For some settings, we also provide theoretical analyses that explain the rationale behind our model designs.
1 code implementation • 14 Nov 2022 • Xiaozhi Wang, Kaiyue Wen, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, Juanzi Li
Furthermore, we demonstrate that skill neurons most likely emerge during pre-training rather than fine-tuning, by showing that the skill neurons found with prompt tuning are also crucial for other fine-tuning methods that freeze neuron weights, such as adapter-based tuning and BitFit.
no code implementations • 10 Nov 2022 • Kaiyue Wen, Tengyu Ma, Zhiyuan Li
SAM intends to penalize a notion of sharpness of the model but actually implements a computationally efficient surrogate; moreover, a third notion of sharpness has been used for proving generalization guarantees.
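For context, the "computationally efficient variant" that SAM actually implements replaces the worst-case perturbed loss with a single first-order ascent step. Below is a minimal NumPy sketch of that update rule on a toy quadratic loss; the loss, learning rate, and radius `rho` are illustrative choices, not values from the paper.

```python
import numpy as np

# Toy objective; any differentiable loss would do.
def loss(w):
    return 0.5 * np.sum(w ** 2)

def grad(w):
    return w

def sam_step(w, lr=0.1, rho=0.05):
    """One SAM update: approximate the worst-case perturbation within
    an L2 ball of radius rho by a single normalized gradient ascent step,
    then descend using the gradient at the perturbed point."""
    g = grad(w)
    e = rho * g / (np.linalg.norm(g) + 1e-12)  # approximate argmax perturbation
    g_sam = grad(w + e)                        # gradient at the perturbed weights
    return w - lr * g_sam                      # applied at the original weights

w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w)
```

The per-step cost is only one extra gradient evaluation, which is why this surrogate is used in practice instead of solving the inner maximization exactly.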
no code implementations • 1 Jun 2022 • Kaiyue Wen, Jiaye Teng, Jingzhao Zhang
Studies on benign overfitting provide insights into the success of overparameterized deep learning models.
1 code implementation • NAACL 2022 • Yusheng Su, Xiaozhi Wang, Yujia Qin, Chi-Min Chan, Yankai Lin, Huadong Wang, Kaiyue Wen, Zhiyuan Liu, Peng Li, Juanzi Li, Lei Hou, Maosong Sun, Jie zhou
To explore whether we can improve PT via prompt transfer, in this work we empirically investigate the transferability of soft prompts across different downstream tasks and PLMs.