Search Results for author: Yutao Sun

Found 11 papers, 7 papers with code

FocusLLM: Scaling LLM's Context by Parallel Decoding

no code implementations · 21 Aug 2024 · Zhenyu Li, Yike Zhang, Tengyu Pan, Yutao Sun, Zhichao Duan, Junjie Fang, Rong Han, Zixuan Wang, Jianyong Wang

Empowering LLMs with the ability to utilize useful information from a long context is crucial for many downstream applications.

8k · Decoder

Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression

no code implementations · 17 Jun 2024 · Zilun Zhang, Yutao Sun, Tiancheng Zhao, Leigang Sha, Ruochen Xu, Kyusong Lee, Jianwei Yin

Humans can retain old knowledge while learning new information, but Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data.

Language Modelling · Large Language Model

You Only Cache Once: Decoder-Decoder Architectures for Language Models

1 code implementation · 8 May 2024 · Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei

We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once.

Decoder · Retrieval
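The core idea described above — caching key-value pairs only once instead of in every decoder layer — can be illustrated with a back-of-the-envelope memory calculation. This is a simplified sketch, not the paper's exact accounting (YOCO's self-decoder additionally uses efficient attention with a constant-size cache, which is folded into the single shared cache here); `kv_cache_bytes` is a hypothetical helper introduced for illustration.

```python
def kv_cache_bytes(layers, seq_len, heads, head_dim, bytes_per=2, yoco=False):
    """Rough KV-cache footprint for a decoder-only model.

    A conventional transformer decoder keeps a K and V tensor per layer.
    In a YOCO-style decoder-decoder, the cross-decoder layers all read
    one shared cache produced by the self-decoder, so the cache is kept
    once rather than once per layer.
    """
    per_token = 2 * heads * head_dim * bytes_per  # K and V, fp16 by default
    cached_layers = 1 if yoco else layers
    return cached_layers * seq_len * per_token

# Illustrative configuration (hypothetical, not from the paper):
standard = kv_cache_bytes(layers=32, seq_len=131072, heads=16, head_dim=128)
shared = kv_cache_bytes(layers=32, seq_len=131072, heads=16, head_dim=128, yoco=True)
```

Under this simplified model, the cache shrinks by a factor equal to the layer count, which is why caching once matters most at long context lengths.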

Retentive Network: A Successor to Transformer for Large Language Models

9 code implementations · 17 Jul 2023 · Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei

In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance.

Language Modelling
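The "training parallelism plus low-cost inference" combination comes from retention's dual computation forms: a parallel form for training and a mathematically equivalent recurrent form for inference. Below is a minimal single-head NumPy sketch of that duality, omitting the paper's xPos-style rotations, group normalization, and multi-scale (per-head) decay rates; the function names are introduced here for illustration.

```python
import numpy as np

def retention_parallel(Q, K, V, gamma):
    # Parallel form (training): O = (Q K^T ∘ D) V,
    # with causal decay mask D[n, m] = gamma**(n - m) for n >= m, else 0.
    T = Q.shape[0]
    n = np.arange(T)
    D = np.where(n[:, None] >= n[None, :],
                 gamma ** (n[:, None] - n[None, :]), 0.0)
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma):
    # Recurrent form (inference): S_n = gamma * S_{n-1} + k_n^T v_n,
    # o_n = q_n S_n — O(1) state per step instead of a growing KV cache.
    S = np.zeros((K.shape[1], V.shape[1]))
    outs = []
    for q, k, v in zip(Q, K, V):
        S = gamma * S + np.outer(k, v)
        outs.append(q @ S)
    return np.stack(outs)

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 4)) for _ in range(3))
```

Expanding the recurrence gives o_n = Σ_{m≤n} γ^(n−m) (q_n·k_m) v_m, which is exactly the parallel form, so both paths produce identical outputs.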

Debiased Inference for Dynamic Nonlinear Panels with Multi-dimensional Heterogeneities

no code implementations · 4 May 2023 · Xuan Leng, Jiaming Mao, Yutao Sun

We introduce a generic class of dynamic nonlinear heterogeneous parameter models that incorporate individual and time effects in both the intercept and slope.


Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers

1 code implementation · 20 Dec 2022 · Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, Furu Wei

We comprehensively compare the behaviors of in-context learning and explicit finetuning on real tasks to provide empirical evidence that supports our understanding.

In-Context Learning · Open-Ended Question Answering
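The meta-optimizer view in the title rests on a known identity: unnormalized linear attention over demonstration tokens equals applying an outer-product weight update — the same algebraic form as a gradient-descent step — to the query. A minimal sketch of that identity (linear attention only; real softmax attention matches it only approximately, and the zero-shot weight is taken as zero here):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
K_demo = rng.standard_normal((3, d))  # keys from demonstration tokens
V_demo = rng.standard_normal((3, d))  # values from demonstration tokens
q = rng.standard_normal(d)            # query-token representation

# Linear (unnormalized) attention over the demonstrations:
attn_out = V_demo.T @ (K_demo @ q)

# The same output as a weight update: W = Σ_i v_i k_i^T is an
# outer-product accumulation, the form of a gradient-descent update.
W_update = sum(np.outer(v, k) for v, k in zip(V_demo, K_demo))
gd_out = W_update @ q

assert np.allclose(attn_out, gd_out)
```

In this reading, the demonstrations act as "training data" whose implicit update W_update is applied at inference time without touching the model's parameters.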

Structured Prompting: Scaling In-Context Learning to 1,000 Examples

1 code implementation · 13 Dec 2022 · Yaru Hao, Yutao Sun, Li Dong, Zhixiong Han, Yuxian Gu, Furu Wei

Large language models have exhibited intriguing in-context learning capability, achieving promising zero- and few-shot performance without updating the parameters.

In-Context Learning

Prototypical Calibration for Few-shot Learning of Language Models

1 code implementation · 20 May 2022 · Zhixiong Han, Yaru Hao, Li Dong, Yutao Sun, Furu Wei

In-context learning of GPT-like models has been recognized as fragile across different hand-crafted templates and demonstration permutations.

Few-Shot Learning · In-Context Learning
