1 code implementation • 7 Oct 2024 • Tianzhu Ye, Li Dong, Yuqing Xia, Yutao Sun, Yi Zhu, Gao Huang, Furu Wei
Transformers tend to overallocate attention to irrelevant context.
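The entry gives only the motivation, not the proposed fix. As a hedged illustration (not necessarily this paper's exact formulation), one way to suppress attention noise is differential attention: take the difference of two softmax attention maps so that mass assigned to irrelevant context by both maps cancels. The shapes and the scalar `lam` below are illustrative assumptions.

```python
# Minimal sketch of differential attention: subtract two softmax attention maps
# so that shared (noisy) attention mass cancels. Shapes and lam are assumptions.
import torch
import torch.nn.functional as F

def differential_attention(q1, q2, k1, k2, v, lam=0.5):
    """q1/q2, k1/k2: (batch, seq, d); v: (batch, seq, d_v)."""
    d = q1.shape[-1]
    a1 = F.softmax(q1 @ k1.transpose(-1, -2) / d ** 0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-1, -2) / d ** 0.5, dim=-1)
    return (a1 - lam * a2) @ v  # the differential map down-weights common noise

# toy usage
b, n, d = 2, 8, 16
q1, q2, k1, k2 = (torch.randn(b, n, d) for _ in range(4))
v = torch.randn(b, n, d)
print(differential_attention(q1, q2, k1, k2, v).shape)  # torch.Size([2, 8, 16])
```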
no code implementations • 21 Aug 2024 • Zhenyu Li, Yike Zhang, Tengyu Pan, Yutao Sun, Zhichao Duan, Junjie Fang, Rong Han, Zixuan Wang, Jianyong Wang
Enabling LLMs to draw on useful information from a long context is crucial for many downstream applications.
no code implementations • 17 Jun 2024 • Zilun Zhang, Yutao Sun, Tiancheng Zhao, Leigang Sha, Ruochen Xu, Kyusong Lee, Jianwei Yin
Humans can retain old knowledge while learning new information, but Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data.
no code implementations • 6 Jun 2024 • Yutao Sun, Mingshuai Chen, Tiancheng Zhao, Kangjia Zhao, He Li, Jintao Chen, Liqiang Lu, Xinkui Zhao, Shuiguang Deng, Jianwei Yin
Artificial intelligence is rapidly encroaching on the field of service regulation.
1 code implementation • 8 May 2024 • Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei
We introduce YOCO, a decoder-decoder architecture for large language models that caches key-value pairs only once.
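A minimal sketch of the cache-key-value-pairs-once idea described here, under assumed shapes and layer counts (not the released YOCO code): a small self-decoder stack builds one shared KV cache, and the remaining layers reuse that same cache through cross-attention instead of each keeping a cache of its own.

```python
# Hedged sketch: the first half of the stack produces a single shared KV cache,
# and every remaining layer reuses it via cross-attention, so no per-layer KV
# caches are needed. Dimensions, layer counts, and module names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttnLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.q = nn.Linear(d, d)
        self.o = nn.Linear(d, d)

    def forward(self, x, k, v):
        scores = self.q(x) @ k.transpose(-1, -2) / k.shape[-1] ** 0.5
        return x + self.o(F.softmax(scores, dim=-1) @ v)  # residual; reuses shared KV

class TinyDecoderDecoder(nn.Module):
    def __init__(self, d=32, n_self=2, n_cross=2):
        super().__init__()
        # stand-in for an efficient self-decoder that summarizes the prefix
        self.self_layers = nn.ModuleList(nn.Linear(d, d) for _ in range(n_self))
        self.kv_proj = nn.Linear(d, 2 * d)  # produces the one shared KV cache
        self.cross_layers = nn.ModuleList(CrossAttnLayer(d) for _ in range(n_cross))

    def forward(self, x):
        h = x
        for layer in self.self_layers:
            h = torch.relu(layer(h))
        k, v = self.kv_proj(h).chunk(2, dim=-1)  # cached once for all later layers
        for layer in self.cross_layers:
            h = layer(h, k, v)
        return h

x = torch.randn(1, 10, 32)
print(TinyDecoderDecoder()(x).shape)  # torch.Size([1, 10, 32])
```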
9 code implementations • 17 Jul 2023 • Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance.
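A hedged sketch of how a retention-style layer can offer both training parallelism and low-cost inference: the parallel form computes a decay-masked attention-like product over the whole sequence, while the recurrent form carries a single state that is updated per token, and the two agree numerically. The head size and the decay value gamma below are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: parallel (training) and recurrent (inference) forms of a
# retention-style layer compute the same output. Sizes and gamma are assumptions.
import torch

def retention_parallel(q, k, v, gamma=0.9):
    n = q.shape[1]
    idx = torch.arange(n)
    decay = gamma ** (idx[:, None] - idx[None, :]).clamp(min=0).float()
    d = decay * torch.tril(torch.ones(n, n))   # D[i, j] = gamma^(i-j) for j <= i
    return ((q @ k.transpose(-1, -2)) * d) @ v

def retention_recurrent(q, k, v, gamma=0.9):
    b, n, dk = q.shape
    s = torch.zeros(b, dk, v.shape[-1])
    out = []
    for t in range(n):                          # O(1) state update per token
        s = gamma * s + k[:, t].unsqueeze(-1) @ v[:, t].unsqueeze(1)
        out.append(q[:, t].unsqueeze(1) @ s)
    return torch.cat(out, dim=1)

q, k, v = (torch.randn(1, 6, 8) for _ in range(3))
print(torch.allclose(retention_parallel(q, k, v),
                     retention_recurrent(q, k, v), atol=1e-5))  # True
```

The recurrent form is what makes inference cheap: each new token only updates the fixed-size state, with no growing KV cache.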
no code implementations • 4 May 2023 • Xuan Leng, Jiaming Mao, Yutao Sun
We introduce a generic class of dynamic nonlinear heterogeneous parameter models that incorporate individual and time effects in both the intercept and slope.
5 code implementations • 20 Dec 2022 • Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, Furu Wei
Position modeling plays a critical role in Transformers.
1 code implementation • 20 Dec 2022 • Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, Furu Wei
We comprehensively compare the behaviors of in-context learning and explicit finetuning on real tasks, providing empirical evidence that supports the understanding of in-context learning as a form of implicit finetuning.
1 code implementation • 13 Dec 2022 • Yaru Hao, Yutao Sun, Li Dong, Zhixiong Han, Yuxian Gu, Furu Wei
Large language models have exhibited intriguing in-context learning capability, achieving promising zero- and few-shot performance without updating the parameters.
1 code implementation • 20 May 2022 • Zhixiong Han, Yaru Hao, Li Dong, Yutao Sun, Furu Wei
In-context learning of GPT-like models has been recognized as fragile across different hand-crafted templates and demonstration permutations.