no code implementations • 23 Dec 2024 • Chenlong Deng, Zhisong Zhang, Kelong Mao, Shuaiyi Li, Xinting Huang, Dong Yu, Zhicheng Dou
In this work, we provide a thorough investigation of gist-based context compression methods to improve long-context processing in large language models.
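To make the gist idea concrete, here is a minimal sketch of compressing a long prefix into a few learned "gist" vectors via cross-attention. This is a hypothetical illustration (class name, shapes, and the external cross-attention layer are my assumptions), not the paper's actual implementation, which trains gist tokens inside the LLM itself:

```python
import torch
import torch.nn as nn

class GistCompressor(nn.Module):
    """Compress a long context into k gist vectors via cross-attention.

    Hypothetical sketch: real gist methods train special gist tokens
    inside the LLM; here the idea is approximated with one
    cross-attention layer outside the model.
    """

    def __init__(self, d_model: int = 768, num_gist: int = 8, num_heads: int = 8):
        super().__init__()
        self.gist_queries = nn.Parameter(torch.randn(num_gist, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, context_states: torch.Tensor) -> torch.Tensor:
        # context_states: (batch, seq_len, d_model) hidden states of the long prefix
        batch = context_states.size(0)
        queries = self.gist_queries.unsqueeze(0).expand(batch, -1, -1)
        gists, _ = self.attn(queries, context_states, context_states)
        return gists  # (batch, num_gist, d_model): stands in for the full context

compressor = GistCompressor()
hidden = torch.randn(2, 4096, 768)   # a 4096-token context
print(compressor(hidden).shape)       # torch.Size([2, 8, 768])
```

Downstream tokens then attend to the 8 gist vectors instead of all 4096 prefix positions, which is the source of the compression.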
no code implementations • 21 Dec 2024 • Zhisong Zhang, Yan Wang, Xinting Huang, Tianqing Fang, Hongming Zhang, Chenlong Deng, Shuaiyi Li, Dong Yu
In this work, we provide a detailed analysis of this issue and identify that unusually high attention entropy can be a key factor.
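Attention entropy here refers to the Shannon entropy of each query's post-softmax attention distribution; near-uniform attention over many keys drives it toward its maximum, log(n). A quick way to measure it (a sketch, not the authors' code; the function name is mine):

```python
import torch

def attention_entropy(attn_weights: torch.Tensor, eps: float = 1e-9) -> torch.Tensor:
    """Shannon entropy of attention rows.

    attn_weights: (..., num_keys), rows summing to 1 (post-softmax).
    Uniform attention over n keys gives the maximum value log(n).
    """
    return -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)

# Sharp vs. near-uniform attention over 1024 keys
sharp = torch.softmax(torch.randn(1024) * 10, dim=-1)
flat = torch.full((1024,), 1.0 / 1024)
print(attention_entropy(sharp), attention_entropy(flat))  # low vs. ~log(1024) ≈ 6.93
```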
no code implementations • 17 Dec 2024 • Moxin Li, Yong Zhao, Yang Deng, Wenxuan Zhang, Shuaiyi Li, Wenya Xie, See-Kiong Ng, Tat-Seng Chua
Although large language models (LLMs) store vast amounts of knowledge in their parameters, they still have limitations in memorizing and utilizing certain knowledge, leading to undesired behaviors such as generating untruthful and inaccurate responses.
no code implementations • 24 Jun 2024 • Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, Shuaiyi Li, Bowen Cao, Zhisong Zhang, Xinting Huang, Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi
Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications.
1 code implementation • 8 Mar 2024 • Shuaiyi Li, Yang Deng, Deng Cai, Hongyuan Lu, Liang Chen, Wai Lam
As the typical retraining paradigm is unacceptably time- and resource-consuming, researchers are turning to model editing, seeking an effective way to directly modify model behavior that supports both consecutive and batch editing scenarios.
no code implementations • 16 Nov 2023 • Liang Chen, Yatao Bian, Yang Deng, Deng Cai, Shuaiyi Li, Peilin Zhao, Kam-Fai Wong
Text watermarking has emerged as a pivotal technique for identifying machine-generated text.
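As a generic illustration of how such watermarks are commonly detected (a green-list scheme in the style of Kirchenbauer et al., not necessarily this paper's setting; all names here are mine), one counts how often each token falls in a pseudorandom "green" subset of the vocabulary seeded by its predecessor:

```python
import hashlib
import math
import random

def green_list(prev_token: int, vocab_size: int, fraction: float = 0.5) -> set:
    """Pseudorandomly split the vocabulary, seeded by the previous token."""
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16)
    ids = list(range(vocab_size))
    random.Random(seed).shuffle(ids)
    return set(ids[: int(vocab_size * fraction)])

def watermark_z_score(tokens: list, vocab_size: int, fraction: float = 0.5) -> float:
    """Count tokens landing in their context's green list; a large z-score
    means far more green hits than chance, i.e. a likely watermark."""
    hits = sum(tok in green_list(prev, vocab_size, fraction)
               for prev, tok in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - fraction * n) / math.sqrt(n * fraction * (1 - fraction))

print(watermark_z_score([5, 17, 3, 42, 8, 11], vocab_size=100))
```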
1 code implementation • 19 Oct 2023 • Shuaiyi Li, Yang Deng, Wai Lam
Specifically, we design a novel node memory scheme and aggregate information over the depth dimension of the graph instead of the breadth dimension, which enables the model to capture long-range dependencies without stacking multiple layers.
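A rough sketch of the contrast: breadth-wise aggregation needs one layer per hop, whereas a depth-wise pass walks outward from a node and folds each hop into a running memory in a single call. The decayed-sum update below is a hypothetical stand-in for the paper's memory scheme:

```python
import torch

def depthwise_aggregate(adj, feats: torch.Tensor, start: int, depth: int) -> torch.Tensor:
    """Walk `depth` hops from `start`, folding each hop's node features into a
    running memory (here a simple decayed sum for illustration). One pass can
    thus reach far-away nodes without stacking `depth` graph layers."""
    memory = feats[start].clone()
    frontier, visited = {start}, {start}
    for hop in range(1, depth + 1):
        nxt = {v for u in frontier for v in adj[u] if v not in visited}
        if not nxt:
            break
        hop_feat = torch.stack([feats[v] for v in nxt]).mean(dim=0)
        memory = memory + (0.5 ** hop) * hop_feat  # decay distant information
        visited |= nxt
        frontier = nxt
    return memory

# Tiny chain graph 0-1-2-3: node 0 reaches node 3 in one call with depth=3
adj = [[1], [0, 2], [1, 3], [2]]
feats = torch.eye(4)
print(depthwise_aggregate(adj, feats, start=0, depth=3))
```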