1 code implementation • 2 Dec 2024 • Lifan Yuan, Wendi Li, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, BoWen Zhou, Zhiyuan Liu, Hao Peng
The only assumption is to parameterize the outcome reward as the log-likelihood ratios of the policy and reference models, which can be optimized regardless of the specific choice of loss objectives.
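A minimal sketch of that parameterization, assuming the common log-ratio form r(y) = β · log π(y|x) / π_ref(y|x), where β is an assumed scaling coefficient. The helper names `implicit_outcome_reward` and `prefix_rewards` are hypothetical and are not the authors' released code. A sequence log-likelihood is the sum of per-token log-probabilities, so the ratio decomposes over tokens. Running sums over prefixes can therefore be read off the same per-token quantities.

```python
import math
from typing import List, Sequence


def implicit_outcome_reward(
    policy_logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    beta: float = 1.0,  # assumed scaling coefficient (hypothetical default)
) -> float:
    """Outcome reward as a scaled log-likelihood ratio of one full response.

    Inputs are per-token log-probabilities of the SAME response tokens under
    the policy model and the reference model.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("both models must score the same tokens")
    # log pi(y|x) - log pi_ref(y|x) is the sum of per-token log-prob differences
    return beta * sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))


def prefix_rewards(
    policy_logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    beta: float = 1.0,
) -> List[float]:
    """Cumulative scaled log-ratio after each token (one value per prefix)."""
    totals: List[float] = []
    running = 0.0
    for p, r in zip(policy_logprobs, ref_logprobs):
        running += beta * (p - r)
        totals.append(running)
    return totals


# Usage: a two-token response the policy finds twice as likely per token.
policy = [math.log(0.5), math.log(0.8)]
ref = [math.log(0.25), math.log(0.4)]
reward = implicit_outcome_reward(policy, ref)  # log 2 + log 2 = 2 log 2
prefixes = prefix_rewards(policy, ref)
```

The last prefix value equals the full outcome reward, because both are the same sum.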
1 code implementation • 28 Oct 2024 • Jiongxiao Wang, Fangzhou Wu, Wendi Li, Jinsheng Pan, Edward Suh, Z. Morley Mao, Muhao Chen, Chaowei Xiao
Unlike existing approaches that prevent LLMs from answering additional instructions in external text, our method implements an authentication system, requiring LLMs to answer all received instructions with a security policy and selectively filter out responses to user instructions as the final output.
1 code implementation • 15 Oct 2024 • Wendi Li, Yixuan Li
PQM optimizes Q-value rankings based on a novel comparative loss function, enhancing the model's ability to capture the intricate dynamics among sequential decisions.
no code implementations • 4 Jun 2024 • Shixuan Fan, Wei Wei, Wendi Li, Xian-Ling Mao, Wenfeng Xie, Dangyang Chen
The core of the dialogue system is to generate relevant, informative, and human-like responses based on extensive dialogue history.
1 code implementation • 18 Mar 2024 • Wendi Li, Wei Wei, Kaihe Xu, Wenfeng Xie, Dangyang Chen, Yu Cheng
To meet the requirements of real-world applications, it is essential to control the generation of large language models (LLMs).
1 code implementation • 20 Jul 2023 • Wendi Li, Wei Wei, Xiaoye Qu, Xian-Ling Mao, Ye Yuan, Wenfeng Xie, Dangyang Chen
TREA constructs a multi-hierarchical scalable tree as the reasoning structure to clarify the causal relationships between mentioned entities, and makes full use of historical conversations to generate more reasonable and suitable responses for the recommended results.
1 code implementation • 4 May 2023 • Sen Zhao, Wei Wei, Yifan Liu, Ziyang Wang, Wendi Li, Xian-Ling Mao, Shuai Zhu, Minghui Yang, Zujie Wen
Conversational recommendation systems (CRS) aim to acquire users' dynamically changing preferred attributes through conversations, in a timely and proactive manner, for item recommendation.
1 code implementation • 11 Jan 2022 • Wendi Li, Xiao Yang, Weiqing Liu, Yingce Xia, Jiang Bian
To handle concept drift, previous methods first detect when/where the concept drift happens and then adapt models to fit the distribution of the latest data.
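The two-stage pattern attributed to previous methods (first detect when drift happens, then adapt the model to the latest data) can be sketched as a toy loop. It assumes a univariate stream, a simple mean-shift detector over fixed windows, and a trivial "model" that predicts the window mean. The function name and the `window` and `threshold` parameters are hypothetical. This illustrates the detect-then-adapt baseline only. It is not the method proposed in the paper.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def detect_then_adapt(
    stream: Sequence[float],
    window: int = 5,         # hypothetical window size
    threshold: float = 3.0,  # hypothetical detection threshold, in reference std devs
) -> Tuple[List[int], float]:
    """Toy detect-then-adapt loop over a univariate stream.

    Detection: flag drift when the mean of the latest window moves more than
    `threshold` reference standard deviations away from the reference mean.
    Adaptation: refit the (trivial) mean predictor on the latest window.
    """
    reference = list(stream[:window])
    model = mean(reference)  # the "model" is just the mean of the reference data
    drift_points: List[int] = []
    i = window
    while i + window <= len(stream):
        recent = list(stream[i:i + window])
        scale = pstdev(reference) or 1e-9  # avoid division by zero on constant data
        if abs(mean(recent) - mean(reference)) / scale > threshold:
            drift_points.append(i)  # detect: record where the drift was observed
            reference = recent      # adapt: the latest data becomes the reference
            model = mean(reference) # refit the model on the latest distribution
        i += window
    return drift_points, model


# Usage: two stable windows, then a level shift starting at index 10.
stream = [0.0, 1.0, 0.0, 1.0, 0.0] * 2 + [10.0, 11.0, 10.0, 11.0, 10.0]
drifts, fitted = detect_then_adapt(stream)
```

Because detection only fires after a full window of shifted data has arrived, the model is always refit after the drift has already occurred. This lag is a known limitation of the detect-then-adapt pattern.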