Search Results for author: Wendi Li

Found 8 papers, 7 papers with code

Free Process Rewards without Process Labels

1 code implementation · 2 Dec 2024 · Lifan Yuan, Wendi Li, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, BoWen Zhou, Zhiyuan Liu, Hao Peng

The only assumption is to parameterize the outcome reward as the log-likelihood ratios of the policy and reference models, which can be optimized regardless of the specific choice of loss objectives.

Math
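
As a rough illustration of the parameterization described in the abstract above, the sketch below computes an outcome reward as a log-likelihood ratio between a policy and a reference model. It is a minimal sketch, not the authors' implementation: the function name, the per-token log-probability inputs, and the scaling factor `beta` are all assumptions introduced here for illustration.

```python
def log_ratio_reward(policy_logprobs, ref_logprobs, beta=1.0):
    """Outcome reward as a scaled log-likelihood ratio (hypothetical helper).

    policy_logprobs / ref_logprobs: per-token log-probabilities that the
    policy and reference models assign to the same response tokens.
    beta: scaling factor; an assumption, not taken from the listing.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("both models must score the same token sequence")
    # log pi(y) - log pi_ref(y) is the sum of per-token log-prob differences.
    log_ratio = sum(policy_logprobs) - sum(ref_logprobs)
    return beta * log_ratio


if __name__ == "__main__":
    # Toy numbers: the policy assigns higher likelihood than the reference,
    # so the reward is positive.
    print(log_ratio_reward([-0.5, -1.0], [-1.0, -1.5], beta=0.5))
```

A positive value means the policy finds the response more likely than the reference does; a negative value means the opposite.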

FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks

1 code implementation · 28 Oct 2024 · Jiongxiao Wang, Fangzhou Wu, Wendi Li, Jinsheng Pan, Edward Suh, Z. Morley Mao, Muhao Chen, Chaowei Xiao

Unlike existing approaches that prevent LLMs from answering additional instructions in external text, our method implements an authentication system, requiring LLMs to answer all received instructions with a security policy and selectively filter out responses to user instructions as the final output.

Process Reward Model with Q-Value Rankings

1 code implementation · 15 Oct 2024 · Wendi Li, Yixuan Li

PQM optimizes Q-value rankings based on a novel comparative loss function, enhancing the model's ability to capture the intricate dynamics among sequential decisions.

Decision Making, Language Modelling

Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue

no code implementations · 4 Jun 2024 · Shixuan Fan, Wei Wei, Wendi Li, Xian-Ling Mao, Wenfeng Xie, Dangyang Chen

The core of the dialogue system is to generate relevant, informative, and human-like responses based on extensive dialogue history.

Dialogue Generation, Position, +2 more

Reinforcement Learning with Token-level Feedback for Controllable Text Generation

1 code implementation · 18 Mar 2024 · Wendi Li, Wei Wei, Kaihe Xu, Wenfeng Xie, Dangyang Chen, Yu Cheng

To meet the requirements of real-world applications, it is essential to control generations of large language models (LLMs).

Attribute, reinforcement-learning, +4 more

TREA: Tree-Structure Reasoning Schema for Conversational Recommendation

1 code implementation · 20 Jul 2023 · Wendi Li, Wei Wei, Xiaoye Qu, Xian-Ling Mao, Ye Yuan, Wenfeng Xie, Dangyang Chen

TREA constructs a multi-hierarchical scalable tree as the reasoning structure to clarify the causal relationships between mentioned entities, and fully utilizes historical conversations to generate more reasonable and suitable responses for recommended results.

Conversational Recommendation, Knowledge Graphs, +1 more

Towards Hierarchical Policy Learning for Conversational Recommendation with Hypergraph-based Reinforcement Learning

1 code implementation · 4 May 2023 · Sen Zhao, Wei Wei, Yifan Liu, Ziyang Wang, Wendi Li, Xian-Ling Mao, Shuai Zhu, Minghui Yang, Zujie Wen

Conversational recommendation systems (CRS) aim to acquire users' dynamically changing preferred attributes in a timely and proactive way through conversation, in order to recommend items.

Attribute, Conversational Recommendation, +3 more

DDG-DA: Data Distribution Generation for Predictable Concept Drift Adaptation

1 code implementation · 11 Jan 2022 · Wendi Li, Xiao Yang, Weiqing Liu, Yingce Xia, Jiang Bian

To handle concept drift, previous methods first detect when/where the concept drift happens and then adapt models to fit the distribution of the latest data.

Stock Prediction
