1 code implementation • 26 Jul 2024 • Shuhua Yang, Hui Yuan, Xiaoying Zhang, Mengdi Wang, Hong Zhang, Huazheng Wang
Conversational recommendation systems elicit user preferences by interacting with users to obtain their feedback on recommended commodities.
no code implementations • 19 Jul 2024 • Tao Lin, Kun Jin, Andrew Estornell, Xiaoying Zhang, YiLing Chen, Yang Liu
Recommender systems serve the dual purpose of presenting relevant content to users and helping content creators reach their target audience.
1 code implementation • 16 Jun 2024 • Rui Zheng, Hongyi Guo, Zhihan Liu, Xiaoying Zhang, Yuanshun Yao, Xiaojun Xu, Zhaoran Wang, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang, Hang Li, Yang Liu
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
no code implementations • 10 Jun 2024 • Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, YiPeng Zhang, Haitao Mi, Helen Meng
Motivated by the remarkable success of the Feynman Technique in efficient human learning, we introduce Self-Tuning, a learning framework aimed at improving an LLM's ability to effectively acquire new knowledge from raw documents through self-teaching.
no code implementations • 1 Apr 2024 • Yu Zhou, Haoran Yin, Nanhao Zhou, Yanqun Tang, Xiaoying Zhang, Weijie Yuan
The recently developed affine frequency division multiplexing (AFDM) can achieve full diversity in doubly selective channels, providing a comprehensive sparse representation of the delay-Doppler domain channel.
no code implementations • 12 Mar 2024 • Wei Shen, Xiaoying Zhang, Yuanshun Yao, Rui Zheng, Hongyi Guo, Yang Liu
Reinforcement learning from human feedback (RLHF) is the mainstream paradigm used to align large language models (LLMs) with human preferences.
no code implementations • 8 Mar 2024 • Xiaoying Zhang, Jean-Francois Ton, Wei Shen, Hongning Wang, Yang Liu
We introduce Adversarial Policy Optimization (AdvPO), a novel solution to the pervasive issue of reward over-optimization in Reinforcement Learning from Human Feedback (RLHF) for Large Language Models (LLMs).
no code implementations • 14 Feb 2024 • Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Lifeng Jin, Linfeng Song, Haitao Mi, Helen Meng
Despite showing increasingly human-like abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e., "hallucinations", even when they hold relevant knowledge.
no code implementations • 6 Jan 2024 • Hongyi Guo, Yuanshun Yao, Wei Shen, Jiaheng Wei, Xiaoying Zhang, Zhaoran Wang, Yang Liu
The key idea is to first retrieve high-quality samples related to the target domain and use them as In-context Learning examples to generate more samples.
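The retrieve-then-generate idea described above can be sketched as follows. This is a minimal illustration only: the bag-of-words retriever, the function names, and the prompt format are assumptions for the sketch, not the authors' implementation (which would use an LLM and a stronger retriever).

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(pool: list, query: str, k: int = 2) -> list:
    # Rank candidate samples by similarity to a target-domain query.
    q = Counter(query.lower().split())
    scored = sorted(pool,
                    key=lambda s: cosine(Counter(s.lower().split()), q),
                    reverse=True)
    return scored[:k]

def build_icl_prompt(examples: list, instruction: str) -> str:
    # Assemble retrieved samples as in-context examples for a generator LLM.
    shots = "\n".join(f"Example: {e}" for e in examples)
    return f"{shots}\n{instruction}"

pool = [
    "Review: the camera battery lasts all day.",
    "Recipe: whisk eggs and sugar until pale.",
    "Review: the laptop screen is bright and sharp.",
]
top = retrieve_top_k(pool, "Review: the phone speaker is loud", k=2)
prompt = build_icl_prompt(top, "Write one more product review in the same style.")
```

The retriever correctly surfaces the two in-domain "Review:" samples, which then seed the generation prompt.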
no code implementations • 29 Aug 2023 • Jingyan Zhou, Minda Hu, Junan Li, Xiaoying Zhang, Xixin Wu, Irwin King, Helen Meng
Making moral judgments is an essential step toward developing ethical AI systems.
1 code implementation • 10 Aug 2023 • Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, Hang Li
However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations.
no code implementations • 15 May 2023 • Xiaoying Zhang, Baolin Peng, Kun Li, Jingyan Zhou, Helen Meng
Building end-to-end task bots and maintaining their integration with new functionalities with minimal human effort is a long-standing challenge in dialog research.
1 code implementation • 10 Feb 2023 • Qing Zhang, Xiaoying Zhang, Yang Liu, Hongning Wang, Min Gao, Jiheng Zhang, Ruocheng Guo
Confounding bias arises due to the presence of unmeasured variables (e.g., the socio-economic status of a user) that can affect both a user's exposure and feedback.
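A minimal simulation makes this confounding mechanism concrete (the linear model and variable names below are illustrative assumptions, not the paper's setup): an unmeasured variable drives both exposure and feedback, so a naive comparison reports a large effect even though exposure has none.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Unmeasured confounder, e.g. a user's socio-economic status.
u = rng.standard_normal(n)

# The confounder affects both exposure (whether an item is shown)...
exposure = (u + rng.standard_normal(n) > 0).astype(float)

# ...and feedback. Note exposure itself has NO true effect here.
feedback = 0.0 * exposure + u + rng.standard_normal(n)

# A naive exposed-vs-unexposed comparison is badly biased away from 0,
# because the unobserved u differs systematically between the groups.
naive_effect = feedback[exposure == 1].mean() - feedback[exposure == 0].mean()
```

With the true causal effect set to zero, `naive_effect` still comes out near 1.1, which is exactly the bias that deconfounding methods aim to remove.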
1 code implementation • 13 Jan 2023 • Xiaoying Zhang, Hongning Wang, Hang Li
This calls for a fine-grained understanding of a user's preferences over items, where one needs to recognize whether the user's choice is driven by the quality of the item itself or by the pre-selected attributes of the item.
no code implementations • 20 Jan 2022 • Haidong Xie, Jia Tan, Xiaoying Zhang, Nan Ji, Haihua Liao, Zuguo Yu, Xueshuang Xiang, Naijin Liu
This creates the risk that a malicious third party could use a deep learning model to easily recognize the modulation format of the transmitted waveform.
no code implementations • SIGDIAL (ACL) 2022 • Xiaoying Zhang, Baolin Peng, Jianfeng Gao, Helen Meng
In this paper, we study the problem of automatically adapting task bots to changing environments by learning from human-bot interactions with minimum or zero human annotations.
no code implementations • 14 Jul 2021 • Kai Mei, Jun Liu, Xiaoying Zhang, Kuo Cao, Nandana Rajatheva, Jibo Wei
In addition, a training data construction approach utilizing least squares (LS) estimation results is proposed, so that training data can be collected during data transmission.
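The LS-based training-data idea can be illustrated with a minimal per-subcarrier simulation. The channel model, pilot pattern, and variable names below are illustrative assumptions for the sketch, not the paper's actual system model.

```python
import numpy as np

rng = np.random.default_rng(0)

def ls_estimate(y_pilot, x_pilot):
    # Per-subcarrier least-squares channel estimate: H_ls = Y / X.
    return y_pilot / x_pilot

# Simulate one frame: known pilots pass through an unknown channel plus noise.
n_sc = 64  # number of subcarriers (illustrative)
h_true = (rng.standard_normal(n_sc) + 1j * rng.standard_normal(n_sc)) / np.sqrt(2)
x_pilot = np.ones(n_sc, dtype=complex)  # known pilot symbols
noise = 0.05 * (rng.standard_normal(n_sc) + 1j * rng.standard_normal(n_sc))
y_pilot = h_true * x_pilot + noise

h_ls = ls_estimate(y_pilot, x_pilot)

# LS estimates collected this way during data transmission can serve as
# (noisy) training samples for a learned estimator; here the features are
# the real/imaginary parts per subcarrier.
features = np.stack([h_ls.real, h_ls.imag], axis=-1)
```

Because the pilots are known at the receiver, these LS samples can be gathered online without interrupting transmission, which is the point of the construction approach described above.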
1 code implementation • 15 Jan 2021 • Mudit Chaudhary, Borislav Dzodzo, Sida Huang, Chun Hei Lo, Mingzhi Lyu, Lun Yiu Nie, Jinbo Xing, Tianhua Zhang, Xiaoying Zhang, Jingyan Zhou, Hong Cheng, Wai Lam, Helen Meng
Dialog systems enriched with external knowledge can handle user queries that are outside the scope of the supporting databases/APIs.
no code implementations • 4 Jun 2019 • Xiaoying Zhang, Hong Xie, Hang Li, John C. S. Lui
Here, a key-term can relate to a subset of arms, for example, a category of articles in news recommendation.