Search Results for author: Xiaoying Zhang

Found 19 papers, 6 papers with code

Conversational Dueling Bandits in Generalized Linear Models

1 code implementation · 26 Jul 2024 · Shuhua Yang, Hui Yuan, Xiaoying Zhang, Mengdi Wang, Hong Zhang, Huazheng Wang

Conversational recommendation systems elicit user preferences by interacting with users to obtain their feedback on recommended commodities.

Tasks: Conversational Recommendation · Informativeness +1

User-Creator Feature Polarization in Recommender Systems with Dual Influence

no code implementations · 19 Jul 2024 · Tao Lin, Kun Jin, Andrew Estornell, Xiaoying Zhang, YiLing Chen, Yang Liu

Recommender systems serve the dual purpose of presenting relevant content to users and helping content creators reach their target audience.

Tasks: Diversity · Recommendation Systems

Toward Optimal LLM Alignments Using Two-Player Games

1 code implementation · 16 Jun 2024 · Rui Zheng, Hongyi Guo, Zhihan Liu, Xiaoying Zhang, Yuanshun Yao, Xiaojun Xu, Zhaoran Wang, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang, Hang Li, Yang Liu

We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.

Tasks: Reinforcement Learning

Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching

no code implementations · 10 Jun 2024 · Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, YiPeng Zhang, Haitao Mi, Helen Meng

Motivated by the remarkable success of the Feynman Technique in efficient human learning, we introduce Self-Tuning, a learning framework aimed at improving an LLM's ability to effectively acquire new knowledge from raw documents through self-teaching.

Tasks: Memorization

GI-Free Pilot-Aided Channel Estimation for Affine Frequency Division Multiplexing Systems

no code implementations · 1 Apr 2024 · Yu Zhou, Haoran Yin, Nanhao Zhou, Yanqun Tang, Xiaoying Zhang, Weijie Yuan

The recently developed affine frequency division multiplexing (AFDM) can achieve full diversity in doubly selective channels, providing a comprehensive sparse representation of the delay-Doppler domain channel.

Tasks: Diversity

Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards

no code implementations · 12 Mar 2024 · Wei Shen, Xiaoying Zhang, Yuanshun Yao, Rui Zheng, Hongyi Guo, Yang Liu

Reinforcement learning from human feedback (RLHF) is the mainstream paradigm used to align large language models (LLMs) with human preferences.

Tasks: Reinforcement Learning

Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation

no code implementations · 8 Mar 2024 · Xiaoying Zhang, Jean-Francois Ton, Wei Shen, Hongning Wang, Yang Liu

We introduce Adversarial Policy Optimization (AdvPO), a novel solution to the pervasive issue of reward over-optimization in Reinforcement Learning from Human Feedback (RLHF) for Large Language Models (LLMs).

Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation

no code implementations · 14 Feb 2024 · Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Lifeng Jin, Linfeng Song, Haitao Mi, Helen Meng

Despite showing increasingly human-like abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e., "hallucinations", even when they hold relevant knowledge.

Tasks: TruthfulQA

Human-Instruction-Free LLM Self-Alignment with Limited Samples

no code implementations · 6 Jan 2024 · Hongyi Guo, Yuanshun Yao, Wei Shen, Jiaheng Wei, Xiaoying Zhang, Zhaoran Wang, Yang Liu

The key idea is to first retrieve high-quality samples related to the target domain and use them as In-context Learning examples to generate more samples.

Tasks: In-Context Learning · Instruction Following
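The retrieve-then-generate idea above can be sketched in a few lines: pick the seed samples most similar to the target domain, then pack them into an in-context prompt for generating more samples. This is a hedged illustration, not the paper's implementation; word-overlap (Jaccard) similarity stands in for a real embedding retriever, and all names and the sample pool are made up.

```python
# Hypothetical sketch: retrieve domain-relevant samples, then build an
# in-context learning (ICL) prompt from them. Jaccard word overlap is an
# assumed stand-in for a proper embedding-based similarity.

def jaccard(a, b):
    """Word-overlap similarity between two strings, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def retrieve_top_k(pool, query, k):
    """Return the k pool samples most similar to the target-domain query."""
    return sorted(pool, key=lambda s: jaccard(s, query), reverse=True)[:k]

def build_icl_prompt(examples, instruction):
    """Pack retrieved samples as few-shot examples ahead of the instruction."""
    shots = "\n".join(f"Example: {e}" for e in examples)
    return f"{shots}\n{instruction}"

pool = [
    "summarize this legal contract",
    "translate the menu",
    "summarize the court ruling",
]
top = retrieve_top_k(pool, "summarize legal documents", 2)
print(build_icl_prompt(top, "Generate more samples like the examples."))
```

In the actual method the generated samples would then feed an alignment loop; here only the retrieval-and-prompting step is shown.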

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

1 code implementation · 10 Aug 2023 · Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, Hang Li

However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations.

Tasks: Fairness · Models Alignment

SGP-TOD: Building Task Bots Effortlessly via Schema-Guided LLM Prompting

no code implementations · 15 May 2023 · Xiaoying Zhang, Baolin Peng, Kun Li, Jingyan Zhou, Helen Meng

Building end-to-end task bots and keeping them integrated with new functionalities using minimal human effort is a long-standing challenge in dialog research.

Tasks: Dialog State Tracking

Debiasing Recommendation by Learning Identifiable Latent Confounders

1 code implementation · 10 Feb 2023 · Qing Zhang, Xiaoying Zhang, Yang Liu, Hongning Wang, Min Gao, Jiheng Zhang, Ruocheng Guo

Confounding bias arises due to the presence of unmeasured variables (e.g., the socio-economic status of a user) that can affect both a user's exposure and feedback.

Tasks: Causal Inference · Counterfactual +1

Disentangled Representation for Diversified Recommendations

1 code implementation · 13 Jan 2023 · Xiaoying Zhang, Hongning Wang, Hang Li

This calls for a fine-grained understanding of a user's preferences over items, where one needs to recognize whether the user's choice is driven by the quality of the item itself or by its pre-selected attributes.

Tasks: Diversity

Low-Interception Waveform: To Prevent the Recognition of Spectrum Waveform Modulation via Adversarial Examples

no code implementations · 20 Jan 2022 · Haidong Xie, Jia Tan, Xiaoying Zhang, Nan Ji, Haihua Liao, Zuguo Yu, Xueshuang Xiang, Naijin Liu

This leads to the problem of a malicious third party using a deep learning model to easily recognize the modulation format of the transmitted waveform.

Tasks: Deep Learning

Toward Self-learning End-to-End Task-Oriented Dialog Systems

no code implementations · SIGDIAL (ACL) 2022 · Xiaoying Zhang, Baolin Peng, Jianfeng Gao, Helen Meng

In this paper, we study the problem of automatically adapting task bots to changing environments by learning from human-bot interactions with minimum or zero human annotations.

Tasks: Reinforcement Learning (RL) +1

A Low Complexity Learning-based Channel Estimation for OFDM Systems with Online Training

no code implementations · 14 Jul 2021 · Kai Mei, Jun Liu, Xiaoying Zhang, Kuo Cao, Nandana Rajatheva, Jibo Wei

In addition, a training-data construction approach utilizing least-square (LS) estimation results is proposed, so that training data can be collected during data transmission.

Tasks: BIG-bench Machine Learning
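The LS estimates the snippet builds its training data from are the classical per-subcarrier estimates at pilot positions. A minimal sketch of that textbook step, assuming known transmitted pilot symbols X[k] and received pilot symbols Y[k] (all names illustrative, not from the paper):

```python
# Classical least-square (LS) pilot-aided channel estimation per subcarrier:
# H_hat[k] = Y[k] / X[k], valid where the pilot symbol X[k] is known.

def ls_channel_estimate(tx_pilots, rx_pilots):
    """Per-subcarrier LS channel estimate from complex pilot symbols."""
    return [y / x for x, y in zip(tx_pilots, rx_pilots)]

# Usage: a flat channel h = 0.5 - 0.5j on three BPSK pilot subcarriers.
tx = [1 + 0j, -1 + 0j, 1 + 0j]
h_true = 0.5 - 0.5j
rx = [h_true * x for x in tx]       # noiseless received pilots
print(ls_channel_estimate(tx, rx))  # each entry recovers h_true
```

In the noiseless case the LS estimate is exact; with noise it is the simple estimator whose outputs the proposed approach reuses as training labels.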

Conversational Contextual Bandit: Algorithm and Application

no code implementations · 4 Jun 2019 · Xiaoying Zhang, Hong Xie, Hang Li, John C. S. Lui

Here, a key-term can relate to a subset of arms, for example, a category of articles in news recommendation.

Tasks: News Recommendation · Recommendation Systems
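The key-term/arm relation described above can be pictured as a simple mapping: each key-term (e.g. an article category) covers a subset of arms (articles), and feedback on a key-term informs preferences over that whole subset. A hedged sketch under assumed names; the uniform-spread rule is an illustration, not the paper's full bandit algorithm:

```python
# Illustrative key-term -> arms structure for a conversational bandit.
# Key-terms and article names are made up for the example.

key_term_to_arms = {
    "sports": {"article_1", "article_4"},
    "politics": {"article_2", "article_3"},
}

def propagate_feedback(scores, key_term, reward, mapping):
    """Spread key-term feedback evenly over the arms it relates to."""
    arms = mapping.get(key_term, set())
    for arm in arms:
        scores[arm] = scores.get(arm, 0.0) + reward / len(arms)
    return scores

# Usage: one positive answer about "sports" nudges both sports articles.
prefs = propagate_feedback({}, "sports", 1.0, key_term_to_arms)
print(prefs)  # both sports articles gain 0.5
```

The point of the conversational setting is exactly this leverage: one key-term question yields information about many arms at once.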
