Search Results for author: Xiaoying Zhang

Found 15 papers, 4 papers with code

GI-Free Pilot-Aided Channel Estimation for Affine Frequency Division Multiplexing Systems

no code implementations • 1 Apr 2024 • Yu Zhou, Haoran Yin, Nanhao Zhou, Yanqun Tang, Xiaoying Zhang, Weijie Yuan

The recently developed affine frequency division multiplexing (AFDM) can achieve full diversity in doubly selective channels, providing a comprehensive sparse representation of the delay-Doppler domain channel.

Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards

no code implementations • 12 Mar 2024 • Wei Shen, Xiaoying Zhang, Yuanshun Yao, Rui Zheng, Hongyi Guo, Yang Liu

Reinforcement learning from human feedback (RLHF) is the mainstream paradigm used to align large language models (LLMs) with human preferences.

reinforcement-learning

Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation

no code implementations • 8 Mar 2024 • Xiaoying Zhang, Jean-Francois Ton, Wei Shen, Hongning Wang, Yang Liu

We introduce Adversarial Policy Optimization (AdvPO), a novel solution to the pervasive issue of reward over-optimization in Reinforcement Learning from Human Feedback (RLHF) for Large Language Models (LLMs).

Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation

no code implementations • 14 Feb 2024 • Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Lifeng Jin, Linfeng Song, Haitao Mi, Helen Meng

Despite showing increasingly human-like abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e., "hallucinations", even when they hold relevant knowledge.

Human-Instruction-Free LLM Self-Alignment with Limited Samples

no code implementations • 6 Jan 2024 • Hongyi Guo, Yuanshun Yao, Wei Shen, Jiaheng Wei, Xiaoying Zhang, Zhaoran Wang, Yang Liu

The key idea is to first retrieve high-quality samples related to the target domain and use them as In-context Learning examples to generate more samples.

In-Context Learning, Instruction Following

Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?

no code implementations • 29 Aug 2023 • Jingyan Zhou, Minda Hu, Junan Li, Xiaoying Zhang, Xixin Wu, Irwin King, Helen Meng

Our analysis exhibits the potentials and flaws in existing resources (models and datasets) in developing explainable moral judgment-making systems.

Ethics

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

1 code implementation • 10 Aug 2023 • Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, Hang Li

However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations.

Fairness Models Alignment

SGP-TOD: Building Task Bots Effortlessly via Schema-Guided LLM Prompting

no code implementations • 15 May 2023 • Xiaoying Zhang, Baolin Peng, Kun Li, Jingyan Zhou, Helen Meng

Building end-to-end task bots and maintaining their integration with new functionalities using minimal human efforts is a long-standing challenge in dialog research.

dialog state tracking

Debiasing Recommendation by Learning Identifiable Latent Confounders

1 code implementation • 10 Feb 2023 • Qing Zhang, Xiaoying Zhang, Yang Liu, Hongning Wang, Min Gao, Jiheng Zhang, Ruocheng Guo

Confounding bias arises due to the presence of unmeasured variables (e.g., the socio-economic status of a user) that can affect both a user's exposure and feedback.

Causal Inference, counterfactual

Disentangled Representation for Diversified Recommendations

1 code implementation • 13 Jan 2023 • Xiaoying Zhang, Hongning Wang, Hang Li

This calls for a fine-grained understanding of a user's preferences over items, where one needs to recognize the user's choice is driven by the quality of the item itself, or the pre-selected attributes of the item.

Low-Interception Waveform: To Prevent the Recognition of Spectrum Waveform Modulation via Adversarial Examples

no code implementations • 20 Jan 2022 • Haidong Xie, Jia Tan, Xiaoying Zhang, Nan Ji, Haihua Liao, Zuguo Yu, Xueshuang Xiang, Naijin Liu

This leads to the problem of a malicious third party using a deep learning model to easily recognize the modulation format of the transmitted waveform.

Toward Self-learning End-to-End Task-Oriented Dialog Systems

no code implementations • SIGDIAL (ACL) 2022 • Xiaoying Zhang, Baolin Peng, Jianfeng Gao, Helen Meng

In this paper, we study the problem of automatically adapting task bots to changing environments by learning from human-bot interactions with minimum or zero human annotations.

reinforcement-learning, Reinforcement Learning (RL)

A Low Complexity Learning-based Channel Estimation for OFDM Systems with Online Training

no code implementations • 14 Jul 2021 • Kai Mei, Jun Liu, Xiaoying Zhang, Kuo Cao, Nandana Rajatheva, Jibo Wei

Besides, a training data construction approach utilizing least square (LS) estimation results is proposed so that the training data can be collected during the data transmission.

BIG-bench Machine Learning

Conversational Contextual Bandit: Algorithm and Application

no code implementations • 4 Jun 2019 • Xiaoying Zhang, Hong Xie, Hang Li, John C. S. Lui

Here, a key-term can relate to a subset of arms, for example, a category of articles in news recommendation.

News Recommendation, Recommendation Systems
