Search Results for author: Hongning Wang

Found 69 papers, 23 papers with code

ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback

no code implementations • 1 Apr 2024 • Zhenyu Hou, Yilin Niu, Zhengxiao Du, Xiaohan Zhang, Xiao Liu, Aohan Zeng, Qinkai Zheng, Minlie Huang, Hongning Wang, Jie Tang, Yuxiao Dong

The work presents our practices of aligning LLMs with human preferences, offering insights into the challenges and solutions in RLHF implementations.

Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation

no code implementations • 8 Mar 2024 • Xiaoying Zhang, Jean-Francois Ton, Wei Shen, Hongning Wang, Yang Liu

We introduce Adversarial Policy Optimization (AdvPO), a novel solution to the pervasive issue of reward over-optimization in Reinforcement Learning from Human Feedback (RLHF) for Large Language Models (LLMs).

Federated Linear Contextual Bandits with Heterogeneous Clients

no code implementations • 29 Feb 2024 • Ethan Blaser, Chuanhao Li, Hongning Wang

The demand for collaborative and private bandit learning across multiple agents is surging due to the growing quantity of data generated from distributed systems.

Federated Learning Multi-Armed Bandits

ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors

1 code implementation • 26 Feb 2024 • Zhexin Zhang, Yida Lu, Jingyuan Ma, Di Zhang, Rui Li, Pei Ke, Hao Sun, Lei Sha, Zhifang Sui, Hongning Wang, Minlie Huang

The safety of Large Language Models (LLMs) has gained increasing attention in recent years, but a comprehensive approach for detecting safety issues within LLMs' responses in an aligned, customizable and explainable manner is still lacking.

Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits

no code implementations • 21 Feb 2024 • Zhiwei Wang, Huazheng Wang, Hongning Wang

Our analysis shows that against two popularly employed MAB algorithms, UCB1 and $\epsilon$-greedy, the success of a stealthy attack depends on the environmental conditions and the realized reward of the arm pulled in the first round.

Multi-Armed Bandits

Incentivized Truthful Communication for Federated Bandits

no code implementations • 7 Feb 2024 • Zhepei Wei, Chuanhao Li, Tianze Ren, Haifeng Xu, Hongning Wang

To enhance the efficiency and practicality of federated bandit learning, recent advances have introduced incentives to motivate communication among clients, where a client participates only when the incentive offered by the server outweighs its participation cost.

AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback

no code implementations • 2 Feb 2024 • Jian Guan, Wei Wu, Zujie Wen, Peng Xu, Hongning Wang, Minlie Huang

We present AMOR, an agent framework based on open-source LLMs, which reasons with external knowledge bases and adapts to specific domains through human supervision to the reasoning process.

Towards Efficient and Exact Optimization of Language Model Alignment

1 code implementation • 1 Feb 2024 • Haozhe Ji, Cheng Lu, Yilin Niu, Pei Ke, Hongning Wang, Jun Zhu, Jie Tang, Minlie Huang

We prove that EXO is guaranteed to optimize in the same direction as the RL algorithms asymptotically for arbitrary parametrization of the policy, while enabling efficient optimization by circumventing the complexities associated with RL algorithms.

Language Modelling Reinforcement Learning (RL)

The Impact of Snippet Reliability on Misinformation in Online Health Search

no code implementations • 28 Jan 2024 • Anat Hashavit, Tamar Stern, Hongning Wang, Sarit Kraus

These results strongly suggest that an information need-focused approach can significantly improve the reliability of extracted snippets in online health search.

Misinformation

CritiqueLLM: Scaling LLM-as-Critic for Effective and Explainable Evaluation of Large Language Model Generation

2 code implementations • 30 Nov 2023 • Pei Ke, Bosi Wen, Zhuoer Feng, Xiao Liu, Xuanyu Lei, Jiale Cheng, Shengyuan Wang, Aohan Zeng, Yuxiao Dong, Hongning Wang, Jie Tang, Minlie Huang

Since the natural language processing (NLP) community started to make large language models (LLMs), such as GPT-4, act as a critic to evaluate the quality of generated texts, most of them only train a critique generation model of a specific scale on specific datasets.

Language Modelling Large Language Model

Black-Box Prompt Optimization: Aligning Large Language Models without Model Training

1 code implementation • 7 Nov 2023 • Jiale Cheng, Xiao Liu, Kehan Zheng, Pei Ke, Hongning Wang, Yuxiao Dong, Jie Tang, Minlie Huang

However, these models are often not well aligned with human intents, which calls for additional treatments on them, that is, the alignment problem.

Language Model Decoding as Direct Metrics Optimization

no code implementations • 2 Oct 2023 • Haozhe Ji, Pei Ke, Hongning Wang, Minlie Huang

And most importantly, we prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts.

Language Modelling

Meta-Reinforcement Learning via Exploratory Task Clustering

no code implementations • 15 Feb 2023 • Zhendong Chu, Hongning Wang

In this paper, we explore the structured heterogeneity among tasks via clustering to improve meta-RL.

Clustering Meta Reinforcement Learning +2

Debiasing Recommendation by Learning Identifiable Latent Confounders

1 code implementation • 10 Feb 2023 • Qing Zhang, Xiaoying Zhang, Yang Liu, Hongning Wang, Min Gao, Jiheng Zhang, Ruocheng Guo

Confounding bias arises due to the presence of unmeasured variables (e.g., the socio-economic status of a user) that can affect both a user's exposure and feedback.

Causal Inference counterfactual +1

How Bad is Top-$K$ Recommendation under Competing Content Creators?

no code implementations • 3 Feb 2023 • Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, Haifeng Xu

Content creators compete for exposure on recommendation platforms, and such strategic behavior leads to a dynamic shift over the content distribution.

Disentangled Representation for Diversified Recommendations

1 code implementation • 13 Jan 2023 • Xiaoying Zhang, Hongning Wang, Hang Li

This calls for a fine-grained understanding of a user's preferences over items, where one needs to recognize whether the user's choice is driven by the quality of the item itself or by the pre-selected attributes of the item.

MiddleGAN: Generate Domain Agnostic Samples for Unsupervised Domain Adaptation

no code implementations • 6 Nov 2022 • Ye Gao, Zhendong Chu, Hongning Wang, John Stankovic

We extend the theory of GAN to show that there exist optimal solutions for the parameters of the two discriminators and one generator in MiddleGAN, and empirically show that the samples generated by the MiddleGAN are similar to both samples from the source domain and samples from the target domain.

Unsupervised Domain Adaptation

COFFEE: Counterfactual Fairness for Personalized Text Generation in Explainable Recommendation

no code implementations • 14 Oct 2022 • Nan Wang, Qifan Wang, Yi-Chia Wang, Maziar Sanjabi, Jingzhou Liu, Hamed Firooz, Hongning Wang, Shaoliang Nie

However, the bias inherent in user written text, often used for PTG model training, can inadvertently associate different levels of linguistic quality with users' protected attributes.

counterfactual Counterfactual Inference +4

Spectral Augmentation for Self-Supervised Learning on Graphs

1 code implementation • 2 Oct 2022 • Lu Lin, Jinghui Chen, Hongning Wang

Graph contrastive learning (GCL), as an emerging self-supervised learning technique on graphs, aims to learn representations via instance discrimination.

Contrastive Learning Node Classification +3

Rethinking Conversational Recommendations: Is Decision Tree All You Need?

1 code implementation • 31 Aug 2022 • A S M Ahsan-Ul Haque, Hongning Wang

Fourthly, when the user rejects a recommendation, we adaptively choose the next decision tree to improve subsequent questions and recommendations.

Recommendation Systems reinforcement-learning +1

Dynamic Global Sensitivity for Differentially Private Contextual Bandits

no code implementations • 30 Aug 2022 • Huazheng Wang, David Zhao, Hongning Wang

We provide a rigorous theoretical analysis over the amount of noise added via dynamic global sensitivity and the corresponding upper regret bound of our proposed algorithm.

Multi-Armed Bandits

Not Just Skipping. Understanding the Effect of Sponsored Content on Users' Decision-Making in Online Health Search

no code implementations • 10 Jul 2022 • Anat Hashavit, Hongning Wang, Tamar Stern, Sarit Kraus

We further discover that the contrast between the indirect marketing ads and the viewpoint presented in the organic search results plays an important role in users' decision-making.

Decision Making Marketing

Scalable Exploration for Neural Online Learning to Rank with Perturbed Feedback

no code implementations • 13 Jun 2022 • Yiling Jia, Hongning Wang

Deep neural networks (DNNs) demonstrate significant advantages in improving ranking performance in retrieval tasks.

Computational Efficiency Efficient Exploration +2

Communication Efficient Distributed Learning for Kernelized Contextual Bandits

no code implementations • 10 Jun 2022 • Chuanhao Li, Huazheng Wang, Mengdi Wang, Hongning Wang

We tackle the communication efficiency challenge of learning kernelized contextual bandits in a distributed setting.

Multi-Armed Bandits

Meta Policy Learning for Cold-Start Conversational Recommendation

1 code implementation • 24 May 2022 • Zhendong Chu, Hongning Wang, Yun Xiao, Bo Long, Lingfei Wu

We propose to learn a meta policy and adapt it to new users with only a few trials of conversational recommendations.

Meta Reinforcement Learning Recommendation Systems +2

Graph-based Extractive Explainer for Recommendations

no code implementations • 20 Feb 2022 • Peng Wang, Renqin Cai, Hongning Wang

Explanations in a recommender system assist users in making informed decisions among a set of recommended items.

Attribute Recommendation Systems +1

Learning from a Learning User for Optimal Recommendations

no code implementations • 3 Feb 2022 • Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, Haifeng Xu

In real-world recommendation problems, especially those with a formidably large item space, users have to gradually learn to estimate the utility of any fresh recommendations from their experience about previously consumed items.

Communication Efficient Federated Learning for Generalized Linear Bandits

no code implementations • 2 Feb 2022 • Chuanhao Li, Hongning Wang

Contextual bandit algorithms have been recently studied under the federated learning setting to satisfy the demand of keeping data decentralized and pushing the learning of bandit models to the client side.

Federated Learning regression

E-ADDA: Unsupervised Adversarial Domain Adaptation Enhanced by a New Mahalanobis Distance Loss for Smart Computing

no code implementations • 24 Jan 2022 • Ye Gao, Brian Baucom, Karen Rose, Kristina Gordon, Hongning Wang, John Stankovic

In the computer vision modality, the evaluation results suggest that we achieve new state-of-the-art performance on popular UDA benchmarks such as Office-31 and Office-Home, outperforming the second best-performing algorithms by up to 17.9%.

Out-of-Distribution Detection Unsupervised Domain Adaptation

Learning Neural Contextual Bandits Through Perturbed Rewards

no code implementations • ICLR 2022 • Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang

Thanks to the power of representation learning, neural contextual bandit algorithms demonstrate remarkable performance improvement against their classical counterparts.

Computational Efficiency Multi-Armed Bandits +1

IMO$^3$: Interactive Multi-Objective Off-Policy Optimization

no code implementations • 24 Jan 2022 • Nan Wang, Hongning Wang, Maryam Karimzadehgan, Branislav Kveton, Craig Boutilier

This problem has been studied extensively in the setting of known objective functions.

Learning Neural Ranking Models Online from Implicit User Feedback

no code implementations • 17 Jan 2022 • Yiling Jia, Hongning Wang

Existing online learning to rank (OL2R) solutions are limited to linear models, which are unable to capture possible non-linear relations between queries and documents.

Learning-To-Rank Representation Learning

Calibrating Explore-Exploit Trade-off for Fair Online Learning to Rank

no code implementations • 1 Nov 2021 • Yiling Jia, Hongning Wang

Online learning to rank (OL2R) has attracted great research interest in recent years, thanks to its advantages in avoiding expensive relevance labeling as required in offline supervised ranking model learning.

Fairness Learning-To-Rank

Comparative Explanations of Recommendations

no code implementations • 1 Nov 2021 • Aobo Yang, Nan Wang, Renqin Cai, Hongbo Deng, Hongning Wang

As recommendation is essentially a comparative (or ranking) process, a good explanation should illustrate to users why an item is believed to be better than another, i.e., comparative explanations about the recommended items.

Explainable Recommendation Recommendation Systems +1

Graph Structural Attack by Perturbing Spectral Distance

1 code implementation • 1 Nov 2021 • Lu Lin, Ethan Blaser, Hongning Wang

Graph Convolutional Networks (GCNs) have fueled a surge of research interest due to their encouraging performance on graph learning tasks, but they have also been shown to be vulnerable to adversarial attacks.

Graph Learning

Graph Embedding with Hierarchical Attentive Membership

no code implementations • 31 Oct 2021 • Lu Lin, Ethan Blaser, Hongning Wang

The exploitation of graph structures is the key to effectively learning representations of nodes that preserve useful information in graphs.

Graph Embedding Link Prediction +1

Unbiased Graph Embedding with Biased Graph Observations

no code implementations • 26 Oct 2021 • Nan Wang, Lu Lin, Jundong Li, Hongning Wang

In this paper, we propose a principled new way for unbiased graph embedding by learning node embeddings from an underlying bias-free graph, which is not influenced by sensitive node attributes.

Fairness Graph Embedding

When Are Linear Stochastic Bandits Attackable?

no code implementations • 18 Oct 2021 • Huazheng Wang, Haifeng Xu, Hongning Wang

We study adversarial attacks on linear stochastic bandits: by manipulating the rewards, an adversary aims to control the behaviour of the bandit algorithm.

Decision Making Recommendation Systems

Learning the Optimal Recommendation from Explorative Users

no code implementations • 6 Oct 2021 • Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, Haifeng Xu

We propose a new problem setting to study the sequential interactions between a recommender system and a user.

Recommendation Systems

Improve Learning from Crowds via Generative Augmentation

no code implementations • 22 Jul 2021 • Zhendong Chu, Hongning Wang

This creates a sparsity issue and limits the quality of machine learning models trained on such data.

BIG-bench Machine Learning Data Augmentation

When and Whom to Collaborate with in a Changing Environment: A Collaborative Dynamic Bandit Solution

no code implementations • 14 Apr 2021 • Chuanhao Li, Qingyun Wu, Hongning Wang

However, all existing collaborative bandit learning solutions impose a stationary assumption about the environment, i.e., both user preferences and the dependency among users are assumed static over time.

Bayesian Inference Collaborative Filtering +3

Incentivizing Exploration in Linear Bandits under Information Gap

no code implementations • 8 Apr 2021 • Huazheng Wang, Haifeng Xu, Chuanhao Li, Zhiyuan Liu, Hongning Wang

We study the problem of incentivizing exploration for myopic users in linear bandits, where the users tend to exploit the arm with the highest predicted reward instead of exploring.

PairRank: Online Pairwise Learning to Rank by Divide-and-Conquer

1 code implementation • 28 Feb 2021 • Yiling Jia, Huazheng Wang, Stephen Guo, Hongning Wang

Online Learning to Rank (OL2R) eliminates the need for explicit relevance annotation by directly optimizing the rankers from their interactions with users.

Learning-To-Rank

Explanation as a Defense of Recommendation

no code implementations • 24 Jan 2021 • Aobo Yang, Nan Wang, Hongbo Deng, Hongning Wang

At training time, the two learning tasks are joined by a latent sentiment vector, which is encoded by the recommendation module and used to make word choices for explanation generation.

Explanation Generation

Learning from Crowds by Modeling Common Confusions

2 code implementations • 24 Dec 2020 • Zhendong Chu, Jing Ma, Hongning Wang

Crowdsourcing provides a practical way to obtain large amounts of labeled data at a low cost.

Image Classification

Unifying Clustered and Non-stationary Bandits

no code implementations • 5 Sep 2020 • Chuanhao Li, Qingyun Wu, Hongning Wang

Non-stationary bandits and online clustering of bandits lift the restrictive assumptions in contextual bandits and provide solutions to many important real-world scenarios.

Change Detection Clustering +2

Directional Multivariate Ranking

no code implementations • 9 Jun 2020 • Nan Wang, Hongning Wang

In this work, we propose a directional multi-aspect ranking criterion to enable a holistic ranking of items with respect to multiple aspects.

Non-Clicks Mean Irrelevant? Propensity Ratio Scoring As a Correction

no code implementations • 18 May 2020 • Nan Wang, Zhen Qin, Xuanhui Wang, Hongning Wang

Recent advances in unbiased learning to rank (LTR) count on Inverse Propensity Scoring (IPS) to eliminate bias in implicit feedback.

Learning-To-Rank

Déjà vu: A Contextualized Temporal Attention Mechanism for Sequential Recommendation

no code implementations • 29 Jan 2020 • Jibang Wu, Renqin Cai, Hongning Wang

Predicting users' preferences based on their sequential behaviors in history is challenging and crucial for modern recommender systems.

Sequential Recommendation

JNET: Learning User Representations via Joint Network Embedding and Topic Embedding

1 code implementation • 1 Dec 2019 • Lin Gong, Lu Lin, Weihao Song, Hongning Wang

Inspired by the concept of user schema in social psychology, we take a new perspective to perform user representation learning by constructing a shared latent space to capture the dependency among different modalities of user-generated data.

Link Prediction Network Embedding

BPMR: Bayesian Probabilistic Multivariate Ranking

no code implementations • 18 Sep 2019 • Nan Wang, Hongning Wang

The framework naturally leads to a probabilistic multi-aspect ranking criterion, which generalizes the single-aspect ranking to a multivariate fashion.

Recommendation Systems

Active Collaborative Sensing for Energy Breakdown

1 code implementation • 2 Sep 2019 • Yiling Jia, Nipun Batra, Hongning Wang, Kamin Whitehouse

However, very few homes in the world have installed sub-meters (sensors measuring individual appliance energy); and the cost of retrofitting a home with extensive sub-metering eats into the funds available for energy saving retrofits.

Active Learning Total Energy

Adversarial Domain Adaptation for Machine Reading Comprehension

no code implementations • IJCNLP 2019 • Huazheng Wang, Zhe Gan, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Hongning Wang

In this paper, we focus on unsupervised domain adaptation for Machine Reading Comprehension (MRC), where the source domain has a large amount of labeled data, while only unlabeled passages are available in the target domain.

Machine Reading Comprehension Representation Learning +1

Variance Reduction in Gradient Exploration for Online Learning to Rank

no code implementations • 10 Jun 2019 • Huazheng Wang, Sonwoo Kim, Eric McCord-Snook, Qingyun Wu, Hongning Wang

We prove that the projected gradient is an unbiased estimation of the true gradient, and show that this lower-variance gradient estimation results in significant regret reduction.

Learning-To-Rank

Factorization Bandits for Online Influence Maximization

1 code implementation • 9 Jun 2019 • Qingyun Wu, Zhige Li, Huazheng Wang, Wei Chen, Hongning Wang

We capitalize on an important property of the influence maximization problem named network assortativity, which is ignored by most existing works in online influence maximization.

Context Attentive Document Ranking and Query Suggestion

5 code implementations • 5 Jun 2019 • Wasi Uddin Ahmad, Kai-Wei Chang, Hongning Wang

We present a context-aware neural ranking model to exploit users' on-task search activities and enhance retrieval performance.

Document Ranking Retrieval

The FacT: Taming Latent Factor Models for Explainability with Factorization Trees

no code implementations • 3 Jun 2019 • Yiyi Tao, Yiling Jia, Nan Wang, Hongning Wang

In this work, we integrate regression trees to guide the learning of latent factor models for recommendation, and use the learnt tree structure to explain the resulting latent factors.

regression

Bandit Learning with Implicit Feedback

1 code implementation • NeurIPS 2018 • Yi Qi, Qingyun Wu, Hongning Wang, Jie Tang, Maosong Sun

Implicit feedback, such as user clicks, although abundant in online information service systems, does not provide substantial evidence on users' evaluation of the system's output.

Bayesian Inference Thompson Sampling

Explainable Recommendation via Multi-Task Learning in Opinionated Text Data

1 code implementation • 10 Jun 2018 • Nan Wang, Hongning Wang, Yiling Jia, Yue Yin

Explaining automatically generated recommendations allows users to make more informed and accurate decisions about which results to utilize, and therefore improves their satisfaction.

Explainable Recommendation Multi-Task Learning

Learning Contextual Bandits in a Non-stationary Environment

1 code implementation • 23 May 2018 • Qingyun Wu, Naveen Iyer, Hongning Wang

Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in recommender systems, and many other important real-world problems, such as display advertisement.

Multi-Armed Bandits Recommendation Systems

Multi-Task Learning for Document Ranking and Query Suggestion

1 code implementation • ICLR 2018 • Wasi Uddin Ahmad, Kai-Wei Chang, Hongning Wang

We propose a multi-task learning framework to jointly learn document ranking and query suggestion for web search.

Document Ranking Multi-Task Learning
