Search Results for author: Hongning Wang

Found 69 papers, 23 papers with code

ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback

no code implementations • 1 Apr 2024 • Zhenyu Hou, Yilin Niu, Zhengxiao Du, Xiaohan Zhang, Xiao Liu, Aohan Zeng, Qinkai Zheng, Minlie Huang, Hongning Wang, Jie Tang, Yuxiao Dong

The work presents our practices of aligning LLMs with human preferences, offering insights into the challenges and solutions in RLHF implementations.

Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation

no code implementations • 8 Mar 2024 • Xiaoying Zhang, Jean-Francois Ton, Wei Shen, Hongning Wang, Yang Liu

We introduce Adversarial Policy Optimization (AdvPO), a novel solution to the pervasive issue of reward over-optimization in Reinforcement Learning from Human Feedback (RLHF) for Large Language Models (LLMs).

Federated Linear Contextual Bandits with Heterogeneous Clients

no code implementations • 29 Feb 2024 • Ethan Blaser, Chuanhao Li, Hongning Wang

The demand for collaborative and private bandit learning across multiple agents is surging due to the growing quantity of data generated from distributed systems.

Federated Learning Multi-Armed Bandits

ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors

1 code implementation • 26 Feb 2024 • Zhexin Zhang, Yida Lu, Jingyuan Ma, Di Zhang, Rui Li, Pei Ke, Hao Sun, Lei Sha, Zhifang Sui, Hongning Wang, Minlie Huang

The safety of Large Language Models (LLMs) has gained increasing attention in recent years, but there is still no comprehensive approach for detecting safety issues in LLMs' responses in an aligned, customizable and explainable manner.

Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits

no code implementations • 21 Feb 2024 • Zhiwei Wang, Huazheng Wang, Hongning Wang

Our analysis shows that against two popularly employed MAB algorithms, UCB1 and $\epsilon$-greedy, the success of a stealthy attack depends on the environmental conditions and the realized reward of the arm pulled in the first round.

Multi-Armed Bandits

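For reference, both baseline algorithms named above can be sketched in a few lines of Python. This is a minimal illustration assuming Bernoulli rewards and hypothetical arm means; the helper names (ucb1_select, epsilon_greedy_select, run) are made up for the example, and the sketch covers only the standard UCB1 and ε-greedy selection rules, not the stealthy attack analyzed in the paper.

```python
import math
import random


def ucb1_select(counts, sums, t, rng=None):
    """UCB1: pull every arm once, then choose the arm maximizing
    empirical mean + sqrt(2 * ln(t) / n_pulls)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]),
    )


def epsilon_greedy_select(counts, sums, t, rng, epsilon=0.1):
    """epsilon-greedy: pull every arm once, then explore a uniformly random
    arm with probability epsilon and otherwise exploit the best empirical mean."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    return max(range(len(counts)), key=lambda a: sums[a] / counts[a])


def run(select, means, horizon, seed=0):
    """Simulate a Bernoulli bandit with hypothetical arm means; return pull counts."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    for t in range(1, horizon + 1):
        arm = select(counts, sums, t, rng)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

For example, run(ucb1_select, [0.2, 0.8], 2000) should concentrate most pulls on the second arm; both selectors share one signature so the same simulator drives either.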

Incentivized Truthful Communication for Federated Bandits

no code implementations • 7 Feb 2024 • Zhepei Wei, Chuanhao Li, Tianze Ren, Haifeng Xu, Hongning Wang

To enhance the efficiency and practicality of federated bandit learning, recent advances have introduced incentives to motivate communication among clients, where a client participates only when the incentive offered by the server outweighs its participation cost.

AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback

no code implementations • 2 Feb 2024 • Jian Guan, Wei Wu, Zujie Wen, Peng Xu, Hongning Wang, Minlie Huang

We present AMOR, an agent framework based on open-source LLMs, which reasons with external knowledge bases and adapts to specific domains through human supervision of the reasoning process.

Towards Efficient and Exact Optimization of Language Model Alignment

1 code implementation • 1 Feb 2024 • Haozhe Ji, Cheng Lu, Yilin Niu, Pei Ke, Hongning Wang, Jun Zhu, Jie Tang, Minlie Huang

We prove that EXO is guaranteed to optimize in the same direction as the RL algorithms asymptotically for arbitrary parametrization of the policy, while enabling efficient optimization by circumventing the complexities associated with RL algorithms.

Language Modelling Reinforcement Learning (RL)

The Impact of Snippet Reliability on Misinformation in Online Health Search

no code implementations • 28 Jan 2024 • Anat Hashavit, Tamar Stern, Hongning Wang, Sarit Kraus

These results strongly suggest that an information need-focused approach can significantly improve the reliability of extracted snippets in online health search.

Misinformation

AlignBench: Benchmarking Chinese Alignment of Large Language Models

1 code implementation • 30 Nov 2023 • Xiao Liu, Xuanyu Lei, Shengyuan Wang, Yue Huang, Zhuoer Feng, Bosi Wen, Jiale Cheng, Pei Ke, Yifan Xu, Weng Lam Tam, Xiaohan Zhang, Lichao Sun, Hongning Wang, Jing Zhang, Minlie Huang, Yuxiao Dong, Jie Tang

We will provide public APIs for evaluating AlignBench with CritiqueLLM to facilitate the evaluation of LLMs' Chinese alignment.

Benchmarking

CritiqueLLM: Scaling LLM-as-Critic for Effective and Explainable Evaluation of Large Language Model Generation

2 code implementations • 30 Nov 2023 • Pei Ke, Bosi Wen, Zhuoer Feng, Xiao Liu, Xuanyu Lei, Jiale Cheng, Shengyuan Wang, Aohan Zeng, Yuxiao Dong, Hongning Wang, Jie Tang, Minlie Huang

Since the natural language processing (NLP) community started using large language models (LLMs), such as GPT-4, as critics to evaluate the quality of generated texts, most existing works have only trained a critique generation model of a specific scale on specific datasets.

Language Modelling Large Language Model

Black-Box Prompt Optimization: Aligning Large Language Models without Model Training

1 code implementation • 7 Nov 2023 • Jiale Cheng, Xiao Liu, Kehan Zheng, Pei Ke, Hongning Wang, Yuxiao Dong, Jie Tang, Minlie Huang

However, these models are often not well aligned with human intents and therefore require additional treatment; this is the alignment problem.

Language Model Decoding as Direct Metrics Optimization

no code implementations • 2 Oct 2023 • Haozhe Ji, Pei Ke, Hongning Wang, Minlie Huang

And most importantly, we prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts.

Language Modelling

Meta-Reinforcement Learning via Exploratory Task Clustering

no code implementations • 15 Feb 2023 • Zhendong Chu, Hongning Wang

In this paper, we explore the structured heterogeneity among tasks via clustering to improve meta-RL.

Clustering Meta Reinforcement Learning +2

Debiasing Recommendation by Learning Identifiable Latent Confounders

1 code implementation • 10 Feb 2023 • Qing Zhang, Xiaoying Zhang, Yang Liu, Hongning Wang, Min Gao, Jiheng Zhang, Ruocheng Guo

Confounding bias arises due to the presence of unmeasured variables (e.g., the socio-economic status of a user) that can affect both a user's exposure and feedback.

Causal Inference counterfactual +1

How Bad is Top-$K$ Recommendation under Competing Content Creators?

no code implementations • 3 Feb 2023 • Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, Haifeng Xu

Content creators compete for exposure on recommendation platforms, and such strategic behavior leads to a dynamic shift in the content distribution.

Disentangled Representation for Diversified Recommendations

1 code implementation • 13 Jan 2023 • Xiaoying Zhang, Hongning Wang, Hang Li

This calls for a fine-grained understanding of a user's preferences over items, where one needs to recognize whether the user's choice is driven by the quality of the item itself or by the pre-selected attributes of the item.

MiddleGAN: Generate Domain Agnostic Samples for Unsupervised Domain Adaptation

no code implementations • 6 Nov 2022 • Ye Gao, Zhendong Chu, Hongning Wang, John Stankovic

We extend the theory of GAN to show that there exist optimal solutions for the parameters of the two discriminators and one generator in MiddleGAN, and empirically show that the samples generated by the MiddleGAN are similar to both samples from the source domain and samples from the target domain.

Unsupervised Domain Adaptation

COFFEE: Counterfactual Fairness for Personalized Text Generation in Explainable Recommendation

no code implementations • 14 Oct 2022 • Nan Wang, Qifan Wang, Yi-Chia Wang, Maziar Sanjabi, Jingzhou Liu, Hamed Firooz, Hongning Wang, Shaoliang Nie

However, the bias inherent in user-written text, which is often used for PTG model training, can inadvertently associate different levels of linguistic quality with users' protected attributes.

counterfactual Counterfactual Inference +4

Spectral Augmentation for Self-Supervised Learning on Graphs

1 code implementation • 2 Oct 2022 • Lu Lin, Jinghui Chen, Hongning Wang

Graph contrastive learning (GCL), as an emerging self-supervised learning technique on graphs, aims to learn representations via instance discrimination.

Contrastive Learning Node Classification +3

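Instance discrimination is often trained with an InfoNCE-style contrastive loss: two augmented views of the same node form a positive pair, and the other nodes in the batch act as negatives. The Python sketch below computes such a loss on plain lists; cosine similarity, the temperature of 0.5, and the toy embeddings are illustrative assumptions, and the spectral augmentation proposed in the paper is not implemented here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(view1, view2, temperature=0.5):
    """Mean InfoNCE loss: row i of view1 should match row i of view2
    (positive pair) rather than any other row of view2 (negatives)."""
    n = len(view1)
    total = 0.0
    for i in range(n):
        logits = [cosine(view1[i], view2[j]) / temperature for j in range(n)]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]  # negative log-softmax of the positive pair
    return total / n
```

Matching rows across the two views should yield a lower loss than a shuffled pairing, which is the signal that pushes embeddings of the same instance together.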

Rethinking Conversational Recommendations: Is Decision Tree All You Need?

1 code implementation • 31 Aug 2022 • A S M Ahsan-Ul Haque, Hongning Wang

Fourthly, when the user rejects a recommendation, we adaptively choose the next decision tree to improve subsequent questions and recommendations.

Recommendation Systems reinforcement-learning +1

Dynamic Global Sensitivity for Differentially Private Contextual Bandits

no code implementations • 30 Aug 2022 • Huazheng Wang, David Zhao, Hongning Wang

We provide a rigorous theoretical analysis of the amount of noise added via dynamic global sensitivity and the corresponding regret upper bound of our proposed algorithm.

Multi-Armed Bandits

Not Just Skipping. Understanding the Effect of Sponsored Content on Users' Decision-Making in Online Health Search

no code implementations • 10 Jul 2022 • Anat Hashavit, Hongning Wang, Tamar Stern, Sarit Kraus

We further discover that the contrast between the indirect marketing ads and the viewpoint presented in the organic search results plays an important role in users' decision-making.

Decision Making Marketing

Scalable Exploration for Neural Online Learning to Rank with Perturbed Feedback

no code implementations • 13 Jun 2022 • Yiling Jia, Hongning Wang

Deep neural networks (DNNs) demonstrate significant advantages in improving ranking performance in retrieval tasks.

Computational Efficiency Efficient Exploration +2

Communication Efficient Distributed Learning for Kernelized Contextual Bandits

no code implementations • 10 Jun 2022 • Chuanhao Li, Huazheng Wang, Mengdi Wang, Hongning Wang

We tackle the communication efficiency challenge of learning kernelized contextual bandits in a distributed setting.

Multi-Armed Bandits

Meta Policy Learning for Cold-Start Conversational Recommendation

1 code implementation • 24 May 2022 • Zhendong Chu, Hongning Wang, Yun Xiao, Bo Long, Lingfei Wu

We propose to learn a meta policy and adapt it to new users with only a few trials of conversational recommendations.

Meta Reinforcement Learning Recommendation Systems +2

Graph-based Extractive Explainer for Recommendations

no code implementations • 20 Feb 2022 • Peng Wang, Renqin Cai, Hongning Wang

Explanations in a recommender system assist users in making informed decisions among a set of recommended items.

Attribute Recommendation Systems +1

Learning from a Learning User for Optimal Recommendations

no code implementations • 3 Feb 2022 • Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, Haifeng Xu

In real-world recommendation problems, especially those with a formidably large item space, users have to gradually learn to estimate the utility of any fresh recommendations from their experience about previously consumed items.

Communication Efficient Federated Learning for Generalized Linear Bandits

no code implementations • 2 Feb 2022 • Chuanhao Li, Hongning Wang

Contextual bandit algorithms have been recently studied under the federated learning setting to satisfy the demand of keeping data decentralized and pushing the learning of bandit models to the client side.

Federated Learning regression

E-ADDA: Unsupervised Adversarial Domain Adaptation Enhanced by a New Mahalanobis Distance Loss for Smart Computing

no code implementations • 24 Jan 2022 • Ye Gao, Brian Baucom, Karen Rose, Kristina Gordon, Hongning Wang, John Stankovic

In the computer vision modality, the evaluation results suggest that we achieve new state-of-the-art performance on popular UDA benchmarks such as Office-31 and Office-Home, outperforming the second best-performing algorithms by up to 17.9%.

Out-of-Distribution Detection Unsupervised Domain Adaptation

IMO$^3$: Interactive Multi-Objective Off-Policy Optimization

no code implementations • 24 Jan 2022 • Nan Wang, Hongning Wang, Maryam Karimzadehgan, Branislav Kveton, Craig Boutilier

This problem has been studied extensively in the setting of known objective functions.

Learning Neural Contextual Bandits Through Perturbed Rewards

no code implementations • ICLR 2022 • Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang

Thanks to the power of representation learning, neural contextual bandit algorithms demonstrate remarkable performance improvements over their classical counterparts.

Computational Efficiency Multi-Armed Bandits +1

Learning Neural Ranking Models Online from Implicit User Feedback

no code implementations • 17 Jan 2022 • Yiling Jia, Hongning Wang

Existing online learning to rank (OL2R) solutions are limited to linear models, which are incapable of capturing possible non-linear relations between queries and documents.

Learning-To-Rank Representation Learning

Calibrating Explore-Exploit Trade-off for Fair Online Learning to Rank

no code implementations • 1 Nov 2021 • Yiling Jia, Hongning Wang

Online learning to rank (OL2R) has attracted great research interest in recent years, thanks to its advantage of avoiding the expensive relevance labeling required in offline supervised ranking model learning.

Fairness Learning-To-Rank

Graph Structural Attack by Perturbing Spectral Distance

1 code implementation • 1 Nov 2021 • Lu Lin, Ethan Blaser, Hongning Wang

Graph Convolutional Networks (GCNs) have fueled a surge of research interest due to their encouraging performance on graph learning tasks, but they have also been shown to be vulnerable to adversarial attacks.

Graph Learning

Comparative Explanations of Recommendations

no code implementations • 1 Nov 2021 • Aobo Yang, Nan Wang, Renqin Cai, Hongbo Deng, Hongning Wang

As recommendation is essentially a comparative (or ranking) process, a good explanation should illustrate to users why an item is believed to be better than another, i.e., comparative explanations about the recommended items.

Explainable Recommendation Recommendation Systems +1

Graph Embedding with Hierarchical Attentive Membership

no code implementations • 31 Oct 2021 • Lu Lin, Ethan Blaser, Hongning Wang

The exploitation of graph structures is the key to effectively learning representations of nodes that preserve useful information in graphs.

Graph Embedding Link Prediction +1

Unbiased Graph Embedding with Biased Graph Observations

no code implementations • 26 Oct 2021 • Nan Wang, Lu Lin, Jundong Li, Hongning Wang

In this paper, we propose a principled new way for unbiased graph embedding by learning node embeddings from an underlying bias-free graph, which is not influenced by sensitive node attributes.

Fairness Graph Embedding

When Are Linear Stochastic Bandits Attackable?

no code implementations • 18 Oct 2021 • Huazheng Wang, Haifeng Xu, Hongning Wang

We study adversarial attacks on linear stochastic bandits: by manipulating the rewards, an adversary aims to control the behaviour of the bandit algorithm.

Decision Making Recommendation Systems

Learning the Optimal Recommendation from Explorative Users

no code implementations • 6 Oct 2021 • Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, Haifeng Xu

We propose a new problem setting to study the sequential interactions between a recommender system and a user.

Recommendation Systems

Asynchronous Upper Confidence Bound Algorithms for Federated Linear Bandits

no code implementations • 4 Oct 2021 • Chuanhao Li, Hongning Wang

In this paper, we study linear contextual bandit in a federated learning setting.

Federated Learning

Improve Learning from Crowds via Generative Augmentation

no code implementations • 22 Jul 2021 • Zhendong Chu, Hongning Wang

This creates a sparsity issue and limits the quality of machine learning models trained on such data.

BIG-bench Machine Learning Data Augmentation

When and Whom to Collaborate with in a Changing Environment: A Collaborative Dynamic Bandit Solution

no code implementations • 14 Apr 2021 • Chuanhao Li, Qingyun Wu, Hongning Wang

However, all existing collaborative bandit learning solutions impose a stationary assumption about the environment, i.e., both user preferences and the dependency among users are assumed static over time.

Bayesian Inference Collaborative Filtering +3

Incentivizing Exploration in Linear Bandits under Information Gap

no code implementations • 8 Apr 2021 • Huazheng Wang, Haifeng Xu, Chuanhao Li, Zhiyuan Liu, Hongning Wang

We study the problem of incentivizing exploration for myopic users in linear bandits, where the users tend to exploit the arm with the highest predicted reward instead of exploring.

PairRank: Online Pairwise Learning to Rank by Divide-and-Conquer

1 code implementation • 28 Feb 2021 • Yiling Jia, Huazheng Wang, Stephen Guo, Hongning Wang

Online Learning to Rank (OL2R) eliminates the need for explicit relevance annotation by directly optimizing the rankers from their interactions with users.

Learning-To-Rank

Reversible Action Design for Combinatorial Optimization with Reinforcement Learning

no code implementations • 14 Feb 2021 • Fan Yao, Renqin Cai, Hongning Wang

The combinatorial optimization problem (COP) over graphs is a fundamental challenge in optimization.

Combinatorial Optimization Q-Learning +3

Explanation as a Defense of Recommendation

no code implementations • 24 Jan 2021 • Aobo Yang, Nan Wang, Hongbo Deng, Hongning Wang

At training time, the two learning tasks are joined by a latent sentiment vector, which is encoded by the recommendation module and used to make word choices for explanation generation.

Explanation Generation

Learning from Crowds by Modeling Common Confusions

2 code implementations • 24 Dec 2020 • Zhendong Chu, Jing Ma, Hongning Wang

Crowdsourcing provides a practical way to obtain large amounts of labeled data at a low cost.

Ranked #1 on Image Classification on LabelMe

Image Classification

Unifying Clustered and Non-stationary Bandits

no code implementations • 5 Sep 2020 • Chuanhao Li, Qingyun Wu, Hongning Wang

Non-stationary bandits and online clustering of bandits lift the restrictive assumptions in contextual bandits and provide solutions to many important real-world scenarios.

Change Detection Clustering +2

Directional Multivariate Ranking

no code implementations • 9 Jun 2020 • Nan Wang, Hongning Wang

In this work, we propose a directional multi-aspect ranking criterion to enable a holistic ranking of items with respect to multiple aspects.

Non-Clicks Mean Irrelevant? Propensity Ratio Scoring As a Correction

no code implementations • 18 May 2020 • Nan Wang, Zhen Qin, Xuanhui Wang, Hongning Wang

Recent advances in unbiased learning to rank (LTR) rely on Inverse Propensity Scoring (IPS) to eliminate bias in implicit feedback.

Learning-To-Rank

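As a point of reference for the IPS correction this work revisits, the following Python sketch estimates per-rank relevance from simulated clicks, once with raw click-through rate and once with inverse-propensity weighting. The position-based click model, the known propensities, and all function names are assumptions for illustration; this is standard pointwise IPS, not the Propensity Ratio Scoring method proposed in the paper.

```python
import random


def simulate_clicks(relevance, propensities, n_sessions, seed=0):
    """Position-based click model (an assumption for this sketch): the document
    at rank k is examined with probability propensities[k] and, if examined,
    clicked with probability relevance[k]. Returns click counts per rank."""
    rng = random.Random(seed)
    clicks = [0] * len(relevance)
    for _ in range(n_sessions):
        for k, (rel, prop) in enumerate(zip(relevance, propensities)):
            if rng.random() < prop and rng.random() < rel:
                clicks[k] += 1
    return clicks


def naive_ctr(clicks, n_sessions):
    """Raw click-through rate: biased toward documents shown at top ranks."""
    return [c / n_sessions for c in clicks]


def ips_relevance(clicks, propensities, n_sessions):
    """IPS: weight each click by 1 / examination propensity, which makes the
    estimate unbiased for relevance when the propensities are correct."""
    return [c / (n_sessions * p) for c, p in zip(clicks, propensities)]
```

With equal true relevance [0.5, 0.5, 0.5] and propensities [1.0, 0.5, 0.25], the naive CTR decays with rank (roughly 0.5, 0.25, 0.125) while the IPS estimates stay close to 0.5 at every rank.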

Déjà vu: A Contextualized Temporal Attention Mechanism for Sequential Recommendation

no code implementations • 29 Jan 2020 • Jibang Wu, Renqin Cai, Hongning Wang

Predicting users' preferences based on their sequential behaviors in history is challenging and crucial for modern recommender systems.

Sequential Recommendation

A Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation

1 code implementation • NeurIPS 2019 • Xueying Bai, Jian Guan, Hongning Wang

Reinforcement learning is effective in optimizing policies for recommender systems.

Generative Adversarial Network Model-based Reinforcement Learning +3

JNET: Learning User Representations via Joint Network Embedding and Topic Embedding

1 code implementation • 1 Dec 2019 • Lin Gong, Lu Lin, Weihao Song, Hongning Wang

Inspired by the concept of user schema in social psychology, we take a new perspective to perform user representation learning by constructing a shared latent space to capture the dependency among different modalities of user-generated data.

Link Prediction Network Embedding

Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation

3 code implementations • NeurIPS 2019 • Xueying Bai, Jian Guan, Hongning Wang

Reinforcement learning is well suited for optimizing policies of recommender systems.

Generative Adversarial Network Model-based Reinforcement Learning +3

BPMR: Bayesian Probabilistic Multivariate Ranking

no code implementations • 18 Sep 2019 • Nan Wang, Hongning Wang

The framework naturally leads to a probabilistic multi-aspect ranking criterion, which generalizes the single-aspect ranking to a multivariate fashion.

Recommendation Systems

Active Collaborative Sensing for Energy Breakdown

1 code implementation • 2 Sep 2019 • Yiling Jia, Nipun Batra, Hongning Wang, Kamin Whitehouse

However, very few homes in the world have installed sub-meters (sensors measuring individual appliance energy); and the cost of retrofitting a home with extensive sub-metering eats into the funds available for energy saving retrofits.

Active Learning Total Energy

Adversarial Domain Adaptation for Machine Reading Comprehension

no code implementations • IJCNLP 2019 • Huazheng Wang, Zhe Gan, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Hongning Wang

In this paper, we focus on unsupervised domain adaptation for Machine Reading Comprehension (MRC), where the source domain has a large amount of labeled data, while only unlabeled passages are available in the target domain.

Machine Reading Comprehension Representation Learning +1

Variance Reduction in Gradient Exploration for Online Learning to Rank

no code implementations • 10 Jun 2019 • Huazheng Wang, Sonwoo Kim, Eric McCord-Snook, Qingyun Wu, Hongning Wang

We prove that the projected gradient is an unbiased estimate of the true gradient, and show that this lower-variance gradient estimation results in significant regret reduction.

Learning-To-Rank

Factorization Bandits for Online Influence Maximization

1 code implementation • 9 Jun 2019 • Qingyun Wu, Zhige Li, Huazheng Wang, Wei Chen, Hongning Wang

We capitalize on an important property of the influence maximization problem named network assortativity, which is ignored by most existing works in online influence maximization.

Context Attentive Document Ranking and Query Suggestion

5 code implementations • 5 Jun 2019 • Wasi Uddin Ahmad, Kai-Wei Chang, Hongning Wang

We present a context-aware neural ranking model to exploit users' on-task search activities and enhance retrieval performance.

Document Ranking Retrieval

The FacT: Taming Latent Factor Models for Explainability with Factorization Trees

no code implementations • 3 Jun 2019 • Yiyi Tao, Yiling Jia, Nan Wang, Hongning Wang

In this work, we integrate regression trees to guide the learning of latent factor models for recommendation, and use the learnt tree structure to explain the resulting latent factors.

regression

Bandit Learning with Implicit Feedback

1 code implementation • NeurIPS 2018 • Yi Qi, Qingyun Wu, Hongning Wang, Jie Tang, Maosong Sun

Implicit feedback, such as user clicks, although abundant in online information service systems, does not provide substantial evidence of users' evaluation of the system's output.

Bayesian Inference Thompson Sampling

Explainable Recommendation via Multi-Task Learning in Opinionated Text Data

1 code implementation • 10 Jun 2018 • Nan Wang, Hongning Wang, Yiling Jia, Yue Yin

Explaining automatically generated recommendations allows users to make more informed and accurate decisions about which results to utilize, and therefore improves their satisfaction.

Explainable Recommendation Multi-Task Learning

Learning Contextual Bandits in a Non-stationary Environment

1 code implementation • 23 May 2018 • Qingyun Wu, Naveen Iyer, Hongning Wang

Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in recommender systems, and many other important real-world problems, such as display advertisement.

Multi-Armed Bandits Recommendation Systems