Search Results for author: Huazheng Wang

Found 30 papers, 4 papers with code

Embodied LLM Agents Learn to Cooperate in Organized Teams

no code implementations · 19 Mar 2024 · Xudong Guo, Kaixuan Huang, Jiale Liu, Wenhui Fan, Natalia Vélez, Qingyun Wu, Huazheng Wang, Thomas L. Griffiths, Mengdi Wang

Large Language Models (LLMs) have emerged as integral tools for reasoning, planning, and decision-making, drawing upon their extensive world knowledge and proficiency in language-related tasks.

Decision Making, World Knowledge

AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks

no code implementations · 2 Mar 2024 · Yifan Zeng, Yiran Wu, Xiao Zhang, Huazheng Wang, Qingyun Wu

Through extensive experiments on a large set of harmful and safe prompts, we validate that the proposed AutoDefense improves robustness against jailbreak attacks while maintaining performance on normal user requests.

Instruction Following

Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits

no code implementations · 21 Feb 2024 · Zhiwei Wang, Huazheng Wang, Hongning Wang

Our analysis shows that against two widely used MAB algorithms, UCB1 and ε-greedy, the success of a stealthy attack depends on the environmental conditions and the realized reward of the arm pulled in the first round.

Multi-Armed Bandits
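The two algorithms named in this abstract can be sketched as follows. This is a minimal, self-contained illustration of textbook UCB1 and ε-greedy on simulated Bernoulli arms; the arm means, the horizon, ε = 0.1, and the helper `run` are hypothetical choices for illustration, and the stealthy attack studied in the paper is not implemented here.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """UCB1: pull every arm once, then pick the arm maximizing
    empirical mean + sqrt(2 ln t / n_i)."""
    for i, n in enumerate(counts):
        if n == 0:
            return i  # initialization phase: each arm is tried once
    return max(
        range(len(counts)),
        key=lambda i: sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]),
    )


def epsilon_greedy_select(counts, sums, epsilon, rng):
    """With probability epsilon explore uniformly at random; otherwise
    exploit the highest empirical mean (unpulled arms count as +inf)."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    means = [s / n if n > 0 else float("inf") for s, n in zip(sums, counts)]
    return max(range(len(counts)), key=lambda i: means[i])


def run(policy, means, horizon, seed=0, epsilon=0.1):
    """Hypothetical simulation harness (not from the paper): Bernoulli arms
    with the given means; returns the number of pulls per arm."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    for t in range(1, horizon + 1):
        if policy == "ucb1":
            arm = ucb1_select(counts, sums, t)
        elif policy == "egreedy":
            arm = epsilon_greedy_select(counts, sums, epsilon, rng)
        else:
            raise ValueError(f"unknown policy: {policy}")
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

For example, `run("ucb1", [0.2, 0.5, 0.8], 5000)` should concentrate most pulls on the third arm. UCB1 explores through its shrinking confidence bonus, whereas ε-greedy keeps exploring uniformly at a fixed rate.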

Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization

no code implementations · 8 Jan 2024 · Jiahao Qiu, Hui Yuan, Jinghong Zhang, Wentao Chen, Huazheng Wang, Mengdi Wang

To enhance the efficiency of such a process, we propose a tree search-based bandit learning method, which expands a tree starting from the initial sequence with the guidance of a bandit machine learning model.

Pure Exploration in Asynchronous Federated Bandits

no code implementations · 17 Oct 2023 · Zichen Wang, Chuanhao Li, Chenyu Song, Lianghui Wang, Quanquan Gu, Huazheng Wang

We study the federated pure exploration problem of multi-armed bandits and linear bandits, where M agents cooperatively identify the best arm by communicating with the central server.

Multi-Armed Bandits
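Pure exploration (best-arm identification) can be illustrated in the single-agent case with successive elimination, a classic algorithm swapped in here for illustration; it is not the paper's federated or asynchronous protocol. The Bernoulli arm means, the confidence level `delta`, and the Hoeffding-plus-union-bound radius are assumptions of this sketch.

```python
import math
import random


def successive_elimination(means, delta=0.05, seed=0, max_rounds=100_000):
    """Single-agent best-arm identification on Bernoulli arms (illustrative).

    Each round pulls every surviving arm once, then drops any arm whose
    upper confidence bound falls below the best lower confidence bound.
    Returns (index of the identified arm, total number of pulls)."""
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    sums = [0.0] * k
    pulls = 0
    for r in range(1, max_rounds + 1):
        for i in active:
            sums[i] += 1.0 if rng.random() < means[i] else 0.0
            pulls += 1
        # Hoeffding radius with a union bound over arms and rounds;
        # every surviving arm has been pulled exactly r times.
        radius = math.sqrt(math.log(4 * k * r * r / delta) / (2 * r))
        emp = {i: sums[i] / r for i in active}
        best_lcb = max(emp.values()) - radius
        active = [i for i in active if emp[i] + radius >= best_lcb]
        if len(active) == 1:
            return active[0], pulls
    # Budget exhausted: fall back to the empirical leader.
    return max(active, key=lambda i: sums[i]), pulls
```

The stopping rule is what makes this a pure-exploration method: it halts once a single arm survives, instead of trying to maximize cumulative reward along the way.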

Adversarial Attacks on Combinatorial Multi-Armed Bandits

no code implementations · 8 Oct 2023 · Rishab Balasubramanian, Jiawei Li, Prasad Tadepalli, Huazheng Wang, Qingyun Wu, Haoyu Zhao

Contrary to prior understanding of multi-armed bandits, our work reveals the surprising fact that the attackability of a specific CMAB instance also depends on whether the bandit instance is known or unknown to the adversary.

Multi-Armed Bandits

PARL: A Unified Framework for Policy Alignment in Reinforcement Learning

no code implementations · 3 Aug 2023 · Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Huazheng Wang, Mengdi Wang, Furong Huang

We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning using utility or preference-based feedback.

Bilevel Optimization, Procedure Learning

Online Modeling and Monitoring of Dependent Processes under Resource Constraints

no code implementations · 26 Jul 2023 · Tanapol Kosolwattana, Huazheng Wang, Ying Lin

Adaptive monitoring of a large population of dynamic processes is critical for the timely detection of abnormal events under limited resources in many healthcare and engineering systems.

Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP

no code implementations · 21 Jun 2023 · Jiacheng Guo, Zihao Li, Huazheng Wang, Mengdi Wang, Zhuoran Yang, Xuezhou Zhang

In this paper, we study representation learning in partially observable Markov Decision Processes (POMDPs), where the agent learns a decoder function that maps a series of high-dimensional raw observations to a compact representation and uses it for more efficient exploration and planning.

Efficient Exploration, Representation Learning

Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective

2 code implementations · NeurIPS 2023 · Zeyu Zhang, Yi Su, Hui Yuan, Yiran Wu, Rishab Balasubramanian, Qingyun Wu, Huazheng Wang, Mengdi Wang

Building upon this, we leverage offline RL techniques for off-policy LTR and propose the Click Model-Agnostic Unified Off-policy Learning to Rank (CUOLR) method, which can be easily applied to a wide range of click models.

Learning-To-Rank, Offline RL

Machine Learning for Synthetic Data Generation: A Review

no code implementations · 8 Feb 2023 · Yingzhou Lu, Minjie Shen, Huazheng Wang, Xiao Wang, Capucine van Rechem, Wenqi Wei

In light of these challenges, the concept of synthetic data generation emerges as a promising alternative that allows for data sharing and utilization in ways that real-world data cannot facilitate.

Fairness, Synthetic Data Generation

Dynamic Global Sensitivity for Differentially Private Contextual Bandits

no code implementations · 30 Aug 2022 · Huazheng Wang, David Zhao, Hongning Wang

We provide a rigorous theoretical analysis of the amount of noise added via dynamic global sensitivity and of the corresponding regret upper bound of our proposed algorithm.

Multi-Armed Bandits

Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization

no code implementations · 29 Jun 2022 · Kaixuan Huang, Yu Wu, Xuezhou Zhang, Shenyinying Tu, Qingyun Wu, Mengdi Wang, Huazheng Wang

Online influence maximization aims to maximize the influence spread of content in a social network with an unknown network model by selecting a few seed nodes.

Model-based Reinforcement Learning, Reinforcement Learning

Communication Efficient Distributed Learning for Kernelized Contextual Bandits

no code implementations · 10 Jun 2022 · Chuanhao Li, Huazheng Wang, Mengdi Wang, Hongning Wang

We tackle the communication efficiency challenge of learning kernelized contextual bandits in a distributed setting.

Multi-Armed Bandits

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

no code implementations · 5 Jun 2022 · Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang

We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements.

BIG-bench Machine Learning, Evolutionary Algorithms
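TS-DE builds on Thompson sampling. As a point of reference, here is a minimal sketch of plain Beta-Bernoulli Thompson sampling on a toy multi-armed bandit. It is not the TS-DE framework, which operates over sequences with an unknown sequence-to-function mapping; the uniform Beta(1, 1) prior, the Bernoulli reward model, and the arm means are assumptions of this sketch.

```python
import random


def thompson_sampling(means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling with a uniform Beta(1, 1) prior.

    Each step draws one sample from every arm's posterior, plays the arm
    with the largest sample, and updates that arm's Beta parameters.
    Returns the number of pulls per arm."""
    rng = random.Random(seed)
    k = len(means)
    alpha = [1.0] * k  # 1 + observed successes
    beta = [1.0] * k   # 1 + observed failures
    counts = [0] * k
    for _ in range(horizon):
        # Posterior sampling: randomness in the draws drives exploration.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        if rng.random() < means[arm]:  # simulated noisy, costly query
            alpha[arm] += 1
        else:
            beta[arm] += 1
        counts[arm] += 1
    return counts
```

Exploration here comes from posterior uncertainty rather than from an explicit bonus or a fixed exploration rate: arms with few observations have wide posteriors and are occasionally sampled high.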

When Are Linear Stochastic Bandits Attackable?

no code implementations · 18 Oct 2021 · Huazheng Wang, Haifeng Xu, Hongning Wang

We study adversarial attacks on linear stochastic bandits: by manipulating the rewards, an adversary aims to control the behaviour of the bandit algorithm.

Decision Making, Recommendation Systems
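The reward-manipulation threat model described above can be illustrated with a deliberately naive toy: an ε-greedy learner on a plain (non-linear) multi-armed bandit whose observed rewards pass through an adversary that zeroes every reward from a non-target arm. The learner, the attack rule, the target arm, and the cost measure are all hypothetical simplifications for illustration, not the attack or the attackability conditions analyzed in the paper.

```python
import random


def attacked_epsilon_greedy(means, target, horizon, epsilon=0.1, seed=0):
    """Toy reward-poisoning demo (hypothetical, not the paper's attack).

    An epsilon-greedy learner sees rewards only after an adversary has
    replaced every reward from a non-target arm with 0.
    Returns (pulls per arm, total reward mass removed by the adversary)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    cost = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # uniform exploration
        else:
            est = [s / n if n else float("inf") for s, n in zip(sums, counts)]
            arm = max(range(k), key=lambda i: est[i])  # greedy exploitation
        reward = 1.0 if rng.random() < means[arm] else 0.0
        if arm != target:
            cost += reward  # size of the perturbation |original - observed|
            reward = 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts, cost
```

With `means=[0.2, 0.5, 0.8]` and `target=0`, the adversary steers the learner toward the arm with the lowest true mean, but it pays on every rewarded non-target pull; the toy only shows the threat model, not an efficient attack.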

Incentivizing Exploration in Linear Bandits under Information Gap

no code implementations · 8 Apr 2021 · Huazheng Wang, Haifeng Xu, Chuanhao Li, Zhiyuan Liu, Hongning Wang

We study the problem of incentivizing exploration for myopic users in linear bandits, where users tend to exploit the arm with the highest predicted reward instead of exploring.

PairRank: Online Pairwise Learning to Rank by Divide-and-Conquer

1 code implementation · 28 Feb 2021 · Yiling Jia, Huazheng Wang, Stephen Guo, Hongning Wang

Online Learning to Rank (OL2R) eliminates the need for explicit relevance annotation by directly optimizing rankers through their interactions with users.

Learning-To-Rank

A Smoothed Analysis of Online Lasso for the Sparse Linear Contextual Bandit Problem

no code implementations · 16 Jul 2020 · Zhiyuan Liu, Huazheng Wang, Bo Waggoner, Youjian Liu, Lijun Chen

We investigate the sparse linear contextual bandit problem where the parameter θ is sparse.

Unbiased Learning to Rank: Online or Offline?

no code implementations · 28 Apr 2020 · Qingyao Ai, Tao Yang, Huazheng Wang, Jiaxin Mao

While their definitions of unbiasedness are different, these two types of ULTR algorithms share the same goal: to find the best models that rank documents based on their intrinsic relevance or utility.

Learning-To-Rank

Incentivized Exploration for Multi-Armed Bandits under Reward Drift

no code implementations · 12 Nov 2019 · Zhiyuan Liu, Huazheng Wang, Fan Shen, Kai Liu, Lijun Chen

We study incentivized exploration for the multi-armed bandit (MAB) problem where the players receive compensation for exploring arms other than the greedy choice and may provide biased feedback on reward.

Multi-Armed Bandits, Thompson Sampling

Adversarial Domain Adaptation for Machine Reading Comprehension

no code implementations · IJCNLP 2019 · Huazheng Wang, Zhe Gan, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Hongning Wang

In this paper, we focus on unsupervised domain adaptation for Machine Reading Comprehension (MRC), where the source domain has a large amount of labeled data, while only unlabeled passages are available in the target domain.

Machine Reading Comprehension, Representation Learning

Variance Reduction in Gradient Exploration for Online Learning to Rank

no code implementations · 10 Jun 2019 · Huazheng Wang, Sonwoo Kim, Eric McCord-Snook, Qingyun Wu, Hongning Wang

We prove that the projected gradient is an unbiased estimate of the true gradient, and show that this lower-variance gradient estimate yields a significant reduction in regret.

Learning-To-Rank

Factorization Bandits for Online Influence Maximization

1 code implementation · 9 Jun 2019 · Qingyun Wu, Zhige Li, Huazheng Wang, Wei Chen, Hongning Wang

We capitalize on an important property of the influence maximization problem called network assortativity, which most existing work on online influence maximization ignores.

Solving Verbal Comprehension Questions in IQ Test by Knowledge-Powered Word Embedding

no code implementations · 29 May 2015 · Huazheng Wang, Fei Tian, Bin Gao, Jiang Bian, Tie-Yan Liu

Second, we obtain distributed representations of words and relations by leveraging a novel word embedding method that considers the multi-sense nature of words and the relational knowledge among words (or their senses) contained in dictionaries.
