no code implementations • 19 Mar 2024 • Xudong Guo, Kaixuan Huang, Jiale Liu, Wenhui Fan, Natalia Vélez, Qingyun Wu, Huazheng Wang, Thomas L. Griffiths, Mengdi Wang
Large Language Models (LLMs) have emerged as integral tools for reasoning, planning, and decision-making, drawing upon their extensive world knowledge and proficiency in language-related tasks.
no code implementations • 2 Mar 2024 • Yifan Zeng, Yiran Wu, Xiao Zhang, Huazheng Wang, Qingyun Wu
Through extensive experiments on a large set of harmful and safe prompts, we validate the effectiveness of the proposed AutoDefense in improving robustness against jailbreak attacks, while maintaining performance on normal user requests.
no code implementations • 21 Feb 2024 • Zhiwei Wang, Huazheng Wang, Hongning Wang
Our analysis shows that against two widely used MAB algorithms, UCB1 and $\epsilon$-greedy, the success of a stealthy attack depends on the environmental conditions and the realized reward of the arm pulled in the first round.
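For reference, the two baseline algorithms named above can be sketched as follows. This is a minimal illustration of standard UCB1 and $\epsilon$-greedy arm selection, not the paper's attack; the function names are my own:

```python
import math
import random

def ucb1_arm(counts, means, t):
    # UCB1: play the arm maximizing empirical mean + exploration bonus
    return max(range(len(means)),
               key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))

def epsilon_greedy_arm(means, eps, rng):
    # epsilon-greedy: explore uniformly with probability eps, else exploit
    if rng.random() < eps:
        return rng.randrange(len(means))
    return max(range(len(means)), key=lambda a: means[a])
```

A stealthy attacker in this setting perturbs the observed rewards so that these selection rules converge to the attacker's preferred arm without the perturbations being detectable.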
no code implementations • 8 Jan 2024 • Jiahao Qiu, Hui Yuan, Jinghong Zhang, Wentao Chen, Huazheng Wang, Mengdi Wang
To enhance the efficiency of such a process, we propose a tree search-based bandit learning method, which expands a tree starting from the initial sequence with the guidance of a bandit machine learning model.
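A generic sketch of the idea: a tree of candidate sequences is grown from the initial sequence, with a UCB-style index deciding which node to expand next. The fitness function, mutation rule, and scoring below are illustrative assumptions, not the paper's model:

```python
import math
import random

def tree_search_optimize(seq, fitness, alphabet, iters=300, c=1.0, rng=None):
    # Bandit-guided tree search sketch: each node is a sequence; children
    # are single-position mutations; a UCB index guides expansion.
    rng = rng or random.Random(0)
    stats = {seq: [1, fitness(seq)]}  # node -> [visits, total reward]
    best = seq
    for t in range(1, iters + 1):
        # select the node with the highest UCB index
        node = max(stats, key=lambda s: stats[s][1] / stats[s][0]
                   + c * math.sqrt(math.log(t + 1) / stats[s][0]))
        # expand: mutate one position of the selected sequence
        i = rng.randrange(len(node))
        child = node[:i] + rng.choice(alphabet) + node[i + 1:]
        r = fitness(child)
        n, tot = stats.get(child, [0, 0.0])
        stats[child] = [n + 1, tot + r]
        if r > fitness(best):
            best = child
    return best
```

The bandit index balances revisiting promising sequences against expanding rarely tried ones, which is the efficiency gain the entry describes.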
no code implementations • 17 Oct 2023 • Zichen Wang, Chuanhao Li, Chenyu Song, Lianghui Wang, Quanquan Gu, Huazheng Wang
We study the federated pure exploration problem for multi-armed bandits and linear bandits, where $M$ agents cooperatively identify the best arm by communicating with a central server.
no code implementations • 8 Oct 2023 • Rishab Balasubramanian, Jiawei Li, Prasad Tadepalli, Huazheng Wang, Qingyun Wu, Haoyu Zhao
Contrary to prior understanding of multi-armed bandits, our work reveals a surprising fact that the attackability of a specific CMAB instance also depends on whether the bandit instance is known or unknown to the adversary.
no code implementations • 3 Aug 2023 • Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Huazheng Wang, Mengdi Wang, Furong Huang
We present a novel unified bilevel optimization-based framework, \textsf{PARL}, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning using utility or preference-based feedback.
no code implementations • 26 Jul 2023 • Tanapol Kosolwattana, Huazheng Wang, Ying Lin
Adaptive monitoring of a large population of dynamic processes is critical for the timely detection of abnormal events under limited resources in many healthcare and engineering systems.
1 code implementation • 26 Jul 2023 • Huazheng Wang, Daixuan Cheng, Haifeng Sun, Jingyu Wang, Qi Qi, Jianxin Liao, Jing Wang, Cong Liu
It shows that finetuning PLMs with diffusion degrades their reconstruction ability on OOD data.
no code implementations • 24 Jul 2023 • Xiang Ji, Huazheng Wang, Minshuo Chen, Tuo Zhao, Mengdi Wang
A popular approach is to utilize human feedback to learn a reward function for training.
no code implementations • 21 Jun 2023 • Jiacheng Guo, Zihao Li, Huazheng Wang, Mengdi Wang, Zhuoran Yang, Xuezhou Zhang
In this paper, we study representation learning in partially observable Markov Decision Processes (POMDPs), where the agent learns a decoder function that maps a series of high-dimensional raw observations to a compact representation and uses it for more efficient exploration and planning.
2 code implementations • NeurIPS 2023 • Zeyu Zhang, Yi Su, Hui Yuan, Yiran Wu, Rishab Balasubramanian, Qingyun Wu, Huazheng Wang, Mengdi Wang
Building upon this, we leverage offline RL techniques for off-policy LTR and propose the Click Model-Agnostic Unified Off-policy Learning to Rank (CUOLR) method, which could be easily applied to a wide range of click models.
no code implementations • 30 May 2023 • Zichen Wang, Rishab Balasubramanian, Hui Yuan, Chenyu Song, Mengdi Wang, Huazheng Wang
We present the first study of adversarial attacks on online learning to rank.
no code implementations • 8 Feb 2023 • Yingzhou Lu, Minjie Shen, Huazheng Wang, Xiao Wang, Capucine van Rechem, Wenqi Wei
In light of these challenges, the concept of synthetic data generation emerges as a promising alternative that allows for data sharing and utilization in ways that real-world data cannot facilitate.
no code implementations • 30 Aug 2022 • Huazheng Wang, David Zhao, Hongning Wang
We provide a rigorous theoretical analysis over the amount of noise added via dynamic global sensitivity and the corresponding upper regret bound of our proposed algorithm.
no code implementations • 29 Jun 2022 • Kaixuan Huang, Yu Wu, Xuezhou Zhang, Shenyinying Tu, Qingyun Wu, Mengdi Wang, Huazheng Wang
Online influence maximization aims to maximize the influence spread of content in a social network with an unknown network model by selecting a few seed nodes.
no code implementations • 10 Jun 2022 • Chuanhao Li, Huazheng Wang, Mengdi Wang, Hongning Wang
We tackle the communication efficiency challenge of learning kernelized contextual bandits in a distributed setting.
no code implementations • 5 Jun 2022 • Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang
We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements.
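TS-DE operates on sequence mutations; shown here is only the textbook Bernoulli Thompson Sampling primitive that such a framework builds on, as a minimal sketch (arm structure and posterior choice are simplifying assumptions, not the paper's setup):

```python
import random

def thompson_arm(successes, failures, rng):
    # sample each arm's mean from its Beta posterior, play the argmax
    draws = [rng.betavariate(s + 1, f + 1)
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda a: draws[a])

def run_bernoulli_ts(probs, rounds, rng):
    # simulate Thompson Sampling against fixed Bernoulli arms
    k = len(probs)
    succ, fail = [0] * k, [0] * k
    for _ in range(rounds):
        a = thompson_arm(succ, fail, rng)
        if rng.random() < probs[a]:
            succ[a] += 1
        else:
            fail[a] += 1
    return succ, fail
```

Posterior sampling concentrates play on the better arm as evidence accumulates, which is what makes it attractive when each fitness query is costly and noisy.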
no code implementations • 18 Oct 2021 • Huazheng Wang, Haifeng Xu, Hongning Wang
We study adversarial attacks on linear stochastic bandits: by manipulating the rewards, an adversary aims to control the behaviour of the bandit algorithm.
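As a toy illustration of reward manipulation (not the paper's attack, which crafts perturbations for the linear-bandit setting under a budget): an adversary sitting between the environment and the learner can depress the observed reward of every non-target arm.

```python
def poison(arm, reward, target, delta):
    # adversary leaves the target arm's reward intact and lowers all others,
    # steering the learner toward the target arm
    return reward if arm == target else reward - delta
```

The analytical question is then how small the total perturbation can be while still forcing the learner to pull the target arm almost always.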
no code implementations • 8 Apr 2021 • Huazheng Wang, Haifeng Xu, Chuanhao Li, Zhiyuan Liu, Hongning Wang
We study the problem of incentivizing exploration for myopic users in linear bandits, where the users tend to exploit the arm with the highest predicted reward instead of exploring.
1 code implementation • 28 Feb 2021 • Yiling Jia, Huazheng Wang, Stephen Guo, Hongning Wang
Online Learning to Rank (OL2R) eliminates the need of explicit relevance annotation by directly optimizing the rankers from their interactions with users.
no code implementations • 16 Jul 2020 • Zhiyuan Liu, Huazheng Wang, Bo Waggoner, Youjian Liu, Lijun Chen
We investigate the sparse linear contextual bandit problem where the parameter $\theta$ is sparse.
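Sparsity in $\theta$ is typically exploited through $\ell_1$-regularized (Lasso-style) estimation; the soft-thresholding operator at the core of such estimators can be sketched as follows (an illustrative aside, not the paper's algorithm):

```python
def soft_threshold(theta, lam):
    # proximal operator of lam * ||.||_1: shrinks each entry toward zero,
    # zeroing out entries with magnitude below lam
    return [(abs(x) - lam) * (1.0 if x > 0 else -1.0) if abs(x) > lam else 0.0
            for x in theta]
```

Zeroing small coordinates lets the learner concentrate exploration on the few informative features, which is the source of the improved regret in sparse settings.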
no code implementations • 28 Apr 2020 • Qingyao Ai, Tao Yang, Huazheng Wang, Jiaxin Mao
While their definitions of \textit{unbiasedness} are different, these two types of ULTR algorithms share the same goal -- to find the best models that rank documents based on their intrinsic relevance or utility.
no code implementations • 12 Nov 2019 • Zhiyuan Liu, Huazheng Wang, Fan Shen, Kai Liu, Lijun Chen
We study incentivized exploration for the multi-armed bandit (MAB) problem where the players receive compensation for exploring arms other than the greedy choice and may provide biased feedback on reward.
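In such schemes, the platform typically pays the user the estimated reward gap between the greedy arm and the arm it wants explored, so that exploring is individually rational for a myopic player. A minimal sketch (the function and its interface are illustrative, not the paper's mechanism):

```python
def required_compensation(estimated_means, recommended):
    # pay the myopic user the gap between the greedy arm's estimated reward
    # and the recommended arm's, so accepting the recommendation is rational
    greedy_value = max(estimated_means)
    return max(0.0, greedy_value - estimated_means[recommended])
```

The analysis then bounds the total compensation paid over a horizon, including the case where users report biased reward feedback.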
no code implementations • IJCNLP 2019 • Huazheng Wang, Zhe Gan, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Hongning Wang
In this paper, we focus on unsupervised domain adaptation for Machine Reading Comprehension (MRC), where the source domain has a large amount of labeled data, while only unlabeled passages are available in the target domain.
no code implementations • 10 Jun 2019 • Huazheng Wang, Sonwoo Kim, Eric McCord-Snook, Qingyun Wu, Hongning Wang
We prove that the projected gradient is an unbiased estimation of the true gradient, and show that this lower-variance gradient estimation results in significant regret reduction.
1 code implementation • 9 Jun 2019 • Qingyun Wu, Zhige Li, Huazheng Wang, Wei Chen, Hongning Wang
We capitalize on an important property of the influence maximization problem named network assortativity, which is ignored by most existing works in online influence maximization.
no code implementations • 18 May 2018 • Huazheng Wang, Ramsey Langley, Sonwoo Kim, Eric McCord-Snook, Hongning Wang
In this paper, we accelerate the online learning process by efficient exploration in the gradient space.
no code implementations • 29 May 2015 • Huazheng Wang, Fei Tian, Bin Gao, Jiang Bian, Tie-Yan Liu
Second, we obtain distributed representations of words and relations by leveraging a novel word embedding method that considers the multi-sense nature of words and the relational knowledge among words (or their senses) contained in dictionaries.