no code implementations • ICML 2020 • Youzhi Zhang, Bo An
Second, we design an ISG variant for TMEs (ISGT) by exploiting the fact that a TME is an NE that maximizes the team's utility, and we show that ISGT converges to a TME and that the conditions in ISGT cannot be relaxed.
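In symbols (our notation, not the paper's): writing $\mathcal{N}$ for the set of Nash equilibria of the game and $u_T$ for the team's expected utility, a TME $\sigma^{*}$ satisfies

    \sigma^{*} \in \arg\max_{\sigma \in \mathcal{N}} u_T(\sigma),

i.e., among all Nash equilibria it is one maximizing the team's utility, which is exactly the property ISGT exploits.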
1 code implementation • 28 May 2025 • Jujie He, Jiacai Liu, Chris Yuhao Liu, Rui Yan, Chaojie Wang, Peng Cheng, XiaoYu Zhang, Fuxiang Zhang, Jiacheng Xu, Wei Shen, Siyuan Li, Liang Zeng, Tianwen Wei, Cheng Cheng, Bo An, Yang Liu, Yahui Zhou
The success of DeepSeek-R1 underscores the significant role of reinforcement learning (RL) in enhancing the reasoning capabilities of large language models (LLMs).
1 code implementation • 22 May 2025 • Rui Ye, Keduan Huang, Qimin Wu, Yuzhu Cai, Tian Jin, Xianghe Pang, Xiangrui Liu, Jiaqi Su, Chen Qian, Bohan Tang, Kaiqu Liang, Jiaao Chen, Yue Hu, Zhenfei Yin, Rongye Shi, Bo An, Yang Gao, Wenjun Wu, Lei Bai, Siheng Chen
To address these challenges, we introduce MASLab, a unified, comprehensive, and research-friendly codebase for LLM-based MAS.
no code implementations • 18 May 2025 • Kun Huang, Weikai Xu, Yuxuan Liu, Quandong Wang, Pengzhi Gao, Wei Liu, Jian Luan, Bin Wang, Bo An
The Chain of Action-Planning Thoughts (CoaT) paradigm has been shown to improve the reasoning performance of VLM-based mobile agents in GUI tasks.
no code implementations • 17 May 2025 • Weikai Xu, Zhizheng Jiang, Yuxuan Liu, Wei Liu, Jian Luan, Yuanchun Li, Yunxin Liu, Bin Wang, Bo An
Additionally, both types of benchmarks fail to assess whether mobile agents can handle noise or engage in proactive interactions, because the evaluation process lacks noisy apps and provides overly complete instructions.
1 code implementation • 16 May 2025 • Lang Feng, Zhenghai Xue, Tingcong Liu, Bo An
In this work, we propose Group-in-Group Policy Optimization (GiGPO), a novel RL algorithm that achieves fine-grained credit assignment for LLM agents while preserving the appealing properties of group-based RL: critic-free, low memory, and stable convergence.
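For intuition, here is a minimal sketch of the group-relative, critic-free advantage estimation that group-based RL methods of this family build on; the grouping variables and function names are illustrative assumptions, not GiGPO's exact two-level scheme.

    import numpy as np

    def group_relative_advantages(rewards, group_ids):
        # Critic-free credit assignment: each reward is normalized against
        # the mean/std of its own group (e.g., rollouts sharing a start state).
        rewards = np.asarray(rewards, dtype=float)
        group_ids = np.asarray(group_ids)
        advantages = np.zeros_like(rewards)
        for g in np.unique(group_ids):
            mask = group_ids == g
            mu, sigma = rewards[mask].mean(), rewards[mask].std() + 1e-8
            advantages[mask] = (rewards[mask] - mu) / sigma
        return advantages

    # six rollouts forming two groups that share the same initial state
    print(group_relative_advantages([1.0, 0.0, 0.5, 0.2, 0.9, 0.4],
                                    [0, 0, 0, 1, 1, 1]))

Per the excerpt, GiGPO's contribution is performing such grouping at a finer, step-level granularity for LLM agents while keeping the critic-free, low-memory setup.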
no code implementations • 14 May 2025 • Yuzhou Cao, Han Bao, Lei Feng, Bo An
While convex smooth surrogate losses are particularly appealing due to their efficient estimation and optimization, the community has long believed that a trade-off exists between smoothness and a linear regret bound.
no code implementations • 30 Apr 2025 • Qirui Mi, Mengyue Yang, Xiangning Yu, Zhiyu Zhao, Cheng Deng, Bo An, Haifeng Zhang, Xu Chen, Jun Wang
To improve alignment with real-world data, we introduce IB-Tune, a novel fine-tuning method inspired by the Information Bottleneck principle, which retains population signals most predictive of future actions while filtering redundant history.
no code implementations • 22 Apr 2025 • Zhiyuan Hu, Shiyun Xiong, Yifan Zhang, See-Kiong Ng, Anh Tuan Luu, Bo An, Shuicheng Yan, Bryan Hooi
Recent advancements in visual language models (VLMs) have notably enhanced their capabilities in handling complex Graphical User Interface (GUI) interaction tasks.
no code implementations • 22 Apr 2025 • Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Shicheng Xu, Junyuan Mao, Yu Wang, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Wenjie Qu, Yue Liu, Chengwei Liu, Yifan Zhang, Qiankun Li, Chongye Guo, Yalan Qin, Zhaoxin Fan, Kai Wang, Yi Ding, Donghai Hong, Jiaming Ji, Yingxin Lai, Zitong Yu, Xinfeng Li, Yifan Jiang, Yanhui Li, Xinyu Deng, Junlin Wu, Dongxia Wang, Yihao Huang, Yufei Guo, Jen-tse Huang, Qiufeng Wang, Xiaolong Jin, Wenxuan Wang, Dongrui Liu, Yanwei Yue, Wenke Huang, Guancheng Wan, Heng Chang, Tianlin Li, Yi Yu, Chenghao Li, Jiawei Li, Lei Bai, Jie Zhang, Qing Guo, Jingyi Wang, Tianlong Chen, Joey Tianyi Zhou, Xiaojun Jia, Weisong Sun, Cong Wu, Jing Chen, Xuming Hu, Yiming Li, Xiao Wang, Ningyu Zhang, Luu Anh Tuan, Guowen Xu, Jiaheng Zhang, Tianwei Zhang, Xingjun Ma, Jindong Gu, Liang Pang, Xiang Wang, Bo An, Jun Sun, Mohit Bansal, Shirui Pan, Lingjuan Lyu, Yuval Elovici, Bhavya Kailkhura, Yaodong Yang, Hongwei Li, Wenyuan Xu, Yizhou Sun, Wei Wang, Qing Li, Ke Tang, Yu-Gang Jiang, Felix Juefei-Xu, Hui Xiong, XiaoFeng Wang, DaCheng Tao, Philip S. Yu, Qingsong Wen, Yang Liu
Currently, existing surveys on LLM safety primarily focus on specific stages of the LLM lifecycle, e.g., the deployment phase or the fine-tuning phase, and thus lack a comprehensive understanding of the entire "lifechain" of LLMs.
1 code implementation • 20 Apr 2025 • Jingtong Gao, Yewen Li, Shuai Mao, Nan Jiang, Yejing Wang, Qingpeng Cai, Fei Pan, Peng Jiang, Kun Gai, Bo An, Xiangyu Zhao
Auto-bidding, with its strong capability to optimize bidding decisions within dynamic and competitive online environments, has become a pivotal strategy for advertising platforms.
1 code implementation • 8 Apr 2025 • Haoyu Wang, Yujia Fu, Zhu Zhang, Shuo Wang, Zirui Ren, Xiaorong Wang, Zhili Li, Chaoqun He, Bo An, Zhiyuan Liu, Maosong Sun
Long-form generation is crucial for a wide range of practical applications, typically categorized into short-to-long and long-to-long generation.
no code implementations • 13 Mar 2025 • Zhe Zhao, Haibin Wen, Pengkun Wang, Ye Wei, Zaixi Zhang, Xi Lin, Fei Liu, Bo An, Hui Xiong, Yang Wang, Qingfu Zhang
Large language models (LLMs) have greatly accelerated the automation of algorithm generation and optimization.
no code implementations • 10 Mar 2025 • Zhenghai Xue, Lang Feng, Jiacheng Xu, Kang Kang, Xiang Wen, Bo An, Shuicheng Yan
Additionally, as the environment dynamics change, certain expert states may become inaccessible, rendering their distributions less valuable for imitation.
1 code implementation • 24 Feb 2025 • Penghui Yang, Cunxiao Du, Fengzhuo Zhang, Haonan Wang, Tianyu Pang, Chao Du, Bo An
Despite its promise, the effective application of speculative decoding in LLMs still confronts three key challenges: the increasing memory demands of the draft model, the distribution shift between the short-training corpora and long-context inference, and inefficiencies in attention implementation.
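As background, a toy sketch of the draft-then-verify acceptance rule underlying speculative decoding; the token-by-token loop and the two toy distributions are simplifications (real implementations verify an entire draft in one target-model forward pass), and this is not the paper's specific long-context method.

    import numpy as np

    rng = np.random.default_rng(0)

    def speculative_step(p_draft, p_target, k=4):
        # One draft-then-verify round over a toy vocabulary. p_draft/p_target
        # map a token prefix to a probability vector over the vocabulary.
        prefix, accepted = [], 0
        for _ in range(k):
            q = p_draft(prefix)
            t = rng.choice(len(q), p=q)           # cheap draft proposal
            p = p_target(prefix)                  # expensive verification
            if rng.random() < min(1.0, p[t] / q[t]):
                prefix.append(int(t)); accepted += 1
            else:                                 # reject: resample residual
                r = np.maximum(p - q, 0.0); r /= r.sum()
                prefix.append(int(rng.choice(len(r), p=r)))
                break
        return prefix, accepted

    uniform = lambda prefix: np.full(8, 1 / 8)
    peaked = lambda prefix: np.array([0.3, 0.3, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05])
    print(speculative_step(uniform, peaked))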
no code implementations • 29 Jan 2025 • Shuxin Zhuang, Shuxin Li, Tianji Yang, Muheng Li, Xianjie Shi, Bo An, Youzhi Zhang
To facilitate the design of efficient learning algorithms for solving multiplayer games, we propose a multiplayer game platform for solving Urban Network Security Games (UNSG) that model real-world scenarios.
no code implementations • CVPR 2025 • Benquan Wang, Ruyi An, Jin-Kyu So, Sergei Kurdiumov, Eng Aik Chan, Giorgio Adamo, Yuhan Peng, Yewen Li, Bo An
Experimental results validate our "building block" concept, demonstrating that models trained on basic square units can effectively generalize to realistic, more complex unseen objects.
no code implementations • 22 Dec 2024 • Yewen Li, Shuai Mao, Jingtong Gao, Nan Jiang, Yunjian Xu, Qingpeng Cai, Fei Pan, Peng Jiang, Bo An
We use weak-to-strong search alignment, training small critics for different preferences and applying an MCTS-inspired search to refine the model's output.
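A best-of-N simplification of this idea, assuming the critics are small scoring functions (one per preference) and ignoring the tree structure of the MCTS-inspired search:

    def weak_to_strong_select(candidates, critics, weights):
        # Pick the candidate output that maximizes a weighted sum of
        # small-critic scores; a stand-in for search-based refinement.
        score = lambda c: sum(w * f(c) for f, w in zip(critics, weights))
        return max(candidates, key=score)

    # e.g., toy critics preferring longer, question-free ad copy
    best = weak_to_strong_select(
        ["Buy now!", "Limited offer?", "Save 20% today"],
        critics=[len, lambda c: -c.count("?")],
        weights=[0.1, 1.0])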
no code implementations • 5 Dec 2024 • Longtao Zheng, Yifan Zhang, Hanzhong Guo, Jiachun Pan, Zhenxiong Tan, Jiahao Lu, Chuanxin Tang, Bo An, Shuicheng Yan
Recent advances in video diffusion models have unlocked new potential for realistic audio-driven talking video generation.
no code implementations • 28 Nov 2024 • Xiaoxuan Lou, Chaojie Wang, Bo An
Mathematical reasoning is a fundamental capability for large language models (LLMs), yet achieving high performance in this domain remains a significant challenge.
no code implementations • 13 Nov 2024 • Penghui Yang, Chen-Chen Zong, Sheng-Jun Huang, Lei Feng, Bo An
Drawing from the theoretical analysis, we propose a novel method called dual-head knowledge distillation, which partitions the linear classifier into two classification heads responsible for different losses, thereby preserving the beneficial effects of both losses on the backbone while eliminating adverse influences on the classification head.
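A minimal PyTorch sketch of the dual-head idea described here; the head names, temperature, and equal loss weighting are illustrative assumptions rather than the paper's exact configuration.

    import torch.nn as nn
    import torch.nn.functional as F

    class DualHeadStudent(nn.Module):
        # Shared backbone; one head takes the cross-entropy loss on labels,
        # the other takes the distillation loss on teacher logits, so the two
        # losses' conflicting gradients never meet in a single classifier head.
        def __init__(self, backbone, feat_dim, num_classes):
            super().__init__()
            self.backbone = backbone
            self.ce_head = nn.Linear(feat_dim, num_classes)
            self.kd_head = nn.Linear(feat_dim, num_classes)

        def forward(self, x):
            z = self.backbone(x)
            return self.ce_head(z), self.kd_head(z)

    def dual_head_loss(ce_logits, kd_logits, labels, teacher_logits, T=4.0):
        ce = F.cross_entropy(ce_logits, labels)
        kd = F.kl_div(F.log_softmax(kd_logits / T, dim=1),
                      F.softmax(teacher_logits / T, dim=1),
                      reduction="batchmean") * T * T
        return ce + kd  # both gradients still reach the shared backbone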
no code implementations • 27 Oct 2024 • HaiTao Zhang, Bo An
Large models are a hot research topic in the field of artificial intelligence.
no code implementations • 7 Oct 2024 • Aye Phyu Phyu Aung, Xinrun Wang, Ruiyu Wang, Hau Chan, Bo An, XiaoLi Li, J. Senthilnath
In this paper, we propose a new approach to train deep learning models using game-theoretic concepts, covering both Generative Adversarial Networks (GANs) and Adversarial Training (AT), where we deploy a double-oracle framework using best-response oracles.
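For reference, a double-oracle loop on a small zero-sum matrix game; here the strategy pools hold matrix actions and the best responses are exact, whereas in the paper the pools hold trained networks and the best-response oracles are learned. A uniform meta-solver is used below for brevity (a Nash meta-solver is standard).

    import numpy as np

    def double_oracle(payoff, iters=100):
        # Grow restricted strategy pools with best responses to the
        # opponent's current meta-strategy until no new strategy is added.
        rows, cols = [0], [0]
        for _ in range(iters):
            col_mix = payoff[:, cols].mean(axis=1)  # row player's payoffs
            row_mix = payoff[rows, :].mean(axis=0)  # column player's losses
            br_r, br_c = int(np.argmax(col_mix)), int(np.argmin(row_mix))
            if br_r in rows and br_c in cols:
                break                               # no new best responses
            rows = sorted(set(rows + [br_r]))
            cols = sorted(set(cols + [br_c]))
        return rows, cols

    # rock-paper-scissors payoff for the row player
    A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
    print(double_oracle(A))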
no code implementations • 2 Oct 2024 • Naming Liu, Mingzhi Wang, Xihuai Wang, Weinan Zhang, Yaodong Yang, Youzhi Zhang, Bo An, Ying Wen
Such insufficient policy expressiveness causes Team PSRO to be trapped in a sub-optimal ex ante equilibrium with significantly higher exploitability, never converging to the global ex ante equilibrium.
1 code implementation • 16 Sep 2024 • Hezhe Qiao, Hanghang Tong, Bo An, Irwin King, Charu Aggarwal, Guansong Pang
To this end, in this work we aim to present a comprehensive review of deep learning approaches for GAD.
no code implementations • 5 Sep 2024 • Yewen Li, Chaojie Wang, Xiaobo Xia, Xu He, Ruyi An, Dong Li, Tongliang Liu, Bo An, Xinrun Wang
Therefore, we call for more attention to incremental effectiveness over likelihood, i.e., whether a method can always surpass, or at least match, the performance of likelihood in U-OOD detection.
no code implementations • 10 Aug 2024 • Shuxin Li, Chang Yang, Youzhi Zhang, Pengdeng Li, Xinrun Wang, Xiao Huang, Hau Chan, Bo An
Nash equilibrium (NE) is a widely adopted solution concept in game theory due to its stability property.
1 code implementation • 4 Jul 2024 • Yi-Chen Li, Fuxiang Zhang, Wenjie Qiu, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu, Bo An
Thanks to the residual Q-learning framework, we can restore the customized LLM with the pre-trained LLM and the residual Q-function, without the reward function $r_1$.
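A sketch of the policy-composition step this excerpt alludes to, written in the standard maximum-entropy form pi(a|s) ∝ pi_0(a|s) exp(Q_res(s, a) / alpha); the exact composition used in the paper may differ.

    import numpy as np

    def customized_next_token_dist(pretrained_logprobs, residual_q, alpha=1.0):
        # Combine the pre-trained LLM's next-token log-probs with a learned
        # residual Q-function; no reward function r_1 is needed at this point.
        scores = pretrained_logprobs + residual_q / alpha
        scores -= scores.max()            # numerical stability
        probs = np.exp(scores)
        return probs / probs.sum()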
no code implementations • 20 Jun 2024 • Chaojie Wang, Yanchen Deng, Zhiyi Lyu, Liang Zeng, Jujie He, Shuicheng Yan, Bo An
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
1 code implementation • 20 Jun 2024 • Chuqiao Zong, Chaojie Wang, Molei Qin, Lei Feng, Xinrun Wang, Bo An
To tackle these problems, we propose a novel Memory Augmented Context-aware Reinforcement learning method On HFT, a.k.a. MacroHFT.
no code implementations • 3 Jun 2024 • Zitao Song, Chao Yang, Chaojie Wang, Bo An, Shuang Li
In the E-step, we evaluate the posterior distribution over the latent logic trees using an LLM prior and the likelihood of the observed event sequences.
1 code implementation • 29 May 2024 • Renchunzi Xie, Ambroise Odonnat, Vasilii Feofanov, Weijian Deng, Jianfeng Zhang, Bo An
Our findings motivate our proposed method MaNo, which (1) applies a data-dependent normalization on the logits to reduce prediction bias, and (2) takes the $L_p$ norm of the matrix of normalized logits as the estimation score.
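A compact sketch of the two-step score; softmax is used below as a placeholder for the paper's data-dependent normalization, which differs.

    import numpy as np

    def mano_style_score(logits, p=4):
        # (1) normalize logits per sample, (2) take the (averaged)
        # entry-wise L_p norm of the resulting matrix as the accuracy proxy.
        z = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        return float((np.abs(probs) ** p).mean() ** (1.0 / p))

    rng = np.random.default_rng(0)
    print(mano_style_score(rng.normal(size=(100, 10))))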
1 code implementation • 28 May 2024 • Lang Feng, Pengjie Gu, Bo An, Gang Pan
As the structure evolves with the integration of new trajectories, unreliable states are marginalized, and the most impactful nodes are prioritized for decision-making.
1 code implementation • 20 May 2024 • Pengdeng Li, Shuxin Li, Chang Yang, Xinrun Wang, Shuyue Hu, Xiao Huang, Hau Chan, Bo An
Decision-making problems, categorized as single-agent (e.g., Atari), cooperative multi-agent (e.g., Hanabi), competitive multi-agent (e.g., Hold'em poker), and mixed cooperative-competitive (e.g., football), are ubiquitous in the real world.
no code implementations • 14 May 2024 • Yiwen Zhu, Jinyi Liu, Wenya Wei, Qianyi Fu, Yujing Hu, Zhou Fang, Bo An, Jianye Hao, Tangjie Lv, Changjie Fan
Enhancing learning efficiency remains a key challenge in RL, with many efforts focused on using ensemble critics to boost policy evaluation efficiency.
1 code implementation • 2 May 2024 • Safa Messaoud, Billel Mokeddem, Zhenghai Xue, Linsey Pang, Bo An, Haipeng Chen, Sanjay Chawla
We derive a closed-form expression of the entropy of such policies.
1 code implementation • 19 Apr 2024 • Pengdeng Li, Shuxin Li, Xinrun Wang, Jakub Cerny, Youzhi Zhang, Stephen Mcaleer, Hau Chan, Bo An
Pursuit-evasion games (PEGs) model interactions between a team of pursuers and an evader in graph-based environments such as urban street networks.
no code implementations • 17 Apr 2024 • Pengdeng Li, Shuxin Li, Chang Yang, Xinrun Wang, Xiao Huang, Hau Chan, Bo An
(2) We propose the self-adaptive PSRO (SPSRO) by casting the hyperparameter value selection of the parametric PSRO as a hyperparameter optimization (HPO) problem where our objective is to learn an HPO policy that can self-adaptively determine the optimal hyperparameter values during the running of the parametric PSRO.
1 code implementation • 26 Mar 2024 • Longtao Zheng, Zhiyuan Huang, Zhenghai Xue, Xinrun Wang, Bo An, Shuicheng Yan
General virtual agents need to handle multimodal observations, master complex action spaces, and self-improve in dynamic, open-domain environments.
1 code implementation • 5 Mar 2024 • Weihao Tan, Wentao Zhang, Xinrun Xu, Haochong Xia, Ziluo Ding, Boyu Li, Bohan Zhou, Junpeng Yue, Jiechuan Jiang, Yewen Li, Ruyi An, Molei Qin, Chuqiao Zong, Longtao Zheng, Yujie Wu, Xiaoqiang Chai, Yifei Bi, Tianbao Xie, Pengjie Gu, Xiyun Li, Ceyao Zhang, Long Tian, Chaojie Wang, Xinrun Wang, Börje F. Karlsson, Bo An, Shuicheng Yan, Zongqing Lu
To handle this issue, we propose the General Computer Control (GCC) setting to restrict foundation agents to interact with software through the most unified and standardized interface, i.e., using screenshots as input and keyboard and mouse actions as output.
no code implementations • 28 Feb 2024 • Wentao Zhang, Lingxuan Zhao, Haochong Xia, Shuo Sun, Jiaze Sun, Molei Qin, Xinyi Li, Yuqing Zhao, Yilei Zhao, Xinyu Cai, Longtao Zheng, Xinrun Wang, Bo An
Notably, FinAgent is the first advanced multimodal foundation agent designed for financial trading tasks.
1 code implementation • 25 Jan 2024 • Weihao Tan, Wentao Zhang, Shanqi Liu, Longtao Zheng, Xinrun Wang, Bo An
Despite the impressive performance across numerous tasks, large language models (LLMs) often fail in solving simple decision-making tasks due to the misalignment of the knowledge in LLMs with environments.
1 code implementation • 24 Jan 2024 • Qi Wei, Lei Feng, Haobo Wang, Bo An
To address this limitation, we propose a noIse-Tolerant Expert Model (ITEM) for debiased learning in sample selection.
1 code implementation • 17 Jan 2024 • Renchunzi Xie, Ambroise Odonnat, Vasilii Feofanov, Ievgen Redko, Jianfeng Zhang, Bo An
Estimating the test performance of a model, possibly under distribution shift, without having access to the ground-truth labels is a challenging, yet very important problem for the safe deployment of machine learning algorithms in the wild.
1 code implementation • CVPR 2024 • Ruyi An, Yewen Li, Xu He, Pengjie Gu, Mengchen Zhao, Dong Li, Jianye Hao, Chaojie Wang, Bo An, Mingyuan Zhou
To address this issue, we first analyze the shortcomings of existing methods for mitigating "posterior collapse" from an information-theoretic perspective, and then highlight the necessity of regularization for explicitly propagating data information to higher-level latent variables while maintaining the dependency between different levels.
no code implementations • 31 Dec 2023 • Chaojie Wang, Yishi Xu, Zhong Peng, Chenxi Zhang, Bo Chen, Xinrun Wang, Lei Feng, Bo An
Large language models (LLMs) have exhibited remarkable performance on various natural language processing (NLP) tasks, especially for question answering.
1 code implementation • 17 Nov 2023 • Wentao Zhang, Yilei Zhao, Shuo Sun, Jie Ying, Yonggang Xie, Zitao Song, Xinrun Wang, Bo An
Specifically, the target stock pools of different investors vary dramatically due to their differing views on market states, and individual investors may temporarily adjust the stocks they desire to trade (e.g., adding a popular stock), which leads to customizable stock pools (CSPs).
no code implementations • 6 Oct 2023 • Zhenghai Xue, Qingpeng Cai, Bin Yang, Lantao Hu, Peng Jiang, Kun Gai, Bo An
As the policy performance of RL is sensitive to environment drifts, the loss function enables the state abstraction to be reflective of environment changes and notify the recommendation policy to adapt accordingly.
1 code implementation • 22 Sep 2023 • Molei Qin, Shuo Sun, Wentao Zhang, Haochong Xia, Xinrun Wang, Bo An
In stage II, we construct a pool of diverse RL agents for different market trends, distinguished by return rates: hundreds of RL agents are trained with different return-rate preferences, and only a tiny fraction of them are selected into the pool based on their profitability.
1 code implementation • 14 Sep 2023 • Haochong Xia, Shuo Sun, Xinrun Wang, Bo An
Financial simulators play an important role in enhancing forecasting accuracy, managing risks, and fostering strategic financial decision-making.
no code implementations • 22 Aug 2023 • Linjian Meng, Youzhi Zhang, Zhenxing Ge, Shangdong Yang, Tianyu Ding, Wenbin Li, Tianpei Yang, Bo An, Yang Gao
To establish last-iterate convergence for Counterfactual Regret Minimization (CFR) algorithms in learning a Nash equilibrium (NE) of extensive-form games (EFGs), recent studies reformulate learning an NE of the original EFG as learning the NEs of a sequence of (perturbed) regularized EFGs.
no code implementations • 17 Aug 2023 • Hui Niu, Siyuan Li, Jiahao Zheng, Zhouchi Lin, Jian Li, Jian Guo, Bo An
Market making (MM) has attracted significant attention in financial trading owing to its essential function in ensuring market liquidity.
no code implementations • 18 Jun 2023 • Xin Cheng, Yuzhou Cao, Ximing Li, Bo An, Lei Feng
Third, we propose a statistically consistent limiting method for RIT to train the model by limiting the predictions to the interval.
1 code implementation • AAAI 2023 • Xin Cheng, Deng-Bao Wang, Lei Feng, Min-Ling Zhang, Bo An
Our proposed methods are theoretically grounded and can be compatible with any models, optimizers, and losses.
1 code implementation • 13 Jun 2023 • Longtao Zheng, Rundong Wang, Xinrun Wang, Bo An
To address these challenges, we introduce Synapse, a computer agent featuring three key components: i) state abstraction, which filters out task-irrelevant information from raw states, allowing more exemplars within the limited context, ii) trajectory-as-exemplar prompting, which prompts the LLM with complete trajectories of the abstracted states and actions to improve multi-step decision-making, and iii) exemplar memory, which stores the embeddings of exemplars and retrieves them via similarity search for generalization to novel tasks.
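Component iii) is essentially an embedding store with similarity search; a minimal sketch follows (the embedding model and storage format are assumptions).

    import numpy as np

    class ExemplarMemory:
        # Stores exemplar embeddings and retrieves the most similar
        # trajectories for a new task via cosine similarity.
        def __init__(self):
            self.keys, self.trajectories = [], []

        def add(self, embedding, trajectory):
            self.keys.append(np.asarray(embedding, dtype=float))
            self.trajectories.append(trajectory)

        def retrieve(self, query, top_k=3):
            q = np.asarray(query, dtype=float)
            sims = [k @ q / (np.linalg.norm(k) * np.linalg.norm(q) + 1e-8)
                    for k in self.keys]
            order = np.argsort(sims)[::-1][:top_k]
            return [self.trajectories[i] for i in order]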
no code implementations • NeurIPS 2023 • Zhenghai Xue, Qingpeng Cai, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An
Data with dynamics shift are separated according to their environment parameters to train the corresponding policy.
no code implementations • 7 Feb 2023 • Rundong Wang, Longtao Zheng, Wei Qiu, Bowei He, Bo An, Zinovi Rabinovich, Yujing Hu, Yingfeng Chen, Tangjie Lv, Changjie Fan
Despite its success, ACL's applicability is limited by (1) the lack of a general student framework for dealing with the varying number of agents across tasks and the sparse reward problem, and (2) the non-stationarity of the teacher's task due to ever-changing student strategies.
no code implementations • 7 Feb 2023 • Pengdeng Li, Xinrun Wang, Shuxin Li, Hau Chan, Bo An
In this work, we attempt to bridge the two fields of finite-agent and infinite-agent games, by studying how the optimal policies of agents evolve with the number of agents (population size) in mean-field games, an agent-centric perspective in contrast to the existing works focusing typically on the convergence of the empirical distribution of the population.
1 code implementation • 7 Feb 2023 • Simin Li, Jun Guo, Jingqiao Xiu, Yuwei Zheng, Pu Feng, Xin Yu, Aishan Liu, Yaodong Yang, Bo An, Wenjun Wu, Xianglong Liu
To achieve maximum deviation in victim policies under complex agent-wise interactions, our unilateral attack aims to characterize and maximize the impact of the adversary on the victims.
no code implementations • 27 Jan 2023 • Wanqi Xue, Bo An, Shuicheng Yan, Zhongwen Xu
The complexity of designing reward functions has been a major obstacle to the wide application of deep reinforcement learning (RL) techniques.
no code implementations • 14 Jan 2023 • Shuo Sun, Molei Qin, Xinrun Wang, Bo An
Specifically, i) we propose AlphaMix+ as a strong FinRL baseline, which leverages mixture-of-experts (MoE) and risk-sensitive approaches to make diversified risk-aware investment decisions, ii) we evaluate 8 FinRL methods on 4 long-term real-world datasets from influential financial markets to demonstrate the usage of our PRUDEX-Compass, and iii) PRUDEX-Compass, together with the 4 real-world datasets, standard implementations of the 8 FinRL methods, and a portfolio management environment, is released as a public resource to facilitate the design and comparison of new FinRL methods.
no code implementations • 8 Dec 2022 • Hongxin Wei, Huiping Zhuang, Renchunzi Xie, Lei Feng, Gang Niu, Bo An, Yixuan Li
In the presence of noisy labels, designing robust loss functions is critical for securing the generalization performance of deep neural networks.
1 code implementation • 6 Dec 2022 • Wanqi Xue, Qingpeng Cai, Zhenghai Xue, Shuo Sun, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An
Though promising, the application of RL heavily relies on well-designed rewards, but designing rewards related to long-term user engagement is quite difficult.
2 code implementations • Conference 2022 • Yuzhou Cao, Tianchi Cai, Lei Feng, Lihong Gu, Jinjie Gu, Bo An, Gang Niu, Masashi Sugiyama
Classification with rejection (CwR) refrains from making a prediction to avoid critical misclassification when encountering test samples that are difficult to classify.
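For context, the classical confidence-based rejection rule (Chow's rule) that CwR methods generalize; the paper's surrogate-loss treatment goes beyond this simple baseline.

    import numpy as np

    def chow_rule(posteriors, cost):
        # Predict the argmax class unless the top posterior falls below
        # 1 - cost, in which case reject (return -1). cost is the price of
        # rejection relative to a misclassification.
        posteriors = np.asarray(posteriors)
        pred = int(np.argmax(posteriors))
        return pred if posteriors[pred] >= 1.0 - cost else -1

    print(chow_rule([0.55, 0.30, 0.15], cost=0.3))  # 0.55 < 0.7 -> reject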
1 code implementation • 24 Oct 2022 • Shijie Han, Siyuan Li, Bo An, Wei Zhao, Peng Liu
In this work, we develop a novel identity detection reinforcement learning (IDRL) framework that allows an agent to dynamically infer the identities of nearby agents and select an appropriate policy to accomplish the task.
no code implementations • 18 Oct 2022 • Wei Qiu, Xiao Ma, Bo An, Svetlana Obraztsova, Shuicheng Yan, Zhongwen Xu
Despite the recent advancement in multi-agent reinforcement learning (MARL), the MARL agents easily overfit the training environment and perform poorly in the evaluation scenarios where other agents behave differently.
no code implementations • 24 Sep 2022 • Yanchen Deng, Shufeng Kong, Caihua Liu, Bo An
Belief Propagation (BP) is an important message-passing algorithm for various reasoning tasks over graphical models, including solving the Constraint Optimization Problems (COPs).
1 code implementation • 12 Jul 2022 • Shuxin Li, Xinrun Wang, Youzhi Zhang, Jakub Cerny, Pengdeng Li, Hau Chan, Bo An
Extensive experimental results demonstrate the superiority of our approach over offline RL algorithms and the importance of using model-based methods for OEF problems.
3 code implementations • 17 Jun 2022 • Hongxin Wei, Lue Tao, Renchunzi Xie, Lei Feng, Bo An
Deep neural networks usually perform poorly when the training dataset suffers from extreme class imbalance.
no code implementations • 7 Jun 2022 • Shuo Sun, Rundong Wang, Bo An
To tackle these two limitations, we first reformulate quantitative investment as a multi-task learning problem.
1 code implementation • 1 Jun 2022 • Wanqi Xue, Qingpeng Cai, Ruohan Zhan, Dong Zheng, Peng Jiang, Kun Gai, Bo An
Meanwhile, reinforcement learning (RL) is widely regarded as a promising framework for optimizing long-term engagement in sequential recommendation.
no code implementations • 27 May 2022 • Wei Qiu, Weixun Wang, Rundong Wang, Bo An, Yujing Hu, Svetlana Obraztsova, Zinovi Rabinovich, Jianye Hao, Yingfeng Chen, Changjie Fan
During execution durations, the environment changes are influenced by, but not synchronised with, action execution.
2 code implementations • 19 May 2022 • Hongxin Wei, Renchunzi Xie, Hao Cheng, Lei Feng, Bo An, Yixuan Li
Our method is motivated by the analysis that the norm of the logits keeps increasing during training, leading to overconfident outputs.
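A sketch of training with logit normalization, which this analysis motivates: compute cross-entropy on L2-normalized logits so the loss can no longer be reduced simply by growing the logit magnitude (tau is a temperature; its value below is an assumption).

    import torch
    import torch.nn.functional as F

    def logit_norm_loss(logits, labels, tau=0.04):
        # Cross-entropy on L2-normalized logits decouples the loss from the
        # logit norm that otherwise drives overconfidence.
        norm = torch.norm(logits, p=2, dim=1, keepdim=True) + 1e-7
        return F.cross_entropy(logits / (norm * tau), labels)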
no code implementations • 17 Jan 2022 • Wanqi Xue, Bo An, Chai Kiat Yeo
Second, we enable neural MCTS with decentralized control, making NSGZero applicable to NSGs with many resources.
3 code implementations • 16 Jan 2022 • Renchunzi Xie, Hongxin Wei, Lei Feng, Bo An
Although there have been a few studies on this problem, most of them only exploit unidirectional relationships from the source domain to the target domain.
no code implementations • 15 Dec 2021 • Shuo Sun, Wanqi Xue, Rundong Wang, Xu He, Junlei Zhu, Jian Li, Bo An
Reinforcement learning (RL) techniques have shown great success in many challenging quantitative trading tasks, such as portfolio management and algorithmic trading.
1 code implementation • 8 Dec 2021 • Yanchen Deng, Shufeng Kong, Bo An
Our model, GAT-PCM, is then pretrained with optimally labelled data in an offline manner, so as to construct effective heuristics to boost a broad range of DCOP algorithms where evaluating the quality of a partial assignment is critical, such as local search or backtracking search.
no code implementations • NeurIPS 2021 • Wei Qiu, Xinrun Wang, Runsheng Yu, Rundong Wang, Xu He, Bo An, Svetlana Obraztsova, Zinovi Rabinovich
Current value-based multi-agent reinforcement learning methods optimize individual Q values to guide individuals' behaviours via centralized training with decentralized execution (CTDE).
no code implementations • 29 Sep 2021 • Hongxin Wei, Lue Tao, Renchunzi Xie, Lei Feng, Bo An
Deep neural networks usually perform poorly when the training dataset suffers from extreme class imbalance.
no code implementations • 29 Sep 2021 • Pengjie Gu, Mengchen Zhao, Chen Chen, Dong Li, Jianye Hao, Bo An
Offline reinforcement learning is a promising approach for practical applications since it does not require interactions with real-world environments.
no code implementations • ICLR 2022 • Pengjie Gu, Mengchen Zhao, Jianye Hao, Bo An
Autonomous agents often need to work together as a team to accomplish complex cooperative tasks.
no code implementations • 28 Sep 2021 • Shuo Sun, Rundong Wang, Bo An
RL's impact is pervasive; it has recently demonstrated the ability to conquer many challenging QT tasks.
no code implementations • 9 Aug 2021 • Wanqi Xue, Wei Qiu, Bo An, Zinovi Rabinovich, Svetlana Obraztsova, Chai Kiat Yeo
Empirical results demonstrate that many state-of-the-art MACRL methods are vulnerable to message attacks, and our method can significantly improve their robustness.
4 code implementations • NeurIPS 2021 • Hongxin Wei, Lue Tao, Renchunzi Xie, Bo An
Learning with noisy labels is a practically challenging problem in weakly supervised learning.
no code implementations • 16 Jun 2021 • Yuzhou Cao, Lei Feng, Senlin Shu, Yitian Xu, Bo An, Gang Niu, Masashi Sugiyama
We show that, without any assumptions on the loss functions, models, and optimizers, we can successfully learn a multi-class classifier from only data of a single class with a rigorous consistency guarantee when confidences (i.e., the class-posterior probabilities for all the classes) are available.
1 code implementation • 13 Jun 2021 • Haipeng Chen, Wei Qiu, Han-Ching Ou, Bo An, Milind Tambe
Empirical results show that our method achieves influence as high as the state-of-the-art methods for contingency-aware IM, while having negligible runtime at test phase.
no code implementations • 11 Jun 2021 • Jiaqi Lv, Biao Liu, Lei Feng, Ning Xu, Miao Xu, Bo An, Gang Niu, Xin Geng, Masashi Sugiyama
Partial-label learning (PLL) utilizes instances with PLs, where a PL includes several candidate labels but only one is the true label (TL).
no code implementations • 2 Jun 2021 • Wanqi Xue, Youzhi Zhang, Shuxin Li, Xinrun Wang, Bo An, Chai Kiat Yeo
Securing networked infrastructures is important in the real world.
no code implementations • 18 May 2021 • Shuxin Li, Youzhi Zhang, Xinrun Wang, Wanqi Xue, Bo An
The challenge of solving this type of game is that the team's joint action space grows exponentially with the number of agents, which results in the inefficiency of existing algorithms, e.g., Counterfactual Regret Minimization (CFR).
no code implementations • 18 Feb 2021 • Zhe Wu, Kai Li, Enmin Zhao, Hang Xu, Meng Zhang, Haobo Fu, Bo An, Junliang Xing
In this work, we propose a novel Learning to Exploit (L2E) framework for implicit opponent modeling.
no code implementations • CVPR 2022 • Aye Phyu Phyu Aung, Xinrun Wang, Runsheng Yu, Bo An, Senthilnath Jayavelu, XiaoLi Li
In this paper, we propose a new approach to train Generative Adversarial Networks (GANs) where we deploy a double-oracle framework using the generator and discriminator oracles.
no code implementations • 16 Feb 2021 • Wei Qiu, Xinrun Wang, Runsheng Yu, Xu He, Rundong Wang, Bo An, Svetlana Obraztsova, Zinovi Rabinovich
Current value-based multi-agent reinforcement learning methods optimize individual Q values to guide individuals' behaviours via centralized training with decentralized execution (CTDE).
no code implementations • 13 Feb 2021 • Yuzhou Cao, Lei Feng, Yitian Xu, Bo An, Gang Niu, Masashi Sugiyama
Weakly supervised learning has drawn considerable attention recently to reduce the expensive time and labor consumption of labeling massive data.
no code implementations • 8 Jan 2021 • Runsheng Yu, Yu Gong, Rundong Wang, Bo An, Qingwen Liu, Wenwu Ou
Firstly, we introduce a novel training scheme with two value functions to maximize the accumulated long-term reward under the safety constraint.
no code implementations • 1 Jan 2021 • Wei Qiu, Xinrun Wang, Runsheng Yu, Xu He, Rundong Wang, Bo An, Svetlana Obraztsova, Zinovi Rabinovich
Centralized training with decentralized execution (CTDE) has become an important paradigm in multi-agent reinforcement learning (MARL).
no code implementations • 23 Dec 2020 • Rundong Wang, Hongxin Wei, Bo An, Zhouyan Feng, Jun Yao
Portfolio management via reinforcement learning is at the forefront of fintech research, which explores how to optimally reallocate a fund into different financial assets over the long term by trial-and-error.
no code implementations • 22 Dec 2020 • Runsheng Yu, Yu Gong, Xu He, Bo An, Yu Zhu, Qingwen Liu, Wenwu Ou
Recently, many studies regard cold-start personalized preference prediction as a few-shot learning problem, where each user is a task and the recommended items are the classes, and the gradient-based meta-learning method MAML is leveraged to address this challenge.
no code implementations • 9 Dec 2020 • Hongxin Wei, Lei Feng, Rundong Wang, Bo An
Deep neural networks have been shown to easily overfit to biased training data with label noise or class imbalance.
no code implementations • 7 Dec 2020 • Xinrun Wang, Tarun Nair, Haoyang Li, Yuh Sheng Reuben Wong, Nachiket Kelkar, Srinivas Vaidyanathan, Rajat Nayak, Bo An, Jagdish Krishnaswamy, Milind Tambe
Dams impact downstream river dynamics through flow regulation and disruption of upstream-downstream linkages.
no code implementations • 7 Dec 2020 • Yan Li, Bo An, Junming Ma, Donggang Cao, Yasha Wang, Hong Mei
Hyper-parameter tuning (HPT) is crucial for many machine learning (ML) algorithms.
no code implementations • 2 Dec 2020 • Zhuowei Wang, Jing Jiang, Bo Han, Lei Feng, Bo An, Gang Niu, Guodong Long
We also instantiate our framework with different combinations, which set the new state of the art on benchmark-simulated and real-world datasets with noisy labels.
1 code implementation • COLING 2020 • Rong Zhang, Qifei Zhou, Bo An, Weiping Li, Tong Mo, Bo Wu
2) No previous work has considered adversarial attacks to improve the performance of NLSM tasks.
no code implementations • 5 Oct 2020 • Lei Feng, Senlin Shu, Nan Lu, Bo Han, Miao Xu, Gang Niu, Bo An, Masashi Sugiyama
To alleviate the data requirement for training effective binary classifiers in binary classification, many weakly supervised learning settings have been proposed.
no code implementations • 30 Sep 2020 • David Milec, Jakub Černý, Viliam Lisý, Bo An
This paper aims to analyze and propose scalable algorithms for computing effective and robust strategies against a quantal opponent in normal-form and extensive-form games.
no code implementations • 21 Aug 2020 • Xu He, Bo An, Yanghua Li, Haikai Chen, Rundong Wang, Xinrun Wang, Runsheng Yu, Xin Li, Zhirong Wang
Thus, the global policy of the whole page could be sub-optimal.
no code implementations • 21 Aug 2020 • Xu He, Bo An, Yanghua Li, Haikai Chen, Qingyu Guo, Xin Li, Zhirong Wang
First, since we are concerned with the reward of a set of recommended items, we model online recommendation as a contextual combinatorial bandit problem and define the reward of a recommended set.
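A toy, context-free version of the combinatorial bandit view (the paper's contextual, set-level reward model is richer): score items with a UCB index, recommend the top-k set, and update per-item statistics.

    import numpy as np

    class CombinatorialUCB:
        # Recommend the k items with the highest UCB index; unseen items get
        # a large index, which drives exploration.
        def __init__(self, n_items, k, c=1.0):
            self.n, self.k, self.c = n_items, k, c
            self.counts = np.zeros(n_items)
            self.means = np.zeros(n_items)
            self.t = 0

        def recommend(self):
            self.t += 1
            ucb = self.means + self.c * np.sqrt(
                np.log(self.t + 1) / (self.counts + 1e-8))
            return np.argsort(ucb)[::-1][:self.k]

        def update(self, items, rewards):
            for i, r in zip(items, rewards):
                self.counts[i] += 1
                self.means[i] += (r - self.means[i]) / self.counts[i]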
no code implementations • NeurIPS 2020 • Lei Feng, Jiaqi Lv, Bo Han, Miao Xu, Gang Niu, Xin Geng, Bo An, Masashi Sugiyama
Partial-label learning (PLL) is a multi-class classification problem, where each training example is associated with a set of candidate labels.
1 code implementation • 7 Jun 2020 • Xu He, Haipeng Chen, Bo An
However, previous works rarely consider the uncertainty when humans provide feedback, especially in cases where the optimal actions are not obvious to the trainers.
no code implementations • ICLR 2020 • Zhenyu Shi*, Runsheng Yu*, Xinrun Wang*, Rundong Wang, Youzhi Zhang, Hanjiang Lai, Bo An
The main difficulties of expensive coordination are that i) the leader has to consider the long-term effect and predict the followers' behaviors when assigning bonuses and ii) the complex interactions between followers make the training process hard to converge, especially when the leader's policy changes with time.
2 code implementations • CVPR 2020 • Hongxin Wei, Lei Feng, Xiangyu Chen, Bo An
The state-of-the-art approaches "Decoupling" and "Co-teaching+" claim that the "disagreement" strategy is crucial for alleviating the problem of learning with noisy labels.
Ranked #11 on Learning with noisy labels on CIFAR-10N-Random3
no code implementations • ICML 2020 • Lei Feng, Takuo Kaneko, Bo Han, Gang Niu, Bo An, Masashi Sugiyama
In this paper, we propose a novel problem setting to allow MCLs for each example and two ways for learning with MCLs.
no code implementations • 18 Nov 2019 • Runsheng Yu, Zhenyu Shi, Xinrun Wang, Rundong Wang, Buhong Liu, Xinwen Hou, Hanjiang Lai, Bo An
Existing value-factorization-based multi-agent deep reinforcement learning (MARL) approaches perform well in various multi-agent cooperative environments under the centralized training and decentralized execution (CTDE) scheme, where all agents are trained together by the centralized value network and each agent executes its policy independently.
no code implementations • ICML 2020 • Rundong Wang, Xu He, Runsheng Yu, Wei Qiu, Bo An, Zinovi Rabinovich
Under the limited bandwidth constraint, a communication protocol is required to generate informative messages.
no code implementations • IJCNLP 2019 • Bo An, Chen Bo, Xianpei Han, Le Sun
Semantic parsing aims to map natural language utterances into structured meaning representations.
no code implementations • NeurIPS 2019 • Jiarui Gan, Qingyu Guo, Long Tran-Thanh, Bo An, Michael Wooldridge
We then apply a game-theoretic framework at a higher level to counteract such manipulation, in which the defender commits to a policy that specifies her strategy commitment according to the learned information.
no code implementations • 3 Mar 2019 • Jiang Rong, Tao Qin, Bo An
Second, based on the analysis of the impact of other players' unknown cards on one's final rewards, we design two neural networks to deal with imperfect information, the first one inferring the cards of the partner and the second one taking the outputs of the first one as part of its input to select a bid.
no code implementations • 8 Feb 2019 • Lei Feng, Bo An
We show that optimizing this convex-concave problem is equivalent to solving a set of quadratic programming (QP) problems.
no code implementations • 8 Feb 2019 • Lei Feng, Bo An, Shuo He
It is well-known that exploiting label correlations is crucially important to multi-label learning.
no code implementations • ACL 2016 • Bo Chen, Le Sun, Xianpei Han, Bo An
A major challenge of semantic parsing is the vocabulary mismatch problem between natural language and target ontology.
no code implementations • COLING 2018 • Bo An, Xianpei Han, Le Sun
Word composition is a promising technique for representation learning of large linguistic units (e.g., phrases, sentences and documents).
no code implementations • COLING 2018 • Bo Chen, Bo An, Le Sun, Xianpei Han
Semantic parsers critically rely on accurate and high-coverage lexicons.
no code implementations • NAACL 2018 • Bo An, Bo Chen, Xianpei Han, Le Sun
Previous representation learning techniques for knowledge graph representation usually represent the same entity or relation in different triples with the same representation, without considering the ambiguity of relations and entities.
1 code implementation • 1 Apr 2017 • Yihui He, Xiaobo Ma, Xiapu Luo, Jianfeng Li, Mengchen Zhao, Bo An, Xiaohong Guan
Security surveillance is one of the most important issues in smart cities, especially in an era of terrorism.