Search Results for author: Bo An

Found 124 papers, 41 papers with code

Converging to Team-Maxmin Equilibria in Zero-Sum Multiplayer Games

no code implementations ICML 2020 Youzhi Zhang, Bo An

Second, we design an ISG variant for TMEs (ISGT) by exploiting the fact that a TME is an NE that maximizes the team’s utility, show that ISGT converges to a TME, and establish the impossibility of relaxing the conditions in ISGT.

Skywork Open Reasoner 1 Technical Report

1 code implementation 28 May 2025 Jujie He, Jiacai Liu, Chris Yuhao Liu, Rui Yan, Chaojie Wang, Peng Cheng, XiaoYu Zhang, Fuxiang Zhang, Jiacheng Xu, Wei Shen, Siyuan Li, Liang Zeng, Tianwen Wei, Cheng Cheng, Bo An, Yang Liu, Yahui Zhou

The success of DeepSeek-R1 underscores the significant role of reinforcement learning (RL) in enhancing the reasoning capabilities of large language models (LLMs).

Math Reinforcement Learning (RL)

Enhance Mobile Agents Thinking Process Via Iterative Preference Learning

no code implementations 18 May 2025 Kun Huang, Weikai Xu, Yuxuan Liu, Quandong Wang, Pengzhi Gao, Wei Liu, Jian Luan, Bin Wang, Bo An

The Chain of Action-Planning Thoughts (CoaT) paradigm has been shown to improve the reasoning performance of VLM-based mobile agents in GUI tasks.

Continual Pretraining

Mobile-Bench-v2: A More Realistic and Comprehensive Benchmark for VLM-based Mobile Agents

no code implementations 17 May 2025 Weikai Xu, Zhizheng Jiang, Yuxuan Liu, Wei Liu, Jian Luan, Yuanchun Li, Yunxin Liu, Bin Wang, Bo An

Additionally, both types of benchmarks fail to assess whether mobile agents can handle noise or engage in proactive interactions due to a lack of noisy apps or overly full instructions during the evaluation process.

Group-in-Group Policy Optimization for LLM Agent Training

1 code implementation 16 May 2025 Lang Feng, Zhenghai Xue, Tingcong Liu, Bo An

In this work, we propose Group-in-Group Policy Optimization (GiGPO), a novel RL algorithm that achieves fine-grained credit assignment for LLM agents while preserving the appealing properties of group-based RL: critic-free, low memory, and stable convergence.

Mathematical Reasoning Reinforcement Learning (RL)
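For readers unfamiliar with group-based RL, the critic-free credit assignment mentioned in this abstract can be roughly sketched as below: advantages come from normalizing returns within a group, and the step-level refinement groups steps that share an anchor state. The function names and the grouping rule are illustrative assumptions, not the authors' implementation.

```python
# Sketch of critic-free, group-relative credit assignment (an assumed
# simplification of the group-based RL idea GiGPO builds on).
from collections import defaultdict

def group_relative_advantages(returns, eps=1e-8):
    """Advantage of each return is its z-score within the group:
    (R - mean) / (std + eps). No value critic is needed."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / len(returns)
    return [(r - mean) / (var ** 0.5 + eps) for r in returns]

def step_level_groups(steps):
    """Illustrative fine-grained grouping: collect step rewards that
    share the same anchor state, forming per-state comparison groups."""
    groups = defaultdict(list)
    for state, reward in steps:
        groups[state].append(reward)
    return dict(groups)
```

With two rollouts returning 1.0 and 3.0, the group advantages are close to -1 and +1, so better-than-average trajectories are reinforced without training a critic.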

Establishing Linear Surrogate Regret Bounds for Convex Smooth Losses via Convolutional Fenchel-Young Losses

no code implementations 14 May 2025 Yuzhou Cao, Han Bao, Lei Feng, Bo An

While convex smooth surrogate losses are particularly appealing due to their efficient estimation and optimization, the community has long believed that a trade-off exists between smoothness and linear regret bounds.

MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework

no code implementations 30 Apr 2025 Qirui Mi, Mengyue Yang, Xiangning Yu, Zhiyu Zhao, Cheng Deng, Bo An, Haifeng Zhang, Xu Chen, Jun Wang

To improve alignment with real-world data, we introduce IB-Tune, a novel fine-tuning method inspired by the Information Bottleneck principle, which retains population signals most predictive of future actions while filtering redundant history.

Decision Making Language Modeling +2

Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation

no code implementations 22 Apr 2025 Zhiyuan Hu, Shiyun Xiong, Yifan Zhang, See-Kiong Ng, Anh Tuan Luu, Bo An, Shuicheng Yan, Bryan Hooi

Recent advancements in visual language models (VLMs) have notably enhanced their capabilities in handling complex Graphical User Interface (GUI) interaction tasks.

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

no code implementations 22 Apr 2025 Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Shicheng Xu, Junyuan Mao, Yu Wang, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Wenjie Qu, Yue Liu, Chengwei Liu, Yifan Zhang, Qiankun Li, Chongye Guo, Yalan Qin, Zhaoxin Fan, Kai Wang, Yi Ding, Donghai Hong, Jiaming Ji, Yingxin Lai, Zitong Yu, Xinfeng Li, Yifan Jiang, Yanhui Li, Xinyu Deng, Junlin Wu, Dongxia Wang, Yihao Huang, Yufei Guo, Jen-tse Huang, Qiufeng Wang, Xiaolong Jin, Wenxuan Wang, Dongrui Liu, Yanwei Yue, Wenke Huang, Guancheng Wan, Heng Chang, Tianlin Li, Yi Yu, Chenghao Li, Jiawei Li, Lei Bai, Jie Zhang, Qing Guo, Jingyi Wang, Tianlong Chen, Joey Tianyi Zhou, Xiaojun Jia, Weisong Sun, Cong Wu, Jing Chen, Xuming Hu, Yiming Li, Xiao Wang, Ningyu Zhang, Luu Anh Tuan, Guowen Xu, Jiaheng Zhang, Tianwei Zhang, Xingjun Ma, Jindong Gu, Liang Pang, Xiang Wang, Bo An, Jun Sun, Mohit Bansal, Shirui Pan, Lingjuan Lyu, Yuval Elovici, Bhavya Kailkhura, Yaodong Yang, Hongwei Li, Wenyuan Xu, Yizhou Sun, Wei Wang, Qing Li, Ke Tang, Yu-Gang Jiang, Felix Juefei-Xu, Hui Xiong, XiaoFeng Wang, DaCheng Tao, Philip S. Yu, Qingsong Wen, Yang Liu

Currently, existing surveys on LLM safety primarily focus on specific stages of the LLM lifecycle, e.g., the deployment phase or fine-tuning phase, lacking a comprehensive understanding of the entire "lifechain" of LLMs.

Model Editing

Generative Auto-Bidding with Value-Guided Explorations

1 code implementation 20 Apr 2025 Jingtong Gao, Yewen Li, Shuai Mao, Nan Jiang, Yejing Wang, Qingpeng Cai, Fei Pan, Peng Jiang, Kun Gai, Bo An, Xiangyu Zhao

Auto-bidding, with its strong capability to optimize bidding decisions within dynamic and competitive online environments, has become a pivotal strategy for advertising platforms.

Reinforcement Learning (RL)

Policy Regularization on Globally Accessible States in Cross-Dynamics Reinforcement Learning

no code implementations 10 Mar 2025 Zhenghai Xue, Lang Feng, Jiacheng Xu, Kang Kang, Xiang Wen, Bo An, Shuicheng Yan

Additionally, as the environment dynamics change, certain expert states may become inaccessible, rendering their distributions less valuable for imitation.

Imitation Learning Offline RL

LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification

1 code implementation 24 Feb 2025 Penghui Yang, Cunxiao Du, Fengzhuo Zhang, Haonan Wang, Tianyu Pang, Chao Du, Bo An

Despite its promise, the effective application of speculative decoding in LLMs still confronts three key challenges: the increasing memory demands of the draft model, the distribution shift between the short-training corpora and long-context inference, and inefficiencies in attention implementation.

Code Completion

Solving Urban Network Security Games: Learning Platform, Benchmark, and Challenge for AI Research

no code implementations 29 Jan 2025 Shuxin Zhuang, Shuxin Li, Tianji Yang, Muheng Li, Xianjie Shi, Bo An, Youzhi Zhang

To facilitate the development of efficient learning algorithms for solving multiplayer games, we propose a multiplayer game platform for solving Urban Network Security Games (UNSG) that model real-world scenarios.

Benchmarking

OpticalNet: An Optical Imaging Dataset and Benchmark Beyond the Diffraction Limit

no code implementations CVPR 2025 Benquan Wang, Ruyi An, Jin-Kyu So, Sergei Kurdiumov, Eng Aik Chan, Giorgio Adamo, Yuhan Peng, Yewen Li, Bo An

Experimental results validate our "building block" concept, demonstrating that models trained on basic square units can effectively generalize to realistic, more complex unseen objects.

Image-to-Image Translation

GAS: Generative Auto-bidding with Post-training Search

no code implementations 22 Dec 2024 Yewen Li, Shuai Mao, Jingtong Gao, Nan Jiang, Yunjian Xu, Qingpeng Cai, Fei Pan, Peng Jiang, Bo An

We use weak-to-strong search alignment by training small critics for different preferences and an MCTS-inspired search to refine the model's output.

Computational Efficiency Sequential Decision Making

Mars-PO: Multi-Agent Reasoning System Preference Optimization

no code implementations 28 Nov 2024 Xiaoxuan Lou, Chaojie Wang, Bo An

Mathematical reasoning is a fundamental capability for large language models (LLMs), yet achieving high performance in this domain remains a significant challenge.

Math Mathematical Reasoning

Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head

no code implementations 13 Nov 2024 Penghui Yang, Chen-Chen Zong, Sheng-Jun Huang, Lei Feng, Bo An

Drawing from the theoretical analysis, we propose a novel method called dual-head knowledge distillation, which partitions the linear classifier into two classification heads responsible for different losses, thereby preserving the beneficial effects of both losses on the backbone while eliminating adverse influences on the classification head.

Attribute Knowledge Distillation

MedGo: A Chinese Medical Large Language Model

no code implementations 27 Oct 2024 HaiTao Zhang, Bo An

Large models are a hot research topic in the field of artificial intelligence.

Language Modeling Language Modelling +3

Double Oracle Neural Architecture Search for Game Theoretic Deep Learning Models

no code implementations 7 Oct 2024 Aye Phyu Phyu Aung, Xinrun Wang, Ruiyu Wang, Hau Chan, Bo An, XiaoLi Li, J. Senthilnath

In this paper, we propose a new approach to train deep learning models using game theory concepts including Generative Adversarial Networks (GANs) and Adversarial Training (AT) where we deploy a double-oracle framework using best response oracles.

Neural Architecture Search

Computing Ex Ante Equilibrium in Heterogeneous Zero-Sum Team Games

no code implementations 2 Oct 2024 Naming Liu, Mingzhi Wang, Xihuai Wang, Weinan Zhang, Yaodong Yang, Youzhi Zhang, Bo An, Ying Wen

Such insufficient policy expressiveness causes Team PSRO to be trapped into a sub-optimal ex ante equilibrium with significantly higher exploitability and never converges to the global ex ante equilibrium.

Deep Graph Anomaly Detection: A Survey and New Perspectives

1 code implementation 16 Sep 2024 Hezhe Qiao, Hanghang Tong, Bo An, Irwin King, Charu Aggarwal, Guansong Pang

To this end, in this work we aim to present a comprehensive review of deep learning approaches for GAD.

Graph Anomaly Detection Survey

Resultant: Incremental Effectiveness on Likelihood for Unsupervised Out-of-Distribution Detection

no code implementations 5 Sep 2024 Yewen Li, Chaojie Wang, Xiaobo Xia, Xu He, Ruyi An, Dong Li, Tongliang Liu, Bo An, Xinrun Wang

Therefore, we appeal for more attention to incremental effectiveness on likelihood, i.e., whether a method could always surpass or at least match the performance of likelihood in U-OOD detection.

Out-of-Distribution Detection

In-Context Exploiter for Extensive-Form Games

no code implementations 10 Aug 2024 Shuxin Li, Chang Yang, Youzhi Zhang, Pengdeng Li, Xinrun Wang, Xiao Huang, Hau Chan, Bo An

Nash equilibrium (NE) is a widely adopted solution concept in game theory due to its stability property.

Form In-Context Learning

Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation

1 code implementation 4 Jul 2024 Yi-Chen Li, Fuxiang Zhang, Wenjie Qiu, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu, Bo An

Thanks to the residual Q-learning framework, we can restore the customized LLM with the pre-trained LLM and the residual Q-function without the reward function $r_1$.

Q-Learning reinforcement-learning +2

MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading

1 code implementation 20 Jun 2024 Chuqiao Zong, Chaojie Wang, Molei Qin, Lei Feng, Xinrun Wang, Bo An

To tackle these problems, we propose a novel Memory Augmented Context-aware Reinforcement learning method On HFT, a.k.a. MacroHFT.

Algorithmic Trading Decision Making +5

Latent Logic Tree Extraction for Event Sequence Explanation from LLMs

no code implementations 3 Jun 2024 Zitao Song, Chao Yang, Chaojie Wang, Bo An, Shuang Li

In the E-step, we evaluate the posterior distribution over the latent logic trees using an LLM prior and the likelihood of the observed event sequences.

MANO: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution Shifts

1 code implementation 29 May 2024 Renchunzi Xie, Ambroise Odonnat, Vasilii Feofanov, Weijian Deng, Jianfeng Zhang, Bo An

Our findings motivate our proposed method MaNo which (1) applies a data-dependent normalization on the logits to reduce prediction bias, and (2) takes the $L_p$ norm of the matrix of normalized logits as the estimation score.

Computational Efficiency
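The two-step scoring rule in this snippet can be sketched as follows. Softmax is used here as a stand-in for the paper's data-dependent normalization, and p=4 is an arbitrary default, so this shows the flavor of the estimator rather than the exact method.

```python
# Rough sketch of a MaNo-style, label-free accuracy estimate:
# (1) normalize each row of logits, (2) score the whole matrix by a
# size-normalized L_p norm. Higher scores = more confident predictions.
import math

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def mano_style_score(logits, p=4):
    """Average |q|^p over the n x K matrix of normalized logits,
    then take the p-th root."""
    n, k = len(logits), len(logits[0])
    total = sum(abs(q) ** p for row in logits for q in softmax(row))
    return (total / (n * k)) ** (1.0 / p)
```

A uniform two-class row of logits scores exactly 0.5, while a sharply peaked row scores higher, which is the direction a confidence-based accuracy estimate needs.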

Resisting Stochastic Risks in Diffusion Planners with the Trajectory Aggregation Tree

1 code implementation 28 May 2024 Lang Feng, Pengjie Gu, Bo An, Gang Pan

As the structure evolves with the integration of new trajectories, unreliable states are marginalized, and the most impactful nodes are prioritized for decision-making.

Decision Making

Configurable Mirror Descent: Towards a Unification of Decision Making

1 code implementation 20 May 2024 Pengdeng Li, Shuxin Li, Chang Yang, Xinrun Wang, Shuyue Hu, Xiao Huang, Hau Chan, Bo An

Decision-making problems, categorized as single-agent, e.g., Atari, cooperative multi-agent, e.g., Hanabi, competitive multi-agent, e.g., Hold'em poker, and mixed cooperative and competitive, e.g., football, are ubiquitous in the real world.

Decision Making

Grasper: A Generalist Pursuer for Pursuit-Evasion Problems

1 code implementation 19 Apr 2024 Pengdeng Li, Shuxin Li, Xinrun Wang, Jakub Cerny, Youzhi Zhang, Stephen Mcaleer, Hau Chan, Bo An

Pursuit-evasion games (PEGs) model interactions between a team of pursuers and an evader in graph-based environments such as urban street networks.

Graph Learning Graph Neural Network

Self-adaptive PSRO: Towards an Automatic Population-based Game Solver

no code implementations 17 Apr 2024 Pengdeng Li, Shuxin Li, Chang Yang, Xinrun Wang, Xiao Huang, Hau Chan, Bo An

(2) We propose the self-adaptive PSRO (SPSRO) by casting the hyperparameter value selection of the parametric PSRO as a hyperparameter optimization (HPO) problem where our objective is to learn an HPO policy that can self-adaptively determine the optimal hyperparameter values during the running of the parametric PSRO.

Hyperparameter Optimization

AgentStudio: A Toolkit for Building General Virtual Agents

1 code implementation 26 Mar 2024 Longtao Zheng, Zhiyuan Huang, Zhenghai Xue, Xinrun Wang, Bo An, Shuicheng Yan

General virtual agents need to handle multimodal observations, master complex action spaces, and self-improve in dynamic, open-domain environments.

Visual Grounding

Cradle: Empowering Foundation Agents Towards General Computer Control

1 code implementation 5 Mar 2024 Weihao Tan, Wentao Zhang, Xinrun Xu, Haochong Xia, Ziluo Ding, Boyu Li, Bohan Zhou, Junpeng Yue, Jiechuan Jiang, Yewen Li, Ruyi An, Molei Qin, Chuqiao Zong, Longtao Zheng, Yujie Wu, Xiaoqiang Chai, Yifei Bi, Tianbao Xie, Pengjie Gu, Xiyun Li, Ceyao Zhang, Long Tian, Chaojie Wang, Xinrun Wang, Börje F. Karlsson, Bo An, Shuicheng Yan, Zongqing Lu

To handle this issue, we propose the General Computer Control (GCC) setting to restrict foundation agents to interact with software through the most unified and standardized interface, i.e., using screenshots as input and keyboard and mouse actions as output.

Efficient Exploration

True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning

1 code implementation 25 Jan 2024 Weihao Tan, Wentao Zhang, Shanqi Liu, Longtao Zheng, Xinrun Wang, Bo An

Despite the impressive performance across numerous tasks, large language models (LLMs) often fail in solving simple decision-making tasks due to the misalignment of the knowledge in LLMs with environments.

Decision Making Reinforcement Learning (RL)

Debiased Sample Selection for Combating Noisy Labels

1 code implementation 24 Jan 2024 Qi Wei, Lei Feng, Haobo Wang, Bo An

To address this limitation, we propose a noIse-Tolerant Expert Model (ITEM) for debiased learning in sample selection.

Learning with noisy labels

Leveraging Gradients for Unsupervised Accuracy Estimation under Distribution Shift

1 code implementation 17 Jan 2024 Renchunzi Xie, Ambroise Odonnat, Vasilii Feofanov, Ievgen Redko, Jianfeng Zhang, Bo An

Estimating the test performance of a model, possibly under distribution shift, without having access to the ground-truth labels is a challenging, yet very important problem for the safe deployment of machine learning algorithms in the wild.

Improving Unsupervised Hierarchical Representation with Reinforcement Learning

1 code implementation CVPR 2024 Ruyi An, Yewen Li, Xu He, Pengjie Gu, Mengchen Zhao, Dong Li, Jianye Hao, Chaojie Wang, Bo An, Mingyuan Zhou

To address this issue, we first analyze the shortcomings of existing methods for mitigating "posterior collapse" from an information-theoretic perspective, and then highlight the necessity of regularization for explicitly propagating data information to higher-level latent variables while maintaining the dependency between different levels.

reinforcement-learning Reinforcement Learning +1

keqing: knowledge-based question answering is a nature chain-of-thought mentor of LLM

no code implementations 31 Dec 2023 Chaojie Wang, Yishi Xu, Zhong Peng, Chenxi Zhang, Bo Chen, Xinrun Wang, Lei Feng, Bo An

Large language models (LLMs) have exhibited remarkable performance on various natural language processing (NLP) tasks, especially for question answering.

Information Retrieval Question Answering +1

Reinforcement Learning with Maskable Stock Representation for Portfolio Management in Customizable Stock Pools

1 code implementation 17 Nov 2023 Wentao Zhang, Yilei Zhao, Shuo Sun, Jie Ying, Yonggang Xie, Zitao Song, Xinrun Wang, Bo An

Specifically, the target stock pools of different investors vary dramatically due to their differing views of market states, and individual investors may temporarily adjust the stocks they desire to trade (e.g., adding a popular stock), which leads to customizable stock pools (CSPs).

Management reinforcement-learning +1

AURO: Reinforcement Learning for Adaptive User Retention Optimization in Recommender Systems

no code implementations 6 Oct 2023 Zhenghai Xue, Qingpeng Cai, Bin Yang, Lantao Hu, Peng Jiang, Kun Gai, Bo An

As the policy performance of RL is sensitive to environment drifts, the loss function enables the state abstraction to be reflective of environment changes and notify the recommendation policy to adapt accordingly.

Navigate Reinforcement Learning (RL) +1

EarnHFT: Efficient Hierarchical Reinforcement Learning for High Frequency Trading

1 code implementation 22 Sep 2023 Molei Qin, Shuo Sun, Wentao Zhang, Haochong Xia, Xinrun Wang, Bo An

In stage II, we construct a pool of diverse RL agents for different market trends, distinguished by return rates, where hundreds of RL agents are trained with different preferences of return rates and only a tiny fraction of them will be selected into the pool based on their profitability.

Algorithmic Trading Hierarchical Reinforcement Learning +1
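The stage-II pool construction described above, training many agents and keeping only a tiny profitable fraction, reduces to a ranking-and-truncation step. The data layout and selection rule below are illustrative assumptions, not the EarnHFT codebase.

```python
# Toy sketch of selecting an agent pool by profitability (an assumed
# simplification of the stage-II selection described in the abstract).
def select_pool(agents, keep_fraction=0.05):
    """agents: list of (name, profit) pairs trained with different
    return-rate preferences; keep the top fraction by profit."""
    n_keep = max(1, int(len(agents) * keep_fraction))
    return sorted(agents, key=lambda a: a[1], reverse=True)[:n_keep]
```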

Market-GAN: Adding Control to Financial Market Data Generation with Semantic Context

1 code implementation 14 Sep 2023 Haochong Xia, Shuo Sun, Xinrun Wang, Bo An

Financial simulators play an important role in enhancing forecasting accuracy, managing risks, and fostering strategic financial decision-making.

Stock Market Prediction text-guided-generation +1

Efficient Last-iterate Convergence Algorithms in Solving Games

no code implementations 22 Aug 2023 Linjian Meng, Youzhi Zhang, Zhenxing Ge, Shangdong Yang, Tianyu Ding, Wenbin Li, Tianpei Yang, Bo An, Yang Gao

To establish last-iterate convergence for Counterfactual Regret Minimization (CFR) algorithms in learning a Nash equilibrium (NE) of extensive-form games (EFGs), recent studies reformulate learning an NE of the original EFG as learning the NEs of a sequence of (perturbed) regularized EFGs.

counterfactual

Weakly Supervised Regression with Interval Targets

no code implementations 18 Jun 2023 Xin Cheng, Yuzhou Cao, Ximing Li, Bo An, Lei Feng

Third, we propose a statistically consistent limiting method for RIT to train the model by limiting the predictions to the interval.

regression
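One plausible reading of the "limiting method" above is a penalty that is zero whenever the prediction falls inside the interval target and grows quadratically outside it. This is a hedged sketch of the flavor, not the paper's statistically consistent estimator.

```python
# Hypothetical interval-limiting loss: no penalty inside [lo, hi],
# squared distance to the nearest endpoint outside it.
def interval_limit_loss(pred, lo, hi):
    if pred < lo:
        return (lo - pred) ** 2
    if pred > hi:
        return (pred - hi) ** 2
    return 0.0
```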

Partial-Label Regression

1 code implementation AAAI 2023 Xin Cheng, Deng-Bao Wang, Lei Feng, Min-Ling Zhang, Bo An

Our proposed methods are theoretically grounded and can be compatible with any models, optimizers, and losses.

Partial Label Learning regression +1

Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control

1 code implementation 13 Jun 2023 Longtao Zheng, Rundong Wang, Xinrun Wang, Bo An

To address these challenges, we introduce Synapse, a computer agent featuring three key components: i) state abstraction, which filters out task-irrelevant information from raw states, allowing more exemplars within the limited context, ii) trajectory-as-exemplar prompting, which prompts the LLM with complete trajectories of the abstracted states and actions to improve multi-step decision-making, and iii) exemplar memory, which stores the embeddings of exemplars and retrieves them via similarity search for generalization to novel tasks.

Decision Making In-Context Learning +1
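Component iii), the exemplar memory, boils down to embedding storage plus nearest-neighbor retrieval. The memory layout and the cosine-similarity choice here are assumptions for illustration, not Synapse's actual implementation.

```python
# Minimal sketch of exemplar memory: store (embedding, exemplar) pairs
# and retrieve the k most similar exemplars for a new task embedding.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(memory, query, k=2):
    """memory: list of (embedding, exemplar); rank by similarity to query."""
    ranked = sorted(memory, key=lambda item: cosine(item[0], query), reverse=True)
    return [exemplar for _, exemplar in ranked[:k]]
```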

Towards Skilled Population Curriculum for Multi-Agent Reinforcement Learning

no code implementations 7 Feb 2023 Rundong Wang, Longtao Zheng, Wei Qiu, Bowei He, Bo An, Zinovi Rabinovich, Yujing Hu, Yingfeng Chen, Tangjie Lv, Changjie Fan

Despite its success, ACL's applicability is limited by (1) the lack of a general student framework for dealing with the varying number of agents across tasks and the sparse reward problem, and (2) the non-stationarity of the teacher's task due to ever-changing student strategies.

Multi-agent Reinforcement Learning reinforcement-learning +2

Population-size-Aware Policy Optimization for Mean-Field Games

no code implementations 7 Feb 2023 Pengdeng Li, Xinrun Wang, Shuxin Li, Hau Chan, Bo An

In this work, we attempt to bridge the two fields of finite-agent and infinite-agent games, by studying how the optimal policies of agents evolve with the number of agents (population size) in mean-field games, an agent-centric perspective in contrast to the existing works focusing typically on the convergence of the empirical distribution of the population.

Attacking Cooperative Multi-Agent Reinforcement Learning by Adversarial Minority Influence

1 code implementation 7 Feb 2023 Simin Li, Jun Guo, Jingqiao Xiu, Yuwei Zheng, Pu Feng, Xin Yu, Aishan Liu, Yaodong Yang, Bo An, Wenjun Wu, Xianglong Liu

To achieve maximum deviation in victim policies under complex agent-wise interactions, our unilateral attack aims to characterize and maximize the impact of the adversary on the victims.

Continuous Control MuJoCo +6

Reinforcement Learning from Diverse Human Preferences

no code implementations 27 Jan 2023 Wanqi Xue, Bo An, Shuicheng Yan, Zhongwen Xu

The complexity of designing reward functions has been a major obstacle to the wide application of deep reinforcement learning (RL) techniques.

Deep Reinforcement Learning reinforcement-learning +1

PRUDEX-Compass: Towards Systematic Evaluation of Reinforcement Learning in Financial Markets

no code implementations 14 Jan 2023 Shuo Sun, Molei Qin, Xinrun Wang, Bo An

Specifically, i) we propose AlphaMix+ as a strong FinRL baseline, which leverages mixture-of-experts (MoE) and risk-sensitive approaches to make diversified risk-aware investment decisions, ii) we evaluate 8 FinRL methods in 4 long-term real-world datasets of influential financial markets to demonstrate the usage of our PRUDEX-Compass, iii) PRUDEX-Compass together with 4 real-world datasets, standard implementation of 8 FinRL methods and a portfolio management environment is released as public resources to facilitate the design and comparison of new FinRL methods.

Management Mixture-of-Experts +2

Mitigating Memorization of Noisy Labels by Clipping the Model Prediction

no code implementations 8 Dec 2022 Hongxin Wei, Huiping Zhuang, Renchunzi Xie, Lei Feng, Gang Niu, Bo An, Yixuan Li

In the presence of noisy labels, designing robust loss functions is critical for securing the generalization performance of deep neural networks.

Memorization

PrefRec: Recommender Systems with Human Preferences for Reinforcing Long-term User Engagement

1 code implementation 6 Dec 2022 Wanqi Xue, Qingpeng Cai, Zhenghai Xue, Shuo Sun, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An

Though promising, the application of RL heavily relies on well-designed rewards, but designing rewards related to long-term user engagement is quite difficult.

Recommendation Systems Reinforcement Learning (RL)

Generalized Consistent Multi-Class Classification with Rejection to be Compatible with Arbitrary Losses

2 code implementations Conference 2022 Yuzhou Cao, Tianchi Cai, Lei Feng, Lihong Gu, Jinjie Gu, Bo An, Gang Niu, Masashi Sugiyama

\emph{Classification with rejection} (CwR) refrains from making a prediction to avoid critical misclassification when encountering test samples that are difficult to classify.

Multi-class Classification

Classifying Ambiguous Identities in Hidden-Role Stochastic Games with Multi-Agent Reinforcement Learning

1 code implementation 24 Oct 2022 Shijie Han, Siyuan Li, Bo An, Wei Zhao, Peng Liu

In this work, we develop a novel identity detection reinforcement learning (IDRL) framework that allows an agent to dynamically infer the identities of nearby agents and select an appropriate policy to accomplish the task.

Multi-agent Reinforcement Learning reinforcement-learning +2

RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning

no code implementations 18 Oct 2022 Wei Qiu, Xiao Ma, Bo An, Svetlana Obraztsova, Shuicheng Yan, Zhongwen Xu

Despite the recent advancement in multi-agent reinforcement learning (MARL), the MARL agents easily overfit the training environment and perform poorly in the evaluation scenarios where other agents behave differently.

Multi-agent Reinforcement Learning reinforcement-learning +2

Deep Attentive Belief Propagation: Integrating Reasoning and Learning for Solving Constraint Optimization Problems

no code implementations 24 Sep 2022 Yanchen Deng, Shufeng Kong, Caihua Liu, Bo An

Belief Propagation (BP) is an important message-passing algorithm for various reasoning tasks over graphical models, including solving the Constraint Optimization Problems (COPs).

Graph Attention Self-Supervised Learning

Offline Equilibrium Finding

1 code implementation 12 Jul 2022 Shuxin Li, Xinrun Wang, Youzhi Zhang, Jakub Cerny, Pengdeng Li, Hau Chan, Bo An

Extensive experimental results demonstrate the superiority of our approach over offline RL algorithms and the importance of using model-based methods for OEF problems.

Offline RL

Open-Sampling: Exploring Out-of-Distribution data for Re-balancing Long-tailed datasets

3 code implementations 17 Jun 2022 Hongxin Wei, Lue Tao, Renchunzi Xie, Lei Feng, Bo An

Deep neural networks usually perform poorly when the training dataset suffers from extreme class imbalance.

ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor

1 code implementation 1 Jun 2022 Wanqi Xue, Qingpeng Cai, Ruohan Zhan, Dong Zheng, Peng Jiang, Kun Gai, Bo An

Meanwhile, reinforcement learning (RL) is widely regarded as a promising framework for optimizing long-term engagement in sequential recommendation.

Reinforcement Learning (RL) Sequential Recommendation

Mitigating Neural Network Overconfidence with Logit Normalization

2 code implementations 19 May 2022 Hongxin Wei, Renchunzi Xie, Hao Cheng, Lei Feng, Bo An, Yixuan Li

Our method is motivated by the analysis that the norm of the logit keeps increasing during training, leading to overconfident output.
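The fix this analysis motivates is to decouple the logit direction from its magnitude by normalizing logits to a constant norm before the loss; a minimal sketch, with an illustrative temperature value:

```python
# Logit normalization sketch: dividing logits by tau * ||logits||
# fixes their L2 norm at 1/tau, so the norm cannot grow unboundedly
# during training (the driver of overconfidence noted above).
import math

def logit_norm(logits, tau=0.04):
    norm = math.sqrt(sum(x * x for x in logits)) + 1e-7  # eps avoids /0
    return [x / (norm * tau) for x in logits]
```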

NSGZero: Efficiently Learning Non-Exploitable Policy in Large-Scale Network Security Games with Neural Monte Carlo Tree Search

no code implementations 17 Jan 2022 Wanqi Xue, Bo An, Chai Kiat Yeo

Second, we enable neural MCTS with decentralized control, making NSGZero applicable to NSGs with many resources.

GearNet: Stepwise Dual Learning for Weakly Supervised Domain Adaptation

3 code implementations 16 Jan 2022 Renchunzi Xie, Hongxin Wei, Lei Feng, Bo An

Although there have been a few studies on this problem, most of them only exploit unidirectional relationships from the source domain to the target domain.

Domain Adaptation

DeepScalper: A Risk-Aware Reinforcement Learning Framework to Capture Fleeting Intraday Trading Opportunities

no code implementations 15 Dec 2021 Shuo Sun, Wanqi Xue, Rundong Wang, Xu He, Junlei Zhu, Jian Li, Bo An

Reinforcement learning (RL) techniques have shown great success in many challenging quantitative trading tasks, such as portfolio management and algorithmic trading.

Algorithmic Trading Decision Making +4

Pretrained Cost Model for Distributed Constraint Optimization Problems

1 code implementation 8 Dec 2021 Yanchen Deng, Shufeng Kong, Bo An

Our model, GAT-PCM, is then pretrained with optimally labelled data in an offline manner, so as to construct effective heuristics to boost a broad range of DCOP algorithms where evaluating the quality of a partial assignment is critical, such as local search or backtracking search.

Combinatorial Optimization Graph Attention +1

RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents

no code implementations NeurIPS 2021 Wei Qiu, Xinrun Wang, Runsheng Yu, Rundong Wang, Xu He, Bo An, Svetlana Obraztsova, Zinovi Rabinovich

Current value-based multi-agent reinforcement learning methods optimize individual Q values to guide individuals' behaviours via centralized training with decentralized execution (CTDE).

Multi-agent Reinforcement Learning quantile regression +5

Open-sampling: Re-balancing Long-tailed Datasets with Out-of-Distribution Data

no code implementations 29 Sep 2021 Hongxin Wei, Lue Tao, Renchunzi Xie, Lei Feng, Bo An

Deep neural networks usually perform poorly when the training dataset suffers from extreme class imbalance.

Learning Pseudometric-based Action Representations for Offline Reinforcement Learning

no code implementations 29 Sep 2021 Pengjie Gu, Mengchen Zhao, Chen Chen, Dong Li, Jianye Hao, Bo An

Offline reinforcement learning is a promising approach for practical applications since it does not require interactions with real-world environments.

Offline RL Recommendation Systems +5

Online Ad Hoc Teamwork under Partial Observability

no code implementations ICLR 2022 Pengjie Gu, Mengchen Zhao, Jianye Hao, Bo An

Autonomous agents often need to work together as a team to accomplish complex cooperative tasks.

Reinforcement Learning for Quantitative Trading

no code implementations 28 Sep 2021 Shuo Sun, Rundong Wang, Bo An

RL's impact is pervasive; it has recently demonstrated the ability to conquer many challenging QT tasks.

Decision Making reinforcement-learning +3

Mis-spoke or mis-lead: Achieving Robustness in Multi-Agent Communicative Reinforcement Learning

no code implementations 9 Aug 2021 Wanqi Xue, Wei Qiu, Bo An, Zinovi Rabinovich, Svetlana Obraztsova, Chai Kiat Yeo

Empirical results demonstrate that many state-of-the-art MACRL methods are vulnerable to message attacks, and our method can significantly improve their robustness.

Multi-agent Reinforcement Learning reinforcement-learning +2

Multi-Class Classification from Single-Class Data with Confidences

no code implementations 16 Jun 2021 Yuzhou Cao, Lei Feng, Senlin Shu, Yitian Xu, Bo An, Gang Niu, Masashi Sugiyama

We show that without any assumptions on the loss functions, models, and optimizers, we can successfully learn a multi-class classifier from only data of a single class with a rigorous consistency guarantee when confidences (i.e., the class-posterior probabilities for all the classes) are available.

Multi-class Classification

Contingency-Aware Influence Maximization: A Reinforcement Learning Approach

1 code implementation 13 Jun 2021 Haipeng Chen, Wei Qiu, Han-Ching Ou, Bo An, Milind Tambe

Empirical results show that our method achieves influence as high as the state-of-the-art methods for contingency-aware IM, while having negligible runtime at test phase.

Combinatorial Optimization reinforcement-learning +2

On the Robustness of Average Losses for Partial-Label Learning

no code implementations11 Jun 2021 Jiaqi Lv, Biao Liu, Lei Feng, Ning Xu, Miao Xu, Bo An, Gang Niu, Xin Geng, Masashi Sugiyama

Partial-label learning (PLL) utilizes instances with PLs, where a PL includes several candidate labels but only one is the true label (TL).

Partial Label Learning Weakly Supervised Classification

CFR-MIX: Solving Imperfect Information Extensive-Form Games with Combinatorial Action Space

no code implementations18 May 2021 Shuxin Li, Youzhi Zhang, Xinrun Wang, Wanqi Xue, Bo An

The challenge in solving this type of game is that the team's joint action space grows exponentially with the number of agents, which makes existing algorithms, e.g., Counterfactual Regret Minimization (CFR), inefficient.

counterfactual Form
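The CFR baseline mentioned in this abstract is built on regret matching, which maps cumulative regrets to a strategy. Below is a minimal, self-contained sketch of that building block, run in a toy matching-pennies loop against a random opponent; it illustrates plain regret matching, not the paper's CFR-MIX.

```python
import numpy as np

def regret_matching(cum_regret):
    """Map cumulative regrets to a strategy: positive regrets are
    normalized; if none are positive, play uniformly."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cum_regret), 1.0 / len(cum_regret))

# Toy matching-pennies loop for the row player: track how much better
# each action would have done than the strategy actually played.
payoff = np.array([[1.0, -1.0], [-1.0, 1.0]])  # row player's payoffs
cum_regret = np.zeros(2)
strategy_sum = np.zeros(2)
rng = np.random.default_rng(0)
for _ in range(5000):
    sigma = regret_matching(cum_regret)
    opp = rng.integers(2)                  # opponent plays randomly
    action_values = payoff[:, opp]         # value of each of our actions
    expected = sigma @ action_values
    cum_regret += action_values - expected # instantaneous regrets
    strategy_sum += sigma
avg_strategy = strategy_sum / strategy_sum.sum()
```

In full CFR, this update runs at every information set with counterfactual values in place of raw payoffs; the exponential blow-up the paper targets comes from applying it over a team's joint action space.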

L2E: Learning to Exploit Your Opponent

no code implementations18 Feb 2021 Zhe Wu, Kai Li, Enmin Zhao, Hang Xu, Meng Zhang, Haobo Fu, Bo An, Junliang Xing

In this work, we propose a novel Learning to Exploit (L2E) framework for implicit opponent modeling.

DO-GAN: A Double Oracle Framework for Generative Adversarial Networks

no code implementations CVPR 2022 Aye Phyu Phyu Aung, Xinrun Wang, Runsheng Yu, Bo An, Senthilnath Jayavelu, XiaoLi Li

In this paper, we propose a new approach to train Generative Adversarial Networks (GANs) where we deploy a double-oracle framework using the generator and discriminator oracles.

Continual Learning

RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents

no code implementations16 Feb 2021 Wei Qiu, Xinrun Wang, Runsheng Yu, Xu He, Rundong Wang, Bo An, Svetlana Obraztsova, Zinovi Rabinovich

Current value-based multi-agent reinforcement learning methods optimize individual Q values to guide individuals' behaviours via centralized training with decentralized execution (CTDE).

Multi-agent Reinforcement Learning quantile regression +5

Learning from Similarity-Confidence Data

no code implementations13 Feb 2021 Yuzhou Cao, Lei Feng, Yitian Xu, Bo An, Gang Niu, Masashi Sugiyama

Weakly supervised learning has drawn considerable attention recently to reduce the expensive time and labor consumption of labeling massive data.

Weakly-supervised Learning

Safe Coupled Deep Q-Learning for Recommendation Systems

no code implementations8 Jan 2021 Runsheng Yu, Yu Gong, Rundong Wang, Bo An, Qingwen Liu, Wenwu Ou

First, we introduce a novel training scheme with two value functions to maximize the accumulated long-term reward under a safety constraint.

Q-Learning Recommendation Systems +1
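The two-value-function idea can be sketched in a generic tabular form: a reward Q and a cost Q are learned side by side, and actions whose estimated cost exceeds a budget are masked out at decision time. This is an illustrative sketch with assumed names and a simple masking rule, not the paper's algorithm.

```python
import numpy as np

# Two coupled value functions: one for reward, one for safety cost.
n_states, n_actions = 4, 3
q_reward = np.zeros((n_states, n_actions))
q_cost = np.zeros((n_states, n_actions))
gamma, lr, budget = 0.9, 0.5, 1.0

def safe_greedy(s):
    """Greedy on reward Q, restricted to actions within the cost budget."""
    safe = q_cost[s] <= budget
    if not safe.any():                 # fall back to the least-cost action
        return int(np.argmin(q_cost[s]))
    masked = np.where(safe, q_reward[s], -np.inf)
    return int(np.argmax(masked))

def update(s, a, r, c, s2):
    """TD updates for both value functions on one transition."""
    a2 = safe_greedy(s2)
    q_reward[s, a] += lr * (r + gamma * q_reward[s2, a2] - q_reward[s, a])
    q_cost[s, a] += lr * (c + gamma * q_cost[s2, a2] - q_cost[s, a])
```

A transition with high cost pushes `q_cost` for that action above the budget, after which `safe_greedy` stops selecting it even if its reward estimate is large.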

RMIX: Risk-Sensitive Multi-Agent Reinforcement Learning

no code implementations1 Jan 2021 Wei Qiu, Xinrun Wang, Runsheng Yu, Xu He, Rundong Wang, Bo An, Svetlana Obraztsova, Zinovi Rabinovich

Centralized training with decentralized execution (CTDE) has become an important paradigm in multi-agent reinforcement learning (MARL).

Multi-agent Reinforcement Learning reinforcement-learning +4

Deep Stock Trading: A Hierarchical Reinforcement Learning Framework for Portfolio Optimization and Order Execution

no code implementations23 Dec 2020 Rundong Wang, Hongxin Wei, Bo An, Zhouyan Feng, Jun Yao

Portfolio management via reinforcement learning is at the forefront of fintech research, which explores how to optimally reallocate a fund into different financial assets over the long term by trial-and-error.

Hierarchical Reinforcement Learning Management +2

Personalized Adaptive Meta Learning for Cold-start User Preference Prediction

no code implementations22 Dec 2020 Runsheng Yu, Yu Gong, Xu He, Bo An, Yu Zhu, Qingwen Liu, Wenwu Ou

Recently, many studies have regarded cold-start personalized preference prediction as a few-shot learning problem, where each user is a task and recommended items are the classes, and gradient-based meta-learning (MAML) is leveraged to address this challenge.

Few-Shot Learning
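The gradient-based meta-learning recipe (MAML) referenced above can be illustrated with a first-order variant on toy 1-D regression tasks, where each "user" is a task adapted from a shared initialization with one inner gradient step. All names and the task family here are illustrative, not the paper's setup.

```python
import numpy as np

# First-order MAML sketch for 1-D regression tasks y = w_task * x.
rng = np.random.default_rng(1)

def loss_grad(w, x, y):
    # d/dw of mean squared error for y_hat = w * x
    return np.mean(2 * (w * x - y) * x)

w_meta = 0.0                         # shared initialization
inner_lr, outer_lr = 0.1, 0.05
for _ in range(300):
    task_w = rng.uniform(1.0, 3.0)   # sample a task ("user")
    x_support = rng.normal(size=10)
    y_support = task_w * x_support
    x_query = rng.normal(size=10)
    y_query = task_w * x_query
    # inner step: adapt to the task's support set
    w_adapted = w_meta - inner_lr * loss_grad(w_meta, x_support, y_support)
    # outer step (first-order): update the initialization on the query set
    w_meta -= outer_lr * loss_grad(w_adapted, x_query, y_query)
```

After training, `w_meta` sits near the center of the task distribution, so a single adaptation step on a new user's few examples already lands close to that user's parameters.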

MetaInfoNet: Learning Task-Guided Information for Sample Reweighting

no code implementations9 Dec 2020 Hongxin Wei, Lei Feng, Rundong Wang, Bo An

Deep neural networks have been shown to easily overfit to biased training data with label noise or class imbalance.

Meta-Learning

SemiNLL: A Framework of Noisy-Label Learning by Semi-Supervised Learning

no code implementations2 Dec 2020 Zhuowei Wang, Jing Jiang, Bo Han, Lei Feng, Bo An, Gang Niu, Guodong Long

We also instantiate our framework with different combinations, which set the new state of the art on benchmark-simulated and real-world datasets with noisy labels.

Learning with noisy labels

Pointwise Binary Classification with Pairwise Confidence Comparisons

no code implementations5 Oct 2020 Lei Feng, Senlin Shu, Nan Lu, Bo Han, Miao Xu, Gang Niu, Bo An, Masashi Sugiyama

To alleviate the data requirements for training effective binary classifiers, many weakly supervised learning settings have been proposed.

Binary Classification Classification +2

Complexity and Algorithms for Exploiting Quantal Opponents in Large Two-Player Games

no code implementations30 Sep 2020 David Milec, Jakub Černý, Viliam Lisý, Bo An

This paper aims to analyze and propose scalable algorithms for computing effective and robust strategies against a quantal opponent in normal-form and extensive-form games.

counterfactual

Contextual User Browsing Bandits for Large-Scale Online Mobile Recommendation

no code implementations21 Aug 2020 Xu He, Bo An, Yanghua Li, Haikai Chen, Qingyu Guo, Xin Li, Zhirong Wang

First, since we are concerned with the reward of a set of recommended items, we model online recommendation as a contextual combinatorial bandit problem and define the reward of a recommended set.

Provably Consistent Partial-Label Learning

no code implementations NeurIPS 2020 Lei Feng, Jiaqi Lv, Bo Han, Miao Xu, Gang Niu, Xin Geng, Bo An, Masashi Sugiyama

Partial-label learning (PLL) is a multi-class classification problem, where each training example is associated with a set of candidate labels.

Multi-class Classification Partial Label Learning

Learning Behaviors with Uncertain Human Feedback

1 code implementation7 Jun 2020 Xu He, Haipeng Chen, Bo An

However, previous works rarely consider the uncertainty in human feedback, especially in cases where the optimal actions are not obvious to the trainers.

Learning Expensive Coordination: An Event-Based Deep RL Approach

no code implementations ICLR 2020 Zhenyu Shi*, Runsheng Yu*, Xinrun Wang*, Rundong Wang, Youzhi Zhang, Hanjiang Lai, Bo An

The main difficulties of expensive coordination are that i) the leader has to consider the long-term effect and predict the followers' behaviors when assigning bonuses and ii) the complex interactions between followers make the training process hard to converge, especially when the leader's policy changes with time.

Decision Making Multi-agent Reinforcement Learning

Combating noisy labels by agreement: A joint training method with co-regularization

2 code implementations CVPR 2020 Hongxin Wei, Lei Feng, Xiangyu Chen, Bo An

The state-of-the-art approaches "Decoupling" and "Co-teaching+" claim that the "disagreement" strategy is crucial for alleviating the problem of learning with noisy labels.

Diversity Learning with noisy labels +1
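Methods in this family share a "small-loss" selection step: samples with the smallest loss are treated as likely clean and kept for the update. A minimal sketch of that step follows; the paper's joint training method additionally adds a co-regularization term between two networks, which is omitted here.

```python
import numpy as np

def select_small_loss(losses, forget_rate):
    """Return indices of the (1 - forget_rate) fraction of samples
    with the smallest loss, treated as likely clean."""
    keep = int(len(losses) * (1.0 - forget_rate))
    return np.argsort(losses)[:keep]

# Per-sample losses from one mini-batch; large losses suggest noisy labels.
losses = np.array([0.1, 2.3, 0.4, 5.0, 0.2])
clean_idx = select_small_loss(losses, forget_rate=0.4)  # keep 60%
```

In co-teaching-style training, each of the two networks performs this selection and its peer is updated only on the selected subset, so label noise memorized by one network is less likely to propagate to the other.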

Learning with Multiple Complementary Labels

no code implementations ICML 2020 Lei Feng, Takuo Kaneko, Bo Han, Gang Niu, Bo An, Masashi Sugiyama

In this paper, we propose a novel problem setting to allow MCLs for each example and two ways for learning with MCLs.

Inducing Cooperation via Team Regret Minimization based Multi-Agent Deep Reinforcement Learning

no code implementations18 Nov 2019 Runsheng Yu, Zhenyu Shi, Xinrun Wang, Rundong Wang, Buhong Liu, Xinwen Hou, Hanjiang Lai, Bo An

Existing value-factorization-based multi-agent deep reinforcement learning (MARL) approaches perform well in various multi-agent cooperative environments under the centralized training and decentralized execution (CTDE) scheme, where all agents are trained together by the centralized value network and each agent executes its policy independently.

Deep Reinforcement Learning reinforcement-learning +1

EUSP: An Easy-to-Use Semantic Parsing PlatForm

no code implementations IJCNLP 2019 Bo An, Chen Bo, Xianpei Han, Le Sun

Semantic parsing aims to map natural language utterances into structured meaning representations.

Semantic Parsing

Manipulating a Learning Defender and Ways to Counteract

no code implementations NeurIPS 2019 Jiarui Gan, Qingyu Guo, Long Tran-Thanh, Bo An, Michael Wooldridge

We then apply a game-theoretic framework at a higher level to counteract such manipulation, in which the defender commits to a policy that specifies her strategy commitment according to the learned information.

Competitive Bridge Bidding with Deep Neural Networks

no code implementations3 Mar 2019 Jiang Rong, Tao Qin, Bo An

Second, based on an analysis of how other players' unknown cards affect one's final rewards, we design two neural networks to deal with imperfect information: the first infers the partner's cards, and the second takes the first's outputs as part of its input to select a bid.

Partial Label Learning with Self-Guided Retraining

no code implementations8 Feb 2019 Lei Feng, Bo An

We show that optimizing this convex-concave problem is equivalent to solving a set of quadratic programming (QP) problems.

Partial Label Learning

Collaboration based Multi-Label Learning

no code implementations8 Feb 2019 Lei Feng, Bo An, Shuo He

It is well-known that exploiting label correlations is crucially important to multi-label learning.

Multi-Label Learning

Sentence Rewriting for Semantic Parsing

no code implementations ACL 2016 Bo Chen, Le Sun, Xianpei Han, Bo An

A major challenge of semantic parsing is the vocabulary mismatch problem between natural language and target ontology.

Form Semantic Parsing +2

Model-Free Context-Aware Word Composition

no code implementations COLING 2018 Bo An, Xianpei Han, Le Sun

Word composition is a promising technique for representation learning of large linguistic units (e.g., phrases, sentences and documents).

Dimensionality Reduction Learning Word Embeddings +5

Accurate Text-Enhanced Knowledge Graph Representation Learning

no code implementations NAACL 2018 Bo An, Bo Chen, Xianpei Han, Le Sun

Previous knowledge graph representation learning techniques usually represent the same entity or relation with the same representation across different triples, without considering the ambiguity of relations and entities.

General Classification Graph Representation Learning +4

Vehicle Traffic Driven Camera Placement for Better Metropolis Security Surveillance

1 code implementation1 Apr 2017 Yihui He, Xiaobo Ma, Xiapu Luo, Jianfeng Li, Mengchen Zhao, Bo An, Xiaohong Guan

Security surveillance is one of the most important issues in smart cities, especially in an era of terrorism.

Decision Making
