no code implementations • 21 Oct 2024 • Hanlin Yang, Jian Yao, Weiming Liu, Qing Wang, Hanmin Qin, Hansheng Kong, Kirk Tang, Jiechao Xiong, Chao Yu, Kai Li, Junliang Xing, Hongwu Chen, Juchao Zhuo, Qiang Fu, Yang Wei, Haobo Fu
Recovering a spectrum of diverse policies from a set of expert trajectories is an important research topic in imitation learning.
no code implementations • 1 Jan 2021 • Hao Sun, Ziping Xu, Meng Fang, Yuhang Song, Jiechao Xiong, Bo Dai, Zhengyou Zhang, Bolei Zhou
Despite the remarkable progress made by the policy gradient algorithms in reinforcement learning (RL), sub-optimal policies usually result from the local exploration property of the policy gradient update.
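The local nature of policy gradient updates can be seen even in a two-armed bandit. Below is a minimal REINFORCE sketch (illustrative only, not the method proposed in this paper; all names and hyperparameters are assumptions) showing a softmax policy concentrating on the better arm through purely first-order updates:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Two-armed bandit: arm 1 pays more on average.
true_means = np.array([0.2, 0.8])
theta = np.zeros(2)                      # policy logits
lr = 0.1

for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    r = true_means[a] + 0.1 * rng.normal()        # noisy reward
    grad_log_pi = np.eye(2)[a] - probs            # gradient of log softmax at action a
    theta += lr * r * grad_log_pi                 # REINFORCE update

probs = softmax(theta)                   # the policy now strongly prefers arm 1
```

Because each update only follows the gradient at the currently sampled actions, such a learner explores locally, which is the failure mode the paper targets.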
1 code implementation • 27 Nov 2020 • Lei Han, Jiechao Xiong, Peng Sun, Xinghai Sun, Meng Fang, Qingwei Guo, Qiaobo Chen, Tengfei Shi, Hongsheng Yu, Xipeng Wu, Zhengyou Zhang
We show that with orders of magnitude less computation, a faithful reimplementation of AlphaStar's methods cannot succeed, and that the proposed techniques are necessary to ensure TStarBot-X's competitive performance.
1 code implementation • 25 Nov 2020 • Peng Sun, Jiechao Xiong, Lei Han, Xinghai Sun, Shuxing Li, Jiawei Xu, Meng Fang, Zhengyou Zhang
This poses non-trivial difficulties for researchers or engineers and prevents the application of MARL to a broader range of real-world problems.
no code implementations • 11 Jun 2020 • Hao Sun, Ziping Xu, Yuhang Song, Meng Fang, Jiechao Xiong, Bo Dai, Bolei Zhou
However, PG algorithms rely on exploiting the learned value function through local first-order updates, which results in limited sample efficiency.
1 code implementation • NeurIPS 2019 • Qing Wang, Yingru Li, Jiechao Xiong, Tong Zhang
In deep reinforcement learning, policy optimization methods need to deal with issues such as function approximation and the reuse of off-policy data.
2 code implementations • 20 Jul 2019 • Qing Wang, Jiechao Xiong, Lei Han, Meng Fang, Xinghai Sun, Zhuobin Zheng, Peng Sun, Zhengyou Zhang
We introduce Arena, a toolkit for multi-agent reinforcement learning (MARL) research.
1 code implementation • NeurIPS 2018 • Qing Wang, Jiechao Xiong, Lei Han, Peng Sun, Han Liu, Tong Zhang
We consider deep policy learning with only batched historical trajectories.
5 code implementations • 10 Oct 2018 • Jiechao Xiong, Qing Wang, Zhuoran Yang, Peng Sun, Lei Han, Yang Zheng, Haobo Fu, Tong Zhang, Ji Liu, Han Liu
Most existing deep reinforcement learning (DRL) frameworks consider either a discrete action space or a continuous action space, but not both.
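A hybrid action space can be handled by a policy with a discrete head over action types and a continuous head for each type's parameters. The following sketch only illustrates such parameterized sampling; all names and values are assumptions, not the framework's API:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hybrid_action(logits, means, log_stds):
    """Sample from a discrete-continuous (parameterized) action space:
    first pick a discrete action type, then draw its continuous parameter."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    k = int(rng.choice(len(logits), p=probs))            # discrete head
    x = means[k] + np.exp(log_stds[k]) * rng.normal()    # continuous head for type k
    return k, x

# Hypothetical 3-type action space, e.g. {move, attack, noop}.
logits = np.array([0.5, 1.5, 0.0])
means = np.array([0.0, 2.0, 0.0])
log_stds = np.array([-1.0, -1.0, -1.0])
action_type, action_param = sample_hybrid_action(logits, means, log_stds)
```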
2 code implementations • 19 Sep 2018 • Peng Sun, Xinghai Sun, Lei Han, Jiechao Xiong, Qing Wang, Bo Li, Yang Zheng, Ji Liu, Yongsheng Liu, Han Liu, Tong Zhang
Both TStarBot1 and TStarBot2 are able to defeat the built-in AI agents from level 1 to level 10 in a full game (1v1 Zerg-vs-Zerg game on the AbyssalReef map), noting that level 8, level 9, and level 10 are cheating agents with unfair advantages such as full vision on the whole map and resource harvest boosting.
no code implementations • 29 Jul 2018 • Qianqian Xu, Jiechao Xiong, Xinwei Sun, Zhiyong Yang, Xiaochun Cao, Qingming Huang, Yuan YAO
A preference order or ranking aggregated from pairwise comparison data is commonly understood as a strict total order.
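When the comparison data are consistent, a global score, and hence a strict total order, can be recovered by least squares over the comparison graph, in the spirit of HodgeRank. The sketch below is illustrative; the function name and toy data are assumptions, not the paper's model:

```python
import numpy as np

def aggregate_scores(n_items, comparisons):
    """Least-squares score aggregation from pairwise comparison data.

    comparisons: list of (i, j, y) where y is the observed degree to which
    item i is preferred over item j.  Scores are identified only up to a
    constant, so we pin their mean to zero.
    """
    rows, rhs = [], []
    for i, j, y in comparisons:
        r = np.zeros(n_items)
        r[i], r[j] = 1.0, -1.0
        rows.append(r)
        rhs.append(y)
    s, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return s - s.mean()

# Three items with latent scores 2 > 1 > 0, observed without noise.
data = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0)]
scores = aggregate_scores(3, data)
order = np.argsort(-scores)              # aggregated ranking: item 0 first
```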
no code implementations • 8 Mar 2018 • Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Qingming Huang, Yuan YAO
In crowdsourced preference aggregation, it is often assumed that all the annotators are subject to a common preference or social utility function which generates their comparison behaviors in experiments.
1 code implementation • ICLR 2018 • Jiechao Xiong, Qing Wang, Zhuoran Yang, Peng Sun, Yang Zheng, Lei Han, Haobo Fu, Xiangru Lian, Carson Eisenach, Haichuan Yang, Emmanuel Ekwedike, Bei Peng, Haoyue Gao, Tong Zhang, Ji Liu, Han Liu
Most existing deep reinforcement learning (DRL) frameworks consider action spaces that are either discrete or continuous.
1 code implementation • 17 Nov 2017 • Ke Ma, Jinshan Zeng, Jiechao Xiong, Qianqian Xu, Xiaochun Cao, Wei Liu, Yuan YAO
Learning representations from relative similarity comparisons, often called ordinal embedding, has gained increasing attention in recent years.
no code implementations • 16 Nov 2017 • Qianqian Xu, Jiechao Xiong, Xi Chen, Qingming Huang, Yuan YAO
Recently, crowdsourcing has emerged as an effective paradigm for human-powered, large-scale problem solving in various domains.
no code implementations • 18 Jul 2017 • Qianqian Xu, Ming Yan, Chendi Huang, Jiechao Xiong, Qingming Huang, Yuan YAO
Outlier detection is a crucial part of robust evaluation for crowdsourceable assessment of Quality of Experience (QoE) and has attracted much attention in recent years.
no code implementations • 16 Apr 2017 • Chendi Huang, Xinwei Sun, Jiechao Xiong, Yuan YAO
Boosting, viewed as a gradient descent algorithm in function space, is a popular method in machine learning.
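In the gradient-descent view, each boosting round fits a weak learner to the negative functional gradient of the loss at the current model; for squared loss this is simply the residual. A minimal sketch with decision stumps (illustrative only; names and hyperparameters are assumptions):

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split stump (piecewise-constant weak learner) for target r."""
    best = None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= t, left.mean(), right.mean())
        sse = ((r - pred) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)

def boost(x, y, n_rounds=50, lr=0.1):
    """Gradient boosting for squared loss: each stump is fit to the
    negative gradient of the loss (here, the residual y - f)."""
    f = np.zeros_like(y)
    stumps = []
    for _ in range(n_rounds):
        h = fit_stump(x, y - f)          # fit the residual
        stumps.append(h)
        f = f + lr * h(x)                # take a small step in function space
    return lambda z: lr * sum(h(z) for h in stumps)

x = np.linspace(0, 1, 40)
y = np.where(x > 0.5, 1.0, 0.0)          # a step function to learn
model = boost(x, y)
```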
no code implementations • NeurIPS 2016 • Chendi Huang, Xinwei Sun, Jiechao Xiong, Yuan YAO
An iterative regularization path with structural sparsity is proposed in this paper based on variable splitting and the Linearized Bregman Iteration, hence called \emph{Split LBI}.
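A minimal sketch of the Split LBI iteration on a toy problem (identity design, first-difference matrix D, so the structural sparsity is on the signal's jumps). Step sizes and the example are illustrative assumptions, not the paper's setup: variable splitting introduces an auxiliary variable gamma ≈ Dx, the shrinkage acts only on gamma, and x follows plain gradient descent on the smooth loss.

```python
import numpy as np

# Toy problem: recover a piecewise-constant signal; sparsity is imposed on Dx.
b = np.array([0.0, 0.0, 0.0, 2.0, 2.0, 2.0])    # observed signal (identity design)
n = len(b)
D = np.diff(np.eye(n), axis=0)                  # first-difference operator, (n-1) x n
nu, kappa, alpha = 1.0, 4.0, 0.05               # illustrative hyperparameters

x = np.zeros(n)                                 # dense estimate
gamma = np.zeros(n - 1)                         # sparse estimate of Dx
z = np.zeros(n - 1)                             # dual variable driving gamma

for _ in range(5000):
    grad_x = (x - b) / n + D.T @ (D @ x - gamma) / nu
    x -= kappa * alpha * grad_x                 # gradient descent on the smooth loss
    z += alpha * (D @ x - gamma) / nu           # Bregman (dual) update
    gamma = kappa * np.sign(z) * np.maximum(np.abs(z) - 1.0, 0.0)   # shrinkage
```

Run to its limit the path fits the data; on this noiseless toy, x approaches b and gamma keeps only the single true jump.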
no code implementations • 12 Jul 2016 • Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Yuan YAO
In crowdsourced preference aggregation, it is often assumed that all the annotators are subject to a common preference or utility function which generates their comparison behaviors in experiments.
no code implementations • 19 May 2016 • Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Yuan YAO
With the rapid growth of crowdsourcing platforms, it has become easy and relatively inexpensive to collect a dataset labeled by multiple annotators in a short time.
no code implementations • 28 Feb 2015 • Braxton Osting, Jiechao Xiong, Qianqian Xu, Yuan YAO
In this setting, a pairwise comparison dataset is typically gathered via random sampling, either \emph{with} or \emph{without} replacement.
no code implementations • 25 Jan 2015 • Yanwei Fu, Timothy M. Hospedales, Tao Xiang, Jiechao Xiong, Shaogang Gong, Yizhou Wang, Yuan YAO
In this paper, we propose a more principled way to identify annotation outliers by formulating the subjective visual property prediction task as a unified robust learning-to-rank problem, tackling outlier detection and learning to rank jointly.
no code implementations • 15 Aug 2014 • Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Qingming Huang, Yuan YAO
In this paper, we study the problem of how to estimate such visual properties from a ranking perspective, with the help of annotators from online crowdsourcing platforms.
1 code implementation • 30 Jun 2014 • Stanley Osher, Feng Ruan, Jiechao Xiong, Yuan YAO, Wotao Yin
In this paper, we recover sparse signals from their noisy linear measurements by solving nonlinear differential inclusions based on the notion of inverse scale space (ISS) developed in applied mathematics.
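In the ISS dynamics (and its Linearized Bregman Iteration discretization), each dual coordinate integrates the residual correlation, and a signal coordinate activates only after crossing the shrinkage threshold, so stronger signals enter the path earlier and early stopping acts as regularization. A toy identity-design sketch (parameter values are illustrative assumptions):

```python
import numpy as np

def lbi(A, b, delta=0.5, mu=2.0, n_iter=100):
    """Linearized Bregman Iteration, a discretization of the ISS flow.
    Returns the whole regularization path of iterates."""
    v = np.zeros(A.shape[1])                # dual variable
    u = np.zeros(A.shape[1])                # primal (sparse) estimate
    path = []
    for _ in range(n_iter):
        v = v + A.T @ (b - A @ u)           # integrate the residual correlation
        u = delta * np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)   # shrinkage
        path.append(u.copy())
    return np.array(path)

# Identity design: every coordinate evolves independently.
A = np.eye(4)
b = np.array([3.0, 1.5, 0.5, 0.0])          # two signals, one small "noise" entry
path = lbi(A, b)

# Iteration at which each coordinate first becomes nonzero (-1 = never).
first_active = []
for j in range(4):
    hits = np.nonzero(np.abs(path[:, j]) > 0)[0]
    first_active.append(int(hits[0]) if hits.size else -1)
```

The large coordinates enter the path first, the small one only later, and the zero coordinate never; stopping the iteration early therefore keeps the strong signals and discards the noise.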